SQL Validator

SQL Validator

As an end Looker user, it’s really frustrating and confusing to encounter a database error like this one:

❗ The Google BigQuery Standard SQL database encountered an error while running this query..

Failed to retrieve data - Name accepts_marketing not found inside customers at [3:29]

SELECT
	customers.accepts_marketing AS customers_accepts_marketing
FROM analytics.customers AS customers
GROUP BY 1
ORDER BY 1
LIMIT 500

However, these errors are tricky to catch in advance, because the LookML validator only checks for valid LookML. It doesn’t query the database to see if the SQL you’ve provided is valid.

The SQL validator tests that the SQL queries that Looker generates will run successfully for your users. To catch these database errors before they happen, the SQL validator takes the following steps:

  1. Connects to the API and establishes a development mode session
  2. Checks out the specified branch of interest on the project
  3. Parses the LookML for each Explore in the project to find exposed dimensions
  4. Queries each Explore using the desired query mode to find any dimensions with SQL errors
  5. Reports any errors

Query modes

The SQL validator can be run in three modes: batch mode (default), single mode, and hybrid mode. You can set the query mode at the command line with the --mode option.

In batch mode (default), the SQL validator will create one query for each explore. Each explore query will include each dimension in that explore. The benefit of this mode is that the SQL validator will run far fewer queries, which usually completes more quickly (depending on the complexity of your explores). The downside to this mode is that you will only be notified of the first error on each explore, instead of every error.

In single mode, the SQL validator will run one query per dimension and individually report back on each errored dimension. The benefit of this mode is that you will catch every error you need to fix. The downside is that it can run a large number of queries which may take a while to complete.

Be careful when running SQL validation in single mode. Depending on the size of your Looker instance, this could create and execute thousands of queries, clogging your data warehouse queue or incurring costs (depending on your data warehouse). As a precaution, Spectacles runs no more than 10 queries at a time. You may want to use hybrid mode (which limits single-dimension queries to errored explores).

When running SQL validation in single-dimension mode, it’s usually best to restrict validation to a specific explore to reduce the query load on your instance. You can also decrease query concurrency to lower the number of queries Spectacles runs simultaneously against your data warehouse.

Additionally, if you cancel Spectacles with ctrl+C, Spectacles will attempt to terminate all the queries it is currently running.

In hybrid mode, the SQL validator runs first in batch mode to identify the explores with errors, then in single-dimension mode to find each errored dimension in those explores. Hybrid mode takes more time than batch mode, but returns all the errored dimensions with links to the corresponding LookML.

Changing query concurrency

By default, Spectacles will not run more than 10 simultaneous queries on your warehouse. This cap exists to prevent Spectacles from clogging up your warehouse and Looker instance with hundreds of queued or running queries.

You can change the concurrency setting based on your needs (increase it for faster validation, decrease it for a lower impact on warehouse and Looker load). To change concurrency, provide the --concurrency argument via the command line (or specify it in your YAML config file).

Running the SQL validator

To run SQL validation, you must specify the LookML project and Git branch to test in addition to your API credentials. As with your API credentials, these can be set via the command line (--project and --branch), environment variables (LOOKER_PROJECT and LOOKER_GIT_BRANCH), or a YAML file.

Once the desired project and branch have been set, run SQL validation as follows (we’re explicitly specifying batch mode here, but it is the default if --mode is not provided):

spectacles sql \
	--base-url https://company_name.looker.com \
	--client-id HpB53crD3DMDkdJPCRc \
	--client-secret jDRSM6ZGG9D5WH3r5FynQ9c \
	--project thelook \
	--branch dev-john-doe-294a \
	--mode batch \
	--concurrency 5
# First, export your API credentials as environment variables
# e.g. `export LOOKER_CLIENT_ID=djkskcSAsk2idxjsidgf9`
spectacles sql --mode batch
spectacles sql \
	--config-file looker_configuration.yml \
	--mode batch

Here’s a sample report from a SQL validation run. Successes and failures are reported by Explore with the corresponding error from each explore reported beneath. Since this validation was run in batch mode, we’ll only see the first error identified in each explore.

============== Testing 2 explores [batch mode] [concurrency = 10] ==============

✓ eye_exam.users passed
✗ eye_exam.users__fail failed

Completed SQL validation in 3 seconds.

============================= eye_exam/users__fail =============================

The Snowflake database encountered an error while running this query. SQL Syntax
Error: SQL compilation error: error line 3 at position 1 invalid identifier
'USERS__FAIL.EMAIL_ADDRESS'

Test SQL: logs/queries/eye_exam__users__fail.sql

Ignoring dimensions

You may want to exclude certain dimensions from validation. For example, imagine you have a dimension that’s dependent on a parameter. If that parameter doesn’t have a default value, the parameter’s value will be '' during validation and will break the downstream dimension.

To exclude the downstream dimension from validation, edit the sql field of your dimension LookML to include the comment -- spectacles: ignore.

dimension: addresses {
	sql:
		-- spectacles: ignore
		${TABLE}.addresses ;;
}

Specifying a specific Model or Explore

By default, the SQL validator will validate all models and explores in your instance. To restrict validation to a specific model or explore, use the --explores and --exclude flags.

These arguments accepts strings with the syntax model_name/explore_name. You can provide the wildcard * to select all explores in a given model, for example, here’s how you would select all explores in the customers model for validation:

spectacles sql --explores customers/*

If you wanted to select all explores in your project, but exclude the users explore in the customers model, you would do the following:

spectacles sql --exclude customers/users

The --explores and --exclude flags can be used in conjunction. Spectacles will first find all explores selected with the --explores flag and then remove any selected from the --exclude flag.

Exit Codes

If every Explore passes, the SQL validator will exit with code 0. If any Explore fails, it will exit with code 102.

👉 Next: Assert Validator