Examples

data_check sample project

This Git repository is also a sample data_check project. Clone the repository, switch to the folder and run data_check:

git clone git@github.com:andrjas/data_check.git
cd data_check/example
data_check

This will run the tests in the checks folder using the default connection as defined in data_check.yml.

The result will tell you which tests passed and which failed:

checks/generated/generate_before_running.sql: NO EXPECTED RESULTS FILE
checks/failing/invalid.sql: FAILED (with exception in checks/failing/invalid.sql)
checks/failing/expected_to_fail.sql: FAILED
checks/basic/simple_string.sql: PASSED
checks/basic/data_types.sql: PASSED
checks/basic/float.sql: PASSED
checks/basic/unicode_string.sql: PASSED

Tests structure

You can structure your test in many ways. You can also mix these structures.

By pipeline

You can structure your tests to run before/after some data pipeline has run:

checks/
    pipeline1/
        pre/
            test1.sql
            test1.csv
            ...
        post/
            ...
    pipeline2/
        pre/
            ...
        post/
            ...

By test execution time

In a CI environment you can structure your tests after the expected execution time of the tests.

checks/
    quick_tests/
        ...
    medium_tests/
        ...
    slow_running_tests/
        ...

This way you can run quick test, for example schema validation, many times during development. Other tests that must process a lot of data can be run less frequently, for example in an integration environment.