
Getting Started

Installation

pip install vowl

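Optional extras are available. Install one with, for example, pip install "vowl[spark]" (the quotes stop shells such as zsh from expanding the brackets):

Extra         What it adds
vowl[spark]   PySpark support
vowl[all]     Everything (Spark + AWS)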
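To confirm that the Spark extra's main dependency is present in the current environment, a standard-library check is enough. This is a minimal sketch: the helper name spark_extra_available is hypothetical, and it only detects whether PySpark is importable, not whether vowl itself was installed with the [spark] extra.

```python
import importlib.util


def spark_extra_available() -> bool:
    """Hypothetical helper: True if the pyspark package can be found.

    find_spec locates a top-level module without importing it,
    so the check is fast.
    """
    return importlib.util.find_spec("pyspark") is not None


print("PySpark available:", spark_extra_available())
```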

For local development, testing, and release workflow, see CONTRIBUTING.md.

Validate in 3 Lines

import pandas as pd  # or any Narwhals-compatible DataFrame
from vowl import validate_data

df = pd.read_csv("data.csv")
result = validate_data("contract.yaml", df=df)
result.display_full_report()
Sample Output
=== Data Quality Validation Results ===
   Contract Version:      v3.1.0
   Contract ID:           c11443ee-542f-4442-b28d-2d224342be37
   Schemas:               hdb_resale_prices

 OVERALL DATA QUALITY
   Overall:
     Checks Pass Rate:       18 / 20 (90.0%)

   hdb_resale_prices:
     Overall:
       Checks Pass Rate:       18 / 20 (90.0%)
       ERRORED Checks:         0
     Single Table:
       Checks Pass Rate:       18 / 20 (90.0%)
       ERRORED Checks:         0
       Unique Passed Rows:     195 / 200 (97.5%)
     Multi Table:
       Checks Pass Rate:       0 / 0 (N/A)
       ERRORED Checks:         0
       Non-unique Failed Rows: 0


 CHECK RESULTS
+-----------------------------------------+---------------------------------------+-------------------+--------+---------------+---------------+--------+----------------+
| check_id                                | Target                                | tables_in_query   | status | operator      | expected      | actual | execution time |
+-----------------------------------------+---------------------------------------+-------------------+--------+---------------+---------------+--------+----------------+
| Month                                   | hdb_resale_prices.month               | hdb_resale_prices | FAILED | mustBe        | 0             | 2      | 13.01 ms       |
| Year                                    | hdb_resale_prices.lease_commence_date | hdb_resale_prices | FAILED | mustBe        | 0             | 3      | 9.86 ms        |
+-----------------------------------------+---------------------------------------+-------------------+--------+---------------+---------------+--------+----------------+
| AddressBlockHouseNumber                 | hdb_resale_prices.block               | hdb_resale_prices | PASSED | mustBe        | 0             | 0      | 11.21 ms       |
| block_column_exists_check               | hdb_resale_prices.block               | hdb_resale_prices | PASSED | mustBe        | 0             | 0      | 2.34 ms        |
| flat_model_column_exists_check          | hdb_resale_prices.flat_model          | hdb_resale_prices | PASSED | mustBe        | 0             | 0      | 2.31 ms        |
| flat_type_column_exists_check           | hdb_resale_prices.flat_type           | hdb_resale_prices | PASSED | mustBe        | 0             | 0      | 2.23 ms        |
| flat_type_invalidValues                 | hdb_resale_prices.flat_type           | hdb_resale_prices | PASSED | mustBe        | 0             | 0      | 11.33 ms       |
| floor_area_must_be_less_than_200        | hdb_resale_prices.floor_area_sqm      | hdb_resale_prices | PASSED | mustBe        | 0             | 0      | 9.74 ms        |
| floor_area_sqm_column_exists_check      | hdb_resale_prices.floor_area_sqm      | hdb_resale_prices | PASSED | mustBe        | 0             | 0      | 2.14 ms        |
| hdb_resale_prices_rowCount              | hdb_resale_prices                     | hdb_resale_prices | PASSED | mustBeBetween | [0, 30000000] | 200    | 4.78 ms        |
| lease_commence_date_column_exists_check | hdb_resale_prices.lease_commence_date | hdb_resale_prices | PASSED | mustBe        | 0             | 0      | 2.45 ms        |
| month_column_exists_check               | hdb_resale_prices.month               | hdb_resale_prices | PASSED | mustBe        | 0             | 0      | 3.79 ms        |
| month_logical_type_check                | hdb_resale_prices.month               | hdb_resale_prices | PASSED | mustBe        | 0             | 0      | 2.77 ms        |
| remaining_lease_column_exists_check     | hdb_resale_prices.remaining_lease     | hdb_resale_prices | PASSED | mustBe        | 0             | 0      | 2.28 ms        |
| resale_price_column_exists_check        | hdb_resale_prices.resale_price        | hdb_resale_prices | PASSED | mustBe        | 0             | 0      | 2.28 ms        |
| resale_price_must_not_exceed_2m         | hdb_resale_prices.resale_price        | hdb_resale_prices | PASSED | mustBe        | 0             | 0      | 9.91 ms        |
| storey_range_column_exists_check        | hdb_resale_prices.storey_range        | hdb_resale_prices | PASSED | mustBe        | 0             | 0      | 2.24 ms        |
| street_name_column_exists_check         | hdb_resale_prices.street_name         | hdb_resale_prices | PASSED | mustBe        | 0             | 0      | 3.72 ms        |
| town_column_exists_check                | hdb_resale_prices.town                | hdb_resale_prices | PASSED | mustBe        | 0             | 0      | 3.53 ms        |
| town_nullValues                         | hdb_resale_prices.town                | hdb_resale_prices | PASSED | mustBe        | 0             | 0      | 9.04 ms        |
+-----------------------------------------+---------------------------------------+-------------------+--------+---------------+---------------+--------+----------------+
Total Execution:       110.94 ms

=== Failed Checks and Rows (up to 5 row(s) per failed check) ===

  hdb_resale_prices
    Single checks

      [Month]
        Operator:   mustBe
        Expected:   0
        Actual:     2
        Target:   hdb_resale_prices.month
        Details:  Based on ISO 8601, assumed to be in UTC +8 | YYYY-MM
        Rule:     SELECT COUNT(*) FROM `hdb_resale_prices` WHERE NOT ((CAST(month AS CHAR) RLIKE '^[0-9]{4}-(0[1-9]|1[0-2])$'))
        Rows shown: 2 of 2
+----------+--------+-----------+-------+--------------+--------------+----------------+---------------+---------------------+--------------------+--------------+
| month    | town   | flat_type | block | street_name  | storey_range | floor_area_sqm | flat_model    | lease_commence_date | remaining_lease    | resale_price |
+----------+--------+-----------+-------+--------------+--------------+----------------+---------------+---------------------+--------------------+--------------+
| 2017-jan | BEDOK  | 5 ROOM    | 21    | CHAI CHEE RD | 07 TO 09     | 130.0          | Adjoined flat | 1972                | 54 years 06 months | 530000.0     |
| 2017-jan | BISHAN | 3 ROOM    | 105   | BISHAN ST 12 | 04 TO 06     | 4.0            | Simplified    | 1985                | 67 years 11 months | 395000.0     |
+----------+--------+-----------+-------+--------------+--------------+----------------+---------------+---------------------+--------------------+--------------+

      [Year]
        Operator:   mustBe
        Expected:   0
        Actual:     3
        Target:   hdb_resale_prices.lease_commence_date
        Details:  Based on ISO 8601, assumed to be in UTC +8 | YYYY
        Rule:     SELECT COUNT(*) FROM `hdb_resale_prices` WHERE NOT ((CAST(lease_commence_date AS CHAR) RLIKE '^[0-9]{4}$'))
        Rows shown: 3 of 3
+---------+------------+-----------+-------+------------------+--------------+----------------+----------------+---------------------+--------------------+--------------+
| month   | town       | flat_type | block | street_name      | storey_range | floor_area_sqm | flat_model     | lease_commence_date | remaining_lease    | resale_price |
+---------+------------+-----------+-------+------------------+--------------+----------------+----------------+---------------------+--------------------+--------------+
| 2017-01 | ANG MO KIO | 3 ROOM    | 219   | ANG MO KIO AVE 1 | 07 TO 09     | 67.0           | New Generation | 1977.0              | 59 years 06 months | 297000.0     |
| 2017-01 | ANG MO KIO | 3 ROOM    | 211   | ANG MO KIO AVE 3 | 01 TO 03     | 67.0           | New Generation | abc                 | 59 years 03 months | 325000.0     |
| 2017-01 | ANG MO KIO | 3 ROOM    | 330   | ANG MO KIO AVE 1 | 07 TO 09     | 68.0           | New Generation | nan                 | 63 years           | 338000.0     |
+---------+------------+-----------+-------+------------------+--------------+----------------+----------------+---------------------+--------------------+--------------+
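The Month and Year failures above can be reproduced outside vowl by applying the same regular expressions to the offending values. The sketch below is a pure-Python approximation using the standard re module, not vowl's implementation: it mirrors the SELECT COUNT(*) ... WHERE NOT (CAST(... AS CHAR) RLIKE ...) rules shown in the report, and the helper name count_violations is hypothetical. A SQL engine's casting and NULL handling may differ from Python's str(), so treat it as an aid for reading the rules rather than a replacement for them.

```python
import re

# Patterns copied verbatim from the Month and Year rules in the sample report.
MONTH_RULE = re.compile(r"^[0-9]{4}-(0[1-9]|1[0-2])$")  # YYYY-MM
YEAR_RULE = re.compile(r"^[0-9]{4}$")                  # YYYY


def count_violations(values, pattern):
    """Hypothetical helper approximating
    SELECT COUNT(*) ... WHERE NOT (CAST(col AS CHAR) RLIKE pattern).

    str() stands in for the SQL CAST; SQL engines may render values differently.
    """
    return sum(1 for value in values if not pattern.search(str(value)))


# Values taken from the rows shown in the report.
months = ["2017-01", "2017-jan", "2017-jan"]
lease_years = [1972, 1985, 1977.0, "abc", float("nan")]

print(count_violations(months, MONTH_RULE))      # 2: "2017-jan" uses a month name
print(count_violations(lease_years, YEAR_RULE))  # 3: "1977.0", "abc" and "nan"
```

The trailing .0 alone is enough to fail the Year rule, which is why 1977.0 is flagged alongside the clearly invalid abc and nan.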

The ValidationResult Object

The validate_data function returns a ValidationResult object. Use it to print summaries, inspect failed rows, export results to disk, or check whether every check passed.

Core Methods

Method/Property                              What It Does                                                                 Returns
print_summary()                              Prints high-level statistics (pass/fail counts, success rate, performance)   self (chainable)
show_failed_rows(max_rows=5)                 Displays a sample of failed rows in the console; max_rows=-1 shows all rows  self (chainable)
display_full_report(max_rows=5)              Prints the summary, then shows failed rows (convenience method)              self (chainable)
save(output_dir=".", prefix="vowl_results")  Saves an enhanced CSV and a summary JSON to disk                             self (chainable)
get_output_dfs(checks=None)                  Returns per-check failed rows as {check_id: DataFrame}                       Dict[str, DataFrame]
get_consolidated_output_dfs(checks=None)     Deduplicates failed rows across checks, grouped by table                     Dict[str, DataFrame]
.passed (property)                           True if all checks passed, otherwise False                                   bool
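As an illustration of how these members fit together, here is a minimal sketch of a hypothetical CI gate. It relies only on what the table states: print_summary() returns the result, .passed is a boolean, and get_output_dfs() returns a dict mapping check IDs to DataFrames of failed rows. The function name gate_on_validation and the exit-code convention are assumptions, not part of vowl.

```python
import sys


def gate_on_validation(result) -> int:
    """Hypothetical CI helper: summarise a vowl ValidationResult and
    return a process exit code (0 if every check passed, 1 otherwise)."""
    # print_summary() returns self, so it could also be chained with other calls.
    result.print_summary()

    if result.passed:
        return 0

    # Report how many rows failed each check; len() works on pandas DataFrames.
    for check_id, failed_rows in result.get_output_dfs().items():
        if len(failed_rows) > 0:
            print(f"{check_id}: {len(failed_rows)} failing row(s)", file=sys.stderr)
    return 1
```

In a pipeline script this could be called as sys.exit(gate_on_validation(validate_data("contract.yaml", df=df))), so the job fails whenever any check fails. Calling save() before exiting would also keep the enhanced CSV and summary JSON for later inspection.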