Data quality tools: from cron scripts to Monte Carlo

Our Dagster instance broke while our founder was on holiday, and nothing told anyone for a day. Every check we had lived inside the system that had stopped, so when it stopped, they all went quiet together. The app stayed up, the dashboards kept loading — they just loaded stale numbers. That outage is why Alertee exists, and it's why we ended up evaluating many of the tools in this post.

It's the common shape of the problem: the job runs, the app is up, the data is wrong. The version most teams meet first is the nightly import that finishes green five days in a row while writing zero rows, because an upstream API changed its auth and the script swallowed the error. A customer notices before the team does.

That's usually the week someone goes shopping for a data quality tool, and the market fails them in a specific way. Half of it is enterprise observability platforms with "book a demo" where the price should be. The other half is open-source frameworks that assume you have a data platform team and an orchestrator to run them. If you're two to twenty engineers with a production Postgres or ClickHouse database, neither half is talking to you.

Strip the category away and the job is small. You need a query that asserts something about your data — last night's import wrote rows, payments are still being recorded, no region has gone quiet. You need it to run on a schedule, independently of the pipeline it's watching. You need an alert when it fails, and a person who owns that alert instead of watching it scroll past in Slack. Lineage graphs, ML anomaly models, column profiling across a thousand tables — those solve real problems, but at a scale most teams reading this haven't reached.

This post goes tool by tool: cron jobs, dbt tests, Great Expectations, Soda, Monte Carlo, Metaplane, Bigeye, and Alertee, which we build. There's a comparison table at the end, and a recommendation based on what your data needs — not on how many engineers you have, because the two don't track each other. If you'd rather sort the options by when validation runs — at write time, at transform time, or continuously in production — the data validation tools post does that instead.

Three questions

Feature lists in this category blur together, so we compare on three questions instead.

How does the cost grow? Open source is free until you count the hosting and the engineer who maintains it. Commercial monitoring platforms are usually priced per data source or per monitored table, so the bill tracks the size of your warehouse, not the size of your team.

What do you keep if you leave? A check written in plain SQL runs anywhere — cron, a different vendor, a script. A check expressed in a tool's own YAML dialect needs translating. An ML monitor trained on your tables isn't portable at all; the value lives in the vendor's model.

Can you read the query that fired the alert? When a tool says your data is wrong, you want to see exactly what it asked the database. A check you can read is a check you can debug and tune. An alert that says "row count anomaly detected" makes you reverse-engineer the detector before you can even start on the incident.

The table at the bottom scores every tool on these three.

The tools

Cron jobs and SQL scripts

The real incumbent isn't a vendor. It's a script on a cron that runs a query and posts to Slack when the number looks wrong. It's free, it's completely transparent, and most teams evaluating this category already run a version of it.

It fails slowly, in known ways. Checks scatter across repos and crontabs until nobody can list what's monitored. Alerts land in a channel where they belong to everyone, which means no one. One noisy check teaches the team to skim past the channel, and the day they skim past a real failure is the day the setup quietly stops being monitoring. Five checks and a disciplined team: fine. Forty checks across three on-call engineers: not fine. Even so, a cron script is the right place to start, and the right way to prove the habit is worth keeping before paying for anything.
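For concreteness, here is a minimal sketch of the kind of script this section describes. It is an illustration under assumptions, not anyone's production setup: the table and column names reuse the listings_import example, SQLite stands in for Postgres or ClickHouse, and `notify` is a hypothetical callback where a real script would post to a chat webhook.

```python
import sqlite3
from typing import Callable

# Hypothetical check: table and column names are illustrative, and SQLite
# stands in for the production database.
CHECK_SQL = "SELECT COUNT(*) FROM listings_import WHERE loaded_at >= ?"


def run_check(conn: sqlite3.Connection, since: str, notify: Callable[[str], None]) -> bool:
    """Return True if rows were loaded since `since`; otherwise notify and return False."""
    rows_loaded = conn.execute(CHECK_SQL, (since,)).fetchone()[0]
    if rows_loaded == 0:
        # Put the exact query in the alert so whoever picks it up can rerun it.
        notify(f"listings_import wrote 0 rows since {since}. Query: {CHECK_SQL}")
        return False
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE listings_import (id INTEGER, loaded_at TEXT)")
    # print stands in for the chat notifier so the sketch runs offline.
    run_check(conn, "2024-01-02", print)
```

Passing the notifier in as a function keeps the check testable without a network connection. The failure modes above still apply: nothing in a script like this decides who owns the alert.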

dbt tests

If your data already flows through dbt, its tests are the cheapest coverage you'll ever add. The generic ones (unique, not_null, accepted_values, relationships) are a few lines of YAML; singular tests are plain SQL files that fail when they return rows. They version with your code and run in CI. If you're on dbt, use them — there's no reason not to.

The limit is structural: dbt tests run when dbt runs. They check the transform, at transform time. If the orchestrator dies, the tests die silently with it — that's the outage at the top of this post. dbt also never sees tables it doesn't touch, like the application database your backend writes to directly. So pair the tests with something that watches production between runs.

Great Expectations

The most established open-source framework. You define "expectations" in Python — values in a set, null rates below a bound, row counts in a range — and GX validates data against them and generates documentation of what's covered. The expectation library is broader than anything you'd hand-write.

The cost is operational. GX Core is a framework, not a service: you write Python to configure it, host it, schedule it with Airflow or similar, and wire the failures into alerting yourself. With a data platform team and existing orchestration, the library earns its weight — especially if your team prefers Python to raw SQL. Without one, standing up GX is the project. GX Cloud removes the hosting and adds commercial pricing.

Soda

Soda Core is open source; Soda Cloud is the commercial layer. Checks are written in YAML — declarative and readable, but one syntax further from the SQL you already know. Soda Core is a CLI you still have to schedule yourself; the scheduling and alerting live in Cloud.

The product is framed around contracts, governance, and audit trails. If part of your job is showing auditors what's validated and letting non-engineers define checks — common once data quality becomes a compliance and stakeholder problem, not just an engineering one — that framing is genuinely useful. If you just want to know when the payments table stops receiving rows, you're learning a DSL for something SQL already says.

Monte Carlo

The reference enterprise platform. Connect your warehouse and it profiles tables, learns their normal behavior, and raises ML-driven monitors for freshness, volume, and schema changes, with lineage to trace a failure to its source. For a data org with hundreds or thousands of tables, that automation is the entire point — nobody is hand-writing checks at that scale — and Monte Carlo does it well. At that scale it beats everything else on this page, including Alertee.

Two costs. The visible one: quote-based pricing aimed at enterprise budgets. The subtler one: monitors are learned, not written, so an alert is a model's opinion you interpret rather than a query you read. The lock-in follows the same shape — trained monitors and lineage metadata don't leave with you.

Metaplane

Metaplane — now "Metaplane by Datadog" after the acquisition — is the lighter automated option. It connects in minutes, applies anomaly detection to row counts, freshness, and distributions, and includes column-level lineage. Its center of gravity is the warehouse stack: Snowflake, BigQuery, Redshift, Databricks, with dbt and BI integrations throughout. Postgres and ClickHouse connections exist but aren't the focus. If you want broad automated coverage without enterprise procurement — and especially if Datadog is already in your stack — it's worth a trial.

One thing to know before that trial: Metaplane's own site says alerts arrive "in as soon as 3 days", because learned monitors need history before they can call anything abnormal. A written SQL check fires on its first run. That's the trade across all the automated tools: statistical detection notices deviations you didn't think to check for, but it can't assert a rule you know must hold. "No order references a deleted customer" isn't an anomaly to detect; it's a fact to verify.
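A rule like that one can be written as a query that returns the violating rows, so an empty result means the rule holds. The sketch below assumes a hypothetical schema (orders.customer_id pointing at customers.id, with deleted customers physically removed) and uses SQLite as a stand-in for the production database.

```python
import sqlite3
from typing import List

# Hypothetical schema: every orders.customer_id should match a row in
# customers. A soft-delete column would need a different WHERE clause.
RULE_SQL = """
SELECT o.id
FROM orders AS o
LEFT JOIN customers AS c ON c.id = o.customer_id
WHERE c.id IS NULL
ORDER BY o.id
"""


def violations(conn: sqlite3.Connection) -> List[int]:
    """Return ids of orders whose customer no longer exists; empty means the rule holds."""
    return [row[0] for row in conn.execute(RULE_SQL)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
        INSERT INTO customers VALUES (1);
        INSERT INTO orders VALUES (10, 1), (11, 2);
        """
    )
    print(violations(conn))  # prints [11]: order 11 points at a missing customer
```

No history is needed here: the check gives a verdict on its first run, and the offending ids come back with it.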

Bigeye

Similar shape to Metaplane — automated metrics, anomaly detection, lineage — but aimed up-market, with quote-based pricing. Evaluate it alongside Monte Carlo if you're an enterprise buyer; it's rarely the right first tool for a small engineering team.

Alertee

Alertee is the one we make. We built it after the Dagster outage, for the gap between uptime monitoring, which tells you the database is reachable, and the platforms above, which are priced and shaped for data organizations.

It runs scheduled SQL checks against production Postgres and ClickHouse. You describe the thing that should keep happening in plain English, it generates the SQL against your actual schema, and you review it before it ever runs. The generated SQL is ordinary SQL — edit it, replace it, or skip the generation and write your own:

-- The import job ran, but did it write anything?
SELECT COUNT(*) AS rows_loaded
FROM listings_import
WHERE loaded_at >= CURRENT_DATE;
-- Alert if 0 after 06:00

The inbox is what separates it from the cron setup it replaces. Each failure becomes an incident a person acknowledges, classifies, and resolves — and classifying an incident as noise tunes the check, instead of training the team to ignore a Slack channel. There's a CLI and MCP support for wiring checks into agent workflows. Pricing is on the pricing page.

What it doesn't do: no lineage, no learned anomaly models, and it monitors Postgres and ClickHouse, not Snowflake or BigQuery. If you need those, the platforms above are the right aisle — the head-to-head comparisons go deeper on each. The team it's built for is the one we described at the top: 2–20 engineers on a production database, who want readable checks running continuously without hosting the scheduler, alerting, and history themselves.

Comparison table

Tool	How cost grows	What you keep if you leave	Can you read the query?
Cron + SQL scripts	Free; paid in maintenance time	Everything — it's your code	Yes — you wrote it
dbt tests	Free with dbt	Tests are SQL/YAML in your repo	Yes
Great Expectations	Free OSS + hosting and eng time; GX Cloud paid	Python suites, portable with effort	Partly — expectations compile to queries
Soda	Free Core (self-scheduled); Cloud paid	YAML check files	Partly
Monte Carlo	Quote-based, enterprise	Little — learned monitors, lineage metadata	Rarely — monitors are ML-driven
Metaplane	Free tier; paid plans	Little — learned baselines	Partly — metadata-level detection
Bigeye	Quote-based, enterprise-leaning	Little	Partly
Alertee	See pricing	Every check is plain SQL you can take to cron	Yes — every check is inspectable and editable

What we'd pick, by what your data needs

Headcount is the wrong axis for this decision. A two-person team taking payments through Postgres has more riding on its data than a fifty-person team with one internal dashboard. What matters is which of three jobs you're hiring the tool for — and most teams need the first job covered long before the other two exist.

Simple checks on a production database. The outcomes users would notice: the import wrote rows, payments are recorded, no tenant has gone quiet, the report is fresh. Every team with a production database has this job from day one, whatever its size. A cron job and a handful of SQL scripts are a legitimate answer while the checks fit in one person's head. The moment they don't (checks in three repos, alerts in a channel nobody owns), this is what we built Alertee for: the same plain SQL, with the scheduling, alert ownership, and history handled. dbt tests don't apply here unless the table flows through dbt, and the automated platforms are solving a different problem at a different price.

Pipeline and transform testing. You have models, an orchestrator, and transform logic that can silently go wrong. If you're on dbt, its tests are the obvious pick: free, versioned, in CI. Great Expectations earns its setup cost if you have a platform team and prefer Python; Soda fits when checks need to be legible to auditors and non-engineers. But all three run inside the system they're testing, and the failure that burned us (Dagster down for a day, every check down with it) is invisible from inside. Whichever you choose, pair it with one independent check that watches the output tables from outside the pipeline. A freshness check on the table the pipeline is supposed to update keeps running whether the orchestrator is healthy, broken, or quietly skipping runs, and it alerts as soon as the table goes stale.
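An independent freshness check of that kind can be very small. The following is a sketch under assumptions: the daily_report table and its updated_at column are hypothetical, timestamps are stored as naive ISO-8601 text, and SQLite stands in for the real database. The point is only that it reads the output table directly and never asks the orchestrator anything.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical output table and column; SQLite stands in for the real database.
FRESHNESS_SQL = "SELECT MAX(updated_at) FROM daily_report"


def is_stale(conn: sqlite3.Connection, now: datetime, max_age: timedelta) -> bool:
    """True if the newest row is older than max_age, or if the table has no rows."""
    newest = conn.execute(FRESHNESS_SQL).fetchone()[0]
    if newest is None:
        # An empty table counts as stale: the pipeline has never written to it.
        return True
    return now - datetime.fromisoformat(newest) > max_age


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE daily_report (id INTEGER, updated_at TEXT)")
    conn.execute("INSERT INTO daily_report VALUES (1, '2024-01-01 02:00:00')")
    # The last update was 31 hours before `now`, so a 24-hour threshold trips.
    print(is_stale(conn, datetime(2024, 1, 2, 9, 0), timedelta(hours=24)))  # prints True
```

Treating an empty table as stale is a design choice: it covers a pipeline that has never produced output, at the cost of one alert on a brand-new table.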

Automated monitoring at warehouse scale. Hundreds or thousands of tables in Snowflake or BigQuery, too many to hand-write checks for, and a budget line for data tooling. This is Monte Carlo's home ground, and at enterprise scale it's the strongest option on this page. Metaplane is the lighter way in: faster to connect, free tier, and a natural fit if Datadog already runs your infrastructure monitoring. Bigeye belongs in the same evaluation as Monte Carlo. Alertee doesn't compete for the warehouse itself; it doesn't connect to Snowflake or BigQuery. But most warehouses are fed by a production Postgres, and learned monitors on the warehouse downstream won't assert the rules you know must hold in that source database. That job stays the first job.

If the first job is the one you recognized, try it: connect a Postgres or ClickHouse database, write or generate one check, and you'll have one owned alert running before the next nightly import. The comparisons are there if you're still weighing the platforms.