The import finished green for five days while writing zero rows
When the failure is a row that never came, only a query running outside the pipeline catches it.
Niall
Our Dagster instance broke while our founder was on holiday. Nothing told anyone for a day. The app stayed up, the dashboards kept loading their stale numbers, and the checks we had all lived inside the system that had stopped, so when it stopped, they went quiet with it. We found out the way you don't want to: late, and after a user had already noticed. That outage is why we built Alertee, and it's the cleanest way we know to explain why monitoring is a separate purchase from validation. Alertee is ours, so take the recommendation with that in mind; the distinction holds whatever you end up buying.
The two get shelved together because they share a word. They don't share a job. A validation suite — dbt tests, Great Expectations, schema assertions — checks that the rows which arrive are correct: this customer_id exists, this status is one of four values, this amount isn't negative. It runs when something invokes it, against the rows in front of it. The failures that reach your users are a different shape. The nightly import finished green five days running while writing zero rows because an upstream API changed its auth and the script swallowed the error. One region stopped recording orders after a deploy. A tenant went quiet. None of those is an incorrect row. Each is a row that should exist and doesn't — and a check that waits for a row to inspect has nothing to inspect when the row never comes.
Most teams shopping for monitoring are after that second job, catching the row that never came, without a clean word for it. They have a validation story already (the tests pass) and a production scar the tests didn't prevent. This post is about that second purchase: what continuous monitoring against the live database actually requires, and which tool fits where you are. For the validation layers themselves (write-time constraints, transform-time tests, the suites that own that job), the data quality monitoring pillar walks through them moment by moment, and the tools comparison goes vendor by vendor. We won't re-litigate them here.
What would have caught the zero-row import
Nothing inside the pipeline could have. The import reported success every night, so the orchestrator had nothing to flag. Any check scheduled inside the orchestrator also shares its fate: when our Dagster instance stopped, every check running there went silent with it, which is exactly why no one heard about it. The import wrote no rows, so a validation suite had nothing to inspect; by construction, it can't flag a row that never arrived. The only thing that catches a missing row is a query that runs against the production database on its own clock, every few minutes, independent of whatever was supposed to write the data. SELECT count(*) FROM imports WHERE imported_at > now() - interval '25 hours', run from outside the import job and alerting when it returns zero, would have fired on day one instead of day five. That independent schedule is what separates monitoring from validation, and three capabilities follow from it.
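Here is that check as a small Python sketch. It uses the standard library's sqlite3 as a stand-in for production Postgres; against Postgres you'd run the query above as written. The imports table and imported_at column come from that query. The function name import_is_fresh and the example dates are illustrative. What matters is the alert condition: a count of zero inside the window, evaluated by something other than the import itself.

```python
import sqlite3
from datetime import datetime, timedelta

# SQLite version of the query in the text; Postgres would use
#   imported_at > now() - interval '25 hours'
FRESHNESS_SQL = "SELECT count(*) FROM imports WHERE imported_at > ?"
FMT = "%Y-%m-%d %H:%M:%S"  # timestamps stored as ISO text, so string comparison is chronological


def import_is_fresh(conn: sqlite3.Connection, now: datetime, window_hours: int = 25) -> bool:
    """True if at least one import row landed inside the window; False is the alert condition."""
    cutoff = (now - timedelta(hours=window_hours)).strftime(FMT)
    (count,) = conn.execute(FRESHNESS_SQL, (cutoff,)).fetchone()
    return count > 0


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE imports (id INTEGER PRIMARY KEY, imported_at TEXT)")
    conn.execute("INSERT INTO imports (imported_at) VALUES ('2024-05-01 02:00:00')")
    # Five days on: the job kept reporting success, but no new rows arrived.
    now = datetime(2024, 5, 6, 2, 0, 0)
    if not import_is_fresh(conn, now):
        print("ALERT: imports has no rows from the last 25 hours")
```

The check never asks the import job how it did. That is the whole design choice: if the job, or the orchestrator running it, goes quiet, this query still runs.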
A query you can read, not a learned model. A monitoring tool tells you something in production is wrong. The next question is always: what did it ask the database? If the answer is a query you can read, you can debug and tune it in the same breath. If the answer is "row count anomaly detected" from a model trained on your tables, you have to reverse-engineer the detector before you can even start on the incident. Both have a place, since statistical detection catches deviations you didn't think to write down. For the failures here, though (the import wrote zero rows, payments stopped recording), the rule you want to assert is one you already know. A query states it directly. "No order references a deleted customer" isn't an anomaly; it's a fact, and you'd rather see the SQL.
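As an illustration, here is "No order references a deleted customer" written as the query a monitor would run, again with sqlite3 standing in for Postgres. The schema is an assumption. It has a customers table that soft-deletes through a deleted_at column and an orders table with a customer_id. The LEFT JOIN also catches orders whose customer row was hard-deleted outright. Any row the query returns is a violation, and those rows are also the first thing you'd want to see in the alert.

```python
import sqlite3

# "No order references a deleted customer", stated as a query. Every returned row is a violation.
# Assumed schema: customers(id, deleted_at) with soft deletes; orders(id, customer_id).
ORDERS_ON_DELETED_CUSTOMERS_SQL = """
SELECT o.id, o.customer_id
FROM orders AS o
LEFT JOIN customers AS c ON c.id = o.customer_id
WHERE c.id IS NULL              -- customer row hard-deleted
   OR c.deleted_at IS NOT NULL  -- customer row soft-deleted
ORDER BY o.id
"""


def orders_on_deleted_customers(conn: sqlite3.Connection) -> list:
    """Return (order_id, customer_id) pairs that break the rule; an empty list means it holds."""
    return conn.execute(ORDERS_ON_DELETED_CUSTOMERS_SQL).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, deleted_at TEXT)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, None), (2, "2024-05-01 00:00:00")])
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 2), (12, 3)])
    print(orders_on_deleted_customers(conn))  # [(11, 2), (12, 3)]
```

When this fires, the alert can carry both the SQL and the offending order ids, so debugging starts from the query rather than from a score.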
Schema awareness at setup. The usual reason teams put off monitoring is the blank page. Writing the first check means recalling which table holds payments, what the timestamp column is called, and how soft-deletes are flagged. A monitoring tool that reads your schema can do most of that for you: propose the check, show you the SQL it generated against your actual columns, and let you correct it. That's the difference between a setup that takes an afternoon and one that takes a quarter and never finishes.
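A toy version of schema awareness makes the idea concrete. This is not a description of how Alertee works. It rests on an assumed naming heuristic: columns such as created_at or imported_at hold arrival time, and deleted_at flags a soft delete. It reads columns with SQLite's PRAGMA table_info, where Postgres would use information_schema.columns. The output is a draft for a person to read and correct, never something to run unreviewed.

```python
import sqlite3
from typing import Optional

# Assumed naming heuristic, in priority order: columns that usually record when a row arrived.
TIMESTAMP_HINTS = ("created_at", "imported_at", "inserted_at", "updated_at")


def propose_freshness_check(conn: sqlite3.Connection, table: str) -> Optional[str]:
    """Draft a freshness check from the live schema for a person to review."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    # The table name is interpolated, so it should come from the schema listing, not user input.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    ts_col = next((hint for hint in TIMESTAMP_HINTS if hint in columns), None)
    if ts_col is None:
        return None  # no obvious timestamp column: ask rather than guess
    sql = f"SELECT count(*) FROM {table} WHERE {ts_col} > datetime('now', '-25 hours')"
    if "deleted_at" in columns:
        sql += " AND deleted_at IS NULL"  # assumed soft-delete convention
    return sql


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payments (id INTEGER, amount INTEGER, created_at TEXT, deleted_at TEXT)")
    print(propose_freshness_check(conn, "payments"))
```

Returning None instead of guessing is deliberate. A wrong column silently produces a check that never fires, which is worse than no check.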
An incident inbox, because alert fatigue is how monitoring dies. This is the capability people underweight on a shortlist and regret first. Monitoring setups rarely die from a missed failure. They die from a team that learned to skim past the channel. A check fires at 3am for a quiet Sunday that was always quiet; a deploy trips a threshold for ninety seconds; a backfill looks like a flood. After enough of those, the real alert scrolls by with the noise. A quieter tool doesn't fix it. The fix is turning each failure into an owned item: a specific person acknowledges it, classifies it as real, transient, expected, or noisy, and resolves it. Then the next instance of that noise tunes the check instead of training the team to ignore it. A shared Slack channel has none of that. An alert that belongs to everyone belongs to no one.
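The inbox idea reduces to a small data model. The sketch below is hypothetical: Incident, Classification and checks_to_tune are names chosen for illustration, not any product's API. An incident can't be resolved until someone owns it, and resolving it means picking one of the four classifications above. Failures classified as noisy point at the checks that need tuning.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Set


class Classification(Enum):
    REAL = "real"
    TRANSIENT = "transient"
    EXPECTED = "expected"
    NOISY = "noisy"


@dataclass
class Incident:
    check_name: str
    owner: Optional[str] = None
    classification: Optional[Classification] = None
    resolved: bool = False

    def acknowledge(self, person: str) -> None:
        """A specific person takes the incident, so it stops belonging to everyone."""
        self.owner = person

    def resolve(self, classification: Classification) -> None:
        """Close the incident with a verdict; refuses if nobody has acknowledged it."""
        if self.owner is None:
            raise ValueError(f"{self.check_name}: acknowledge before resolving")
        self.classification = classification
        self.resolved = True


def checks_to_tune(incidents: List[Incident]) -> Set[str]:
    """Checks whose failures were judged noise: tune these instead of muting the channel."""
    return {i.check_name for i in incidents if i.classification is Classification.NOISY}


if __name__ == "__main__":
    sunday = Incident("orders_per_hour")
    sunday.acknowledge("alice")
    sunday.resolve(Classification.NOISY)
    import_gap = Incident("nightly_import_fresh")
    import_gap.acknowledge("bob")
    import_gap.resolve(Classification.REAL)
    print(checks_to_tune([sunday, import_gap]))  # {'orders_per_hour'}
```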
Where the missing-row case falls on the monitoring spectrum
The tools that can do continuous production monitoring run from heavy to light, and the zero-row import sits at the light end — it's a rule you can name and write down as a query, not a deviation you need a model to learn. Heavy observability platforms learn your data's behaviour and raise alerts when it drifts; light tools run the SQL you already wrote. Which end fits you is set by how much data you have and how much of it you can still reason about by hand — not by headcount. The named, query-able failures (import wrote no rows, payments stopped, a region dropped off) live at the light end; the heavy end is for the warehouse too large to enumerate checks for.
The heavy end: Monte Carlo. Connect your warehouse and it profiles tables, learns their normal freshness and volume, alerts through ML-driven monitors when something deviates, and traces a failure back through lineage. For a data organization with hundreds or thousands of tables, that automation is the entire job: nobody is hand-writing checks at that scale. Monte Carlo is built precisely for it, with the enterprise framing and quote-based pricing that go with it. At that scale it's the strongest option, and we'd point you there without flinching. The trade is the one above. Monitors are learned, not written, so an alert is a model's opinion you interpret rather than a query you read, and the trained monitors and lineage don't leave with you. A learned monitor also needs history before it can call anything abnormal, so the first useful alerts can take days to arrive. Metaplane (now part of Datadog) and Bigeye sit near this end too. They are lighter and faster to connect than Monte Carlo, but still warehouse-centred on Snowflake and BigQuery, and still detection you tune rather than rules you state.
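A deliberately crude stand-in shows why a learned monitor needs history. This is not Monte Carlo's method or any vendor's. It's a plain z-score on daily row counts, and min_history and the threshold of 3 are assumed parameters. Until the detector has enough days to estimate normal, it can only abstain.

```python
from statistics import mean, stdev
from typing import Optional, Sequence


def volume_anomaly(history: Sequence[int], today: int,
                   min_history: int = 14, threshold: float = 3.0) -> Optional[bool]:
    """Crude learned baseline. None means there isn't enough history to call anything abnormal."""
    if len(history) < min_history:
        return None
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation of past daily counts
    if sigma == 0:
        return today != mu  # perfectly flat history: any change counts as a deviation
    return abs(today - mu) / sigma > threshold


if __name__ == "__main__":
    # Day 3 of a new table, and the import wrote zero rows: the baseline can only abstain.
    print(volume_anomaly([1200, 1180], 0))  # None
    # With a few weeks of history, the same zero is flagged.
    print(volume_anomaly([1200, 1180, 1210, 1195] * 4, 0))  # True
```

A stated rule needs no warm-up, such as the 25-hour freshness query earlier in this post. It can fire the first night.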
The light end: cron jobs, and Alertee. The honest incumbent here is a script on a cron that runs a query and posts to Slack. It's free, it's entirely yours, and it works — right up to the point where checks scatter across repos until nobody can list what's monitored, there's no history of past failures, and the alerts land in that everyone-channel. The query was never the hard part. Operating it is. We built Alertee for that gap: it runs scheduled SQL checks against production Postgres and ClickHouse, reads your schema so you can describe the outcome that should keep happening in plain English and review the generated SQL before it ever runs, and turns each failure into an inbox item someone owns and classifies. Plain English is enough to start; the SQL stays editable. What it deliberately doesn't do: no lineage, no learned anomaly models, and no Snowflake or BigQuery — it watches the production database your application writes to, not the warehouse downstream.
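For comparison, the cron incumbent fits in a few dozen lines. This sketch makes three assumptions. It uses sqlite3 in place of a Postgres driver. It posts to a Slack incoming webhook, which accepts a JSON body with a text field. It reads two hypothetical environment variables, CHECK_DB_PATH and SLACK_WEBHOOK_URL. Note what the script lacks: no list of checks beyond this file, no record of past failures, and no owner for the message it posts.

```python
import json
import os
import sqlite3
import urllib.request

# Hypothetical crontab entry, every five minutes:
#   */5 * * * * python3 /opt/checks/import_fresh.py

CHECK_NAME = "nightly_import_fresh"
# The freshness rule from this post in SQLite syntax (stand-in for Postgres; assumes UTC timestamps).
CHECK_SQL = "SELECT count(*) FROM imports WHERE imported_at > datetime('now', '-25 hours')"


def run_check(conn: sqlite3.Connection, sql: str) -> int:
    """Run a count query and return the single number it produces."""
    (count,) = conn.execute(sql).fetchone()
    return count


def slack_payload(check_name: str, count: int) -> dict:
    """Message body for a Slack incoming webhook, which reads the 'text' field."""
    return {"text": f"Check failed: {check_name} returned {count} rows"}


def post_to_slack(webhook_url: str, payload: dict) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


def main() -> None:
    db_path = os.environ.get("CHECK_DB_PATH")
    webhook = os.environ.get("SLACK_WEBHOOK_URL")
    if not db_path or not webhook:
        print("set CHECK_DB_PATH and SLACK_WEBHOOK_URL to run this check")
        return
    conn = sqlite3.connect(db_path)
    try:
        count = run_check(conn, CHECK_SQL)
    finally:
        conn.close()
    if count == 0:
        post_to_slack(webhook, slack_payload(CHECK_NAME, count))


if __name__ == "__main__":
    main()
```

The query is one line. Everything else is operating it, and this version still drops its message into the everyone-channel with no history behind it.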
The two ends aren't competing for the same buyer. Monte Carlo is for the warehouse you can't hold in your head. The light end is for the production database you can, where you know which rows should be there and want a query to confirm they are. Most teams meet the second problem years before the first. A production Postgres also usually feeds the warehouse the heavy tools watch, so the rules you know must hold in that database stay your job, whichever tool ends up watching the warehouse.
How to tell which one you're buying
Two questions settle it, and the zero-row import answers both the same way.
Can you name the checks you want? If you can finish the sentence "alert me when ___" — the nightly import wrote no rows, payments stopped recording, a region went quiet, this report went stale — you want the light end. You already know the rules; you need them run continuously, owned, and kept out of the noise. That's the catch-it-when-it-breaks job, and it's where a cron script is the floor and Alertee is the cron script with an inbox and history bolted on. The green-but-empty import is precisely this case: one query against the live table, on a schedule the import can't take down with it.
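To show that "a region went quiet" is as writable as the import check, here is one hypothetical way to state it. It assumes an orders table with region and created_at columns, with sqlite3 again standing in for Postgres. The query returns every region that placed orders in the past week but none in the past 24 hours. Both windows are assumed defaults, to be tuned.

```python
import sqlite3
from datetime import datetime, timedelta
from typing import List

# Regions active since the baseline start whose latest order is at or before the quiet cutoff.
QUIET_REGIONS_SQL = """
SELECT region
FROM orders
WHERE created_at > ?
GROUP BY region
HAVING max(created_at) <= ?
ORDER BY region
"""

FMT = "%Y-%m-%d %H:%M:%S"  # ISO text timestamps compare chronologically as strings


def quiet_regions(conn: sqlite3.Connection, now: datetime,
                  baseline_days: int = 7, quiet_hours: int = 24) -> List[str]:
    """Regions that were active in the baseline window but have gone silent recently."""
    baseline_start = (now - timedelta(days=baseline_days)).strftime(FMT)
    quiet_cutoff = (now - timedelta(hours=quiet_hours)).strftime(FMT)
    rows = conn.execute(QUIET_REGIONS_SQL, (baseline_start, quiet_cutoff)).fetchall()
    return [region for (region,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, created_at TEXT)")
    conn.executemany(
        "INSERT INTO orders (region, created_at) VALUES (?, ?)",
        [("eu", "2024-05-06 09:00:00"), ("eu", "2024-05-03 10:00:00"),
         ("us", "2024-05-04 08:00:00"), ("apac", "2024-04-20 00:00:00")],
    )
    print(quiet_regions(conn, datetime(2024, 5, 6, 12, 0, 0)))  # ['us']
```

Note that apac doesn't appear. A region silent for longer than the baseline window drops out of the result, so the baseline has to be longer than any silence you'd tolerate not hearing about.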
Or is the problem that there are too many tables to name checks for? Thousands of warehouse tables, no one person who knows what normal looks like for each, a budget line for data tooling — then you want the heavy end, where the tool learns normal so you don't have to enumerate it. That's Monte Carlo's ground, and Bigeye's and Metaplane's near it.
What both have in common, and what neither shares with a validation suite, is the schedule: they look at production on their own clock, so they can see the row that never came. That's the line between monitoring and validation, and it's why "we already have tests" never answered the question. If your shortlist is the catch-it-when-it-breaks kind and the checks are queries you could write down today, connect a Postgres or ClickHouse database and turn one on — that's the case we built for. If it's the warehouse-scale kind, the platforms above are the right aisle, and the tools comparison weighs them in depth.