Master Recursive CTEs with Real-World Examples
πHey, I'm Nishant Chandravanshi. I've spent over a decade working hands-on with data β from writing SQL queries and building Power BI dashboards to orchestrating ETL pipelines. If you're someone who solves real-world data engineering or reporting problems, I've probably been in your shoes.
π―Today, I want to share a simple but powerful SQL trick that I've personally used in production β whether it was troubleshooting missing sales records, identifying log gaps, or validating time-series data.
Let's say you have a sales table that records transactions for specific dates. But you suspect some dates are missing.
Here's your starting dataset:
-- Drop if already exists DROP TABLE IF EXISTS #SalesDates; -- Create and insert data CREATE TABLE #SalesDates ( SaleDate DATE ); INSERT INTO #SalesDates (SaleDate) VALUES ('2024-01-01'), ('2024-01-02'), ('2024-01-04'), ('2024-01-06');
-- Missing Dates Output So we're missing: 2024-01-03 2024-01-05
π― Goal: Return only those missing dates between the minimum and maximum dates in the table.
Perfect for dynamic date ranges and ad-hoc queries. Uses SQL Server's recursive capabilities to generate date sequences.
Efficient for larger datasets. Uses a numbers table approach to generate consecutive dates quickly.
WITH DateRange AS ( SELECT MIN(SaleDate) AS StartDate, MAX(SaleDate) AS EndDate FROM #SalesDates )
-- Output of DateRange CTE StartDate = 2024-01-01 EndDate = 2024-01-06
We now know the date window we want to scan through.
, AllDates AS ( SELECT StartDate AS MissingDate FROM DateRange UNION ALL SELECT DATEADD(DAY, 1, MissingDate) FROM AllDates CROSS JOIN DateRange WHERE MissingDate < EndDate )
SELECT StartDate
begins the sequence at the first date.DATEADD(DAY, 1, MissingDate)
adds one day every time.EndDate
.So internally, the recursion works like this:
-- Output: Recursive Iteration of Dates Iteration MissingDate 1 2024-01-01 2 2024-01-02 3 2024-01-03 4 2024-01-04 5 2024-01-05 6 2024-01-06
CROSS JOIN DateRange
?You might ask:
"Why are we doing a CROSS JOIN
here? There's no ON clause!"
π‘ Answer: We do this only to make the EndDate
available inside the recursive part. Without this CROSS JOIN
, SQL Server will throw an error like:
β "Invalid column 'EndDate'"
Recursive CTEs cannot refer to outside CTEs unless you explicitly bring them in via a join β even if it's a one-row constant table like DateRange
.
SELECT MissingDate FROM AllDates WHERE MissingDate NOT IN (SELECT SaleDate FROM #SalesDates) ORDER BY MissingDate OPTION (MAXRECURSION 1000);
-- Output: Final Missing Dates MissingDate ------------ 2024-01-03 2024-01-05
Think of this like taking school attendance:
-- Generate sequence of dates between MIN and MAX SaleDate using a tally table WITH DateRange AS ( SELECT MIN(SaleDate) AS StartDate, MAX(SaleDate) AS EndDate FROM #SalesDates ), Tally AS ( SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS n FROM master.dbo.spt_values -- or use any large enough system table ), AllDates AS ( SELECT DATEADD(DAY, t.n, dr.StartDate) AS MissingDate FROM Tally t CROSS JOIN DateRange dr WHERE DATEADD(DAY, t.n, dr.StartDate) <= dr.EndDate ) SELECT MissingDate FROM AllDates WHERE MissingDate NOT IN (SELECT SaleDate FROM #SalesDates) ORDER BY MissingDate;
WITH DateRange AS ( SELECT MIN(SaleDate) AS StartDate, MAX(SaleDate) AS EndDate FROM #SalesDates )
π― Purpose: Extracts the starting and ending date from your data so we can generate a complete date sequence between them.
Tally AS ( SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS n FROM master.dbo.spt_values ),
π― Purpose: This creates a list of numbers from 0 to 999 using ROW_NUMBER()
. This is called a tally table or numbers table, commonly used to generate sequences (like dates, IDs, etc.) in SQL.
n -- 0 1 2 ... 999
AllDates AS ( SELECT DATEADD(DAY, t.n, dr.StartDate) AS MissingDate FROM Tally t CROSS JOIN DateRange dr WHERE DATEADD(DAY, t.n, dr.StartDate) <= dr.EndDate )
π― Purpose:
n
in the tally table, we add n
days to the StartDate
.CROSS JOIN
is used so we can access dr.StartDate
and dr.EndDate
inside this expression.π Example: If StartDate = '2024-01-01'
and EndDate = '2024-01-06'
, then:
n | DATEADD(DAY, n, StartDate) |
---|---|
0 | 2024-01-01 |
1 | 2024-01-02 |
2 | 2024-01-03 |
3 | 2024-01-04 |
4 | 2024-01-05 |
5 | 2024-01-06 |
6 | 2024-01-07 β (filtered by WHERE clause) |
SELECT MissingDate FROM AllDates WHERE MissingDate NOT IN (SELECT SaleDate FROM #SalesDates) ORDER BY MissingDate;
π― Purpose:
#SalesDates
.MissingDate ------------ 2024-01-03 2024-01-05
Step | CTE Name | Purpose |
---|---|---|
1 | DateRange | Get min and max dates from existing table |
2 | Tally | Generate numbers from 0 to 999 |
3 | AllDates | Add numbers to StartDate to generate all dates |
4 | Final SELECT | Filter out dates already present in #SalesDates |
β Yes, you could use a calendar table or numbers table if your system has one. But recursive CTEs are powerful for ad-hoc or one-time queries, especially when:
Method | Best For | Pros | Cons |
---|---|---|---|
Recursive CTE | Small date ranges, ad-hoc queries | No external dependencies, dynamic | Performance degrades with large ranges |
Tally Table | Larger date ranges, frequent use | Better performance, scalable | Relies on system tables |
Calendar Table | Production systems, enterprise | Best performance, feature-rich | Requires pre-built table |
π οΈAs a data engineer working across industries, I've seen small things like missing dates cause major reporting issues β from sales mismatches to incorrect dashboards.
π‘Knowing how to build a simple recursive CTE gives you an on-demand solution without any external dependency.
πI hope this gave you clarity β not just about how it works, but why it works.
π Ready to level up your SQL skills? Start implementing these patterns in your next data project!
If you have any questions related to this topic, feel free to comment below!
π§Found this helpful? Share it with your data team!
πWant more SQL tips? Follow for weekly data engineering insights!