Firstly let’s see the differences between below two queries. All of these two queries are to count 1,000,000 rows and insert into a temporary table:
The first query:
; with cte as
(
select 1 as n
union all
select n + 1
from cte
where n < 1000000
)
select * from cte option(maxrecursion 0)
go
The second query:
;with tally as
(
select ROW_NUMBER() over (order by (select null)) as n
from (values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) as a(n)
cross join
(values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) as b(n)
cross join
(values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) as c(n)
cross join
(values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) as d(n)
cross join
(values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) as e(n)
cross join
(values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) as f(n)
)
select * from tally
go
If we turn on Statistics IO and Statistics Time, we will see these two queries make a big difference (result can be various depends on the environment when you run the queries)
Statistics IO - CTE
Statistics IO - Tally table
Statistics Time - CTE
Statistics Time - Tally table
Actually, in a production environment I would always create one single column table with sequence number populated. This kind of number table can do lots of help when you need to do certain loop task. But in a real world scenario, we do have ad hoc situations, for these situations, tally table will be the friend.
Here is a real scenario when applying tally table. Imagine we have an aggregated purchase history table, in which we have customer details and purchased item amount. Now we need to print out customer details with their purchased item row by row – that is, if purchased amount is X, we need to print out X rows with the item sequence 1 through X.
To demonstrate, run below script to have the sample data
declare @t table
(
Amount int,
Fname varchar(50),
LastName varchar(50)
)
insert into @t
values
(3, 'J', 'Smith'),
(2, 'M', 'Tse'),
(5, 'I', 'Mila'),
(1, 'C', 'Qi')
For the table above, we want final output looks like:
Now think about what we can do:
- Cursor: well, you know you don’t want to touch it.
- While loop: essentially it is a cursor
- Recursive CTE: Yes it works
- And Tally table: a much better solution.
I won’t go through cursor and while loop, but below is recursive CTE and Tally table solutions. When the size of table becomes large, you can try these two approaches and see how big the difference it could be.
CTE
;with t1 as
(
select Amount, Fname, lname, ROW_NUMBER() over (partition by fname, lname order by amount) cnt
from @t t
)
, t2 as
(
select amount, Fname, LName
from t1
where cnt = 1
union all
select amount - 1, Fname, LName
from t2
where Amount > 1
)
select * from t2
order by 2, 3
Tally table
;with tally as
(
select ROW_NUMBER() over (order by (select null)) as n
from (values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) as a(n)
cross join
(values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) as b(n)
cross join
(values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) as c(n)
cross join
(values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) as d(n)
cross join
(values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) as e(n)
cross join
(values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) as f(n)
)
select tt.n, t.Fname, t.LName
from @t t left join tally tt on t.Amount >= tt.n
order by 2, 3
So now we see how tally table can help us in a real business scenario. Enjoy the script :)