Counting consecutive instances with DENSE_RANK

July 25, 2009

It’s hard to think of a practical use for DENSE_RANK(), but here’s one.

It’s one of the ranking functions that was introduced in SQL Server 2005, along with RANK(), NTILE(n) and the incredibly useful ROW_NUMBER(). But apart from producing reports, it’s quite hard to find genuine uses for some of them.

Yesterday I answered a question at StackOverflow (first time — I often answer questions at Experts Exchange and on the MSDN Forums, but never at StackOverflow before). It’s at http://stackoverflow.com/questions/1176011/sql-to-determine-minimum-sequential-days-of-access/1176255#1176255

The problem is trying to find runs of consecutive days of activity. I solved it using ROW_NUMBER() and the integer value of the day, grouping by the offset between them. When you partition the ROW_NUMBER() by the users, you have a complete solution. It works nicely.

But if people can have multiple entries in a day, then ROW_NUMBER() doesn’t cut it — the offset would change whenever someone didn’t have exactly one record per day. Of course, there might be logic in place to make sure this never happens, and I hope there would be — but if there isn’t the problem just becomes one for DENSE_RANK().

You see, DENSE_RANK() won’t go up for ties. If you have two entries on the same day, they’ll be given the same rank. And then the next day will be as if there had only been one the previous day. If you have ten 107ths, the next DENSE_RANK() is 108 (RANK() would give the next value as 117).

So then my query of:

with numberedrows as
(
        select row_number() over (partition by UserID order by CreationDate) – cast(CreationDate-0.5 as int) as TheOffset, CreationDate, UserID
        from tablename
)
select min(CreationDate), max(CreationDate), count(*) as NumConsecutiveDays, UserID
from numberedrows
group by UserID, TheOffset;

…becomes:

with numberedrows as
(
        select dense_rank() over (partition by UserID order by cast(CreationDate-0.5 as int)) – cast(CreationDate-0.5 as int) as TheOffset, CreationDate, UserID
        from tablename
)
select min(CreationDate), max(CreationDate), datediff(day, min(CreationDate), max(CreationDate)) + 1 as NumConsecutiveDays, count(*) as NumEntries, UserID
from numberedrows
group by UserID, TheOffset;

Bear in mind that I’m having to truncate the CreationDate here. If the values were stored as an date with no time component, I’d be able to use CreationDate without modification, which lends itself much better to indexing strategies. For this scenario, I’d prefer to have an index on a computed column which was the truncated date.

So there you go — a practical use for DENSE_RANK().

This Post Has 2 Comments

  1. Jason Strate

    As I mentioned last night, very nice. Good use of the DENSE_RANK.

Leave a Reply

LobsterPot Blogs

Blog posts by Rob Farley and other LobsterPot Solutions team members.

Search