T-SQL Tuesday again and this month is on T-SQL Tips (thanks Allen!). In some ways it’s a tough topic, because there are things I don’t really consider tips that other people do, and vice-versa. This one’s legitimate though: Using CROSS APPLY for working columns in calculations.
Back in 2009, I wrote a blog post on Julian (YYDDD) dates. http://blogs.lobsterpot.com.au/2009/03/25/converting-to-and-from-julian-format-in-t-sql/
Someone asked me recently about using this for time periods, for example, someone’s age when they play a football match. For example, Cesc Fàbregas holds the record as the youngest Arsenal player, aged 16 years and 177 days. Given someone’s birthdate and date of interest, this seems like it should be quite easy to work out, but can get frustratingly tricky.
The idea is quite straight forward – you just count the number of days from their most recent birthday. So let’s have a think.
Counting the number of years between the startdate and the enddate is a nice place to start. I could count the number of days, but that might give me rounding error based on leap years. But counting the years simply compares the year component, such as 1987 and 2003, and the player might not have had his birthday this year yet. In that case, we’d want to go with the year before. Anyway – once we’ve figured out how old the player is, we can figure out when their last birthday was and count the days since then. Easy.
But so easy to make a mistake somewhere.
And this is where CROSS APPLY can come into its own, by allowing us to use working columns.
Let’s start with a few players. Some young ones, and Thierry Henry, just because. I’m using the VALUES method, but you could have a Players table just as easily.
1 2 3 4 5 6 7 8 9 |
select * from ( values ('Cesc Fàbregas', cast('4 May 1987' as date), cast('28 October 2003' as date)) ,('Theo Walcott', cast('16 March 1989' as date), cast('19 August 2006' as date)) ,('Jack Wilshere', cast('1 January 1992' as date), cast('13 September 2008' as date)) ,('Thierry Henry', cast('17 August 1977' as date), cast('7 August 1999' as date)) ) p (Name, DoB, Debut) |
First, I want to make it clear which column is my startdate and which is my enddate. It’s too easy to be thinking “startdate” and pick up “Debut” here, because this sounds very much like the same thing. There’s the startdate of their playing career, and the startdate of the calculation. To avoid confusion, I’m going to do some simple column-renaming. This gives me more reusable code, and APPLY even means I never have to worry about whether I already have these column names referring to something else, because they’re going to get their own table alias too.
1 2 3 4 5 6 7 8 9 10 11 |
select * from ( values ('Cesc Fàbregas', cast('4 May 1987' as date), cast('28 October 2003' as date)) ,('Theo Walcott', cast('16 March 1989' as date), cast('19 August 2006' as date)) ,('Jack Wilshere', cast('1 January 1992' as date), cast('13 September 2008' as date)) ,('Thierry Henry', cast('17 August 1977' as date), cast('7 August 1999' as date)) ) p (Name, DoB, Debut) CROSS APPLY (select p.DoB as startdate, p.Debut as enddate) as working |
Might seem like a bit of a waste to you, but it means so much to me. Really.
Now I want to count how many years there are between my startdate and enddate, and work out when the startdate is this year. I could do this in one step, sure, but I want to be able to check my working really thoroughly.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
select * from ( values ('Cesc Fàbregas', cast('4 May 1987' as date), cast('28 October 2003' as date)) ,('Theo Walcott', cast('16 March 1989' as date), cast('19 August 2006' as date)) ,('Jack Wilshere', cast('1 January 1992' as date), cast('13 September 2008' as date)) ,('Thierry Henry', cast('17 August 1977' as date), cast('7 August 1999' as date)) ) p (Name, DoB, Debut) CROSS APPLY (select p.DoB as startdate, p.Debut as enddate) as working CROSS APPLY (select datediff(year,working.startdate,working.enddate) as YearsDifferent) yd CROSS APPLY (select dateadd(year,yd.YearsDifferent,working.startdate) as StartDateThisYear) sdty |
Now I can easily test to see if I need to subtract a year or not.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
select * from ( values ('Cesc Fàbregas', cast('4 May 1987' as date), cast('28 October 2003' as date)) ,('Theo Walcott', cast('16 March 1989' as date), cast('19 August 2006' as date)) ,('Jack Wilshere', cast('1 January 1992' as date), cast('13 September 2008' as date)) ,('Thierry Henry', cast('17 August 1977' as date), cast('7 August 1999' as date)) ) p (Name, DoB, Debut) CROSS APPLY (select p.DoB as startdate, p.Debut as enddate) as working CROSS APPLY (select datediff(year,working.startdate,working.enddate) as YearsDifferent) yd CROSS APPLY (select dateadd(year,yd.YearsDifferent,working.startdate) as StartDateThisYear) sdty CROSS APPLY (select case when sdty.StartDateThisYear > working.enddate then -1 else 0 end as YearOffset) yo CROSS APPLY (select dateadd(year,yd.YearsDifferent + yo.YearOffset,working.startdate) as AdjustedYear) ay |
I’m sure you can see where this is going. I now have their latest birthday before the date I’m looking for, and I can easily turn this into a Julian Date format.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
select p.*, jd.JulianDiff from ( values ('Cesc Fàbregas', cast('4 May 1987' as date), cast('28 October 2003' as date)) ,('Theo Walcott', cast('16 March 1989' as date), cast('19 August 2006' as date)) ,('Jack Wilshere', cast('1 January 1992' as date), cast('13 September 2008' as date)) ,('Thierry Henry', cast('17 August 1977' as date), cast('7 August 1999' as date)) ) p (Name, DoB, Debut) CROSS APPLY (select p.DoB as startdate, p.Debut as enddate) as working CROSS APPLY (select datediff(year,working.startdate,working.enddate) as YearsDifferent) yd CROSS APPLY (select dateadd(year,yd.YearsDifferent,working.startdate) as StartDateThisYear) sdty CROSS APPLY (select case when sdty.StartDateThisYear > working.enddate then -1 else 0 end as YearOffset) yo CROSS APPLY (select dateadd(year,yd.YearsDifferent + yo.YearOffset,working.startdate) as AdjustedYear) ay CROSS APPLY (select (yd.YearsDifferent + yo.YearOffset) * 1000 as YearComponent) yc CROSS APPLY (select datediff(day,ay.AdjustedYear,working.enddate) as DayComponent) dc CROSS APPLY (select yc.YearComponent + dc.DayComponent as JulianDiff) as jd ; |
The great thing about this is that I can be very confident of my working, being able to check each step of the calculation along the way. I can even turn it into a simple TVF, whilst keeping the calculations just as verifiable:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
create function dbo.YYDDD(@startdate date, @enddate date) returns table as return ( select jd.JulianDiff FROM (values (@startdate, @enddate)) working (startdate, enddate) CROSS APPLY (select datediff(year,working.startdate,working.enddate) as YearsDifferent) yd CROSS APPLY (select dateadd(year,yd.YearsDifferent,working.startdate) as StartDateThisYear) sdty CROSS APPLY (select case when sdty.StartDateThisYear > working.enddate then -1 else 0 end as YearOffset) yo CROSS APPLY (select dateadd(year,yd.YearsDifferent + yo.YearOffset,working.startdate) as AdjustedYear) ay CROSS APPLY (select (yd.YearsDifferent + yo.YearOffset) * 1000 as YearComponent) yc CROSS APPLY (select datediff(day,ay.AdjustedYear,working.enddate) as DayComponent) dc CROSS APPLY (select yc.YearComponent + dc.DayComponent as JulianDiff) as jd ) ; select p.*, jd.JulianDiff from ( values ('Cesc Fàbregas', cast('4 May 1987' as date), cast('28 October 2003' as date)) ,('Theo Walcott', cast('16 March 1989' as date), cast('19 August 2006' as date)) ,('Jack Wilshere', cast('1 January 1992' as date), cast('13 September 2008' as date)) ,('Thierry Henry', cast('17 August 1977' as date), cast('7 August 1999' as date)) ) p (Name, DoB, Debut) CROSS APPLY dbo.YYDDD(DoB, Debut) jd; |
I know it’s really simple, but it’s an amazingly useful tip for making sure that you do things right.
Oh, and as for the plan – SQL just rolls it all up into a single Compute Scalar as if we’d written it out the long way, happily showing us that doing it step-by-step ourselves is no worse at all.
This Post Has 4 Comments
nice article. I always like to say that nothing we do as DBAs is really a rocket science but requires thinking none-the-less.
Nice article Rob but I think it would resonate with more people had you used David Beckham, Ryan Giggs, Cristiano Ronaldo and Wayne Rooney 😉
Thanks for the great post! I enjoyed the way you built it up from previous versions leaving the reader to see each building block. It’s how I try to do things at work too so it’s nice for me to confirm I’m on the right track!
Thanks again
Andy
This is a fantastic article! Thank you! However, do you have an example for when dealing with an existing table and some of the column values being calculated contain either NULL or ” ?