T-SQL Tuesday — A date dimension table with computed columns

December 8, 2009

Quite a few people have asked me to blog about what I do for a date dimension table. I’m talking about a table that Analysis Services references for a Time dimension. It’s going to contain every date in a particular range, and be flexible enough to cater for public holidays and other custom details.

There are plenty of options for this, and I’ll mention some of them a bit later. What I use most of the time is an actual table in the Data Warehouse, which I populate with a row for each date in the range I want to consider. This range starts well before the earliest date I could want, and I don’t leave gaps either. Some people like to only use dates that have fact data, but I prefer to have the dates going back as far as I like.

Let’s talk about what the table looks like, and then how it can be created.

I have a primary key on an integer based on the date, in the format YYYYMMDD. So today would have the number 20091208. I haven’t tried using the date type that’s available in SQL Server 2008 for a date dimension yet – I generally try to use numbers for dimension keys, and haven’t tested the alternative yet. Using an integer like this for the key in a date dimension is generally considered best practice.

I also have a column which is the actual date itself. I will use this as the Value column for the dimension key in Analysis Services. I also have various representations of the date in string form, such as “Tuesday December 8th, 2009”, “08/12/2009”, “8-Dec-2009”. One of these will be the Name column, but I may have others available for other properties and translations. A “12/08/2009” option may be preferable for a US translation, for example.

Columns in my table should indicate which year it is, such as 2009. I’ll also throw in the start of the year (in a date format), and something which indicates which Financial Year it is. In Australia, this is most easily handled by adding six months onto the current date and considering the year of this adjusted date (our FY starts on July 1st). I can subtract the six months back again to work out what the start of the Financial Year is. I try to keep things in the code quite simple, as I leave this code with the client and hope they can maintain it as required. The trickiest I get is to use the DATEADD(month,DATEDIFF(month,0,ActualDate),0) technique for truncation, but I think this should be required knowledge when handling dates.

For months, quarters, semesters, weeks, and so on, I will also prefer to have an integer as the key. A Month Key would take the format 200912 for this month, or 201001 for next month. Quarters can be done using 20094 and 20101, and so on.

This may all seem quite complex, but it’s something you only need to do one time.

Let me explain…

My table only really contains one field. Yes, just one. More might be required for custom fields, but where possible, I will just populate one field and let all the rest be handled using computed columns.

Even the primary key will be a computed column.

CREATE TABLE dbo.Dates (
DateKey AS CONVERT(int, CONVERT(char(8), ActualDate, 112)) PERSISTED NOT NULL
,ActualDate DATETIME NOT NULL
,CalendarYearKey AS YEAR(ActualDate) PERSISTED NOT NULL
,CalendarYearName AS CONVERT(char(4), YEAR(ActualDate)) PERSISTED NOT NULL
,CalendarYearStart AS DATEADD(year,DATEDIFF(year,0,ActualDate),0) PERSISTED NOT NULL
,FinancialYearKey AS YEAR(DATEADD(month,6,ActualDate)) PERSISTED NOT NULL
,FinancialYearName AS CONVERT(char(4),YEAR(DATEADD(month,6,ActualDate))-1) + ‘/’ + RIGHT(CONVERT(char(4),YEAR(DATEADD(month,6,ActualDate))),2) PERSISTED NOT NULL
,FinancialYearStart AS DATEADD(month,-6,DATEADD(year,DATEDIFF(year,0,DATEADD(month,6,ActualDate)),0)) PERSISTED NOT NULL
,MonthKey AS CONVERT(int, CONVERT(char(6),ActualDate,112)) PERSISTED NOT NULL
,MonthName AS CASE MONTH(ActualDate)
                 WHEN 1 then ‘Jan’
                 WHEN 2 then ‘Feb’
                 WHEN 3 then ‘Mar’
                 WHEN 4 then ‘Apr’
                 WHEN 5 then ‘May’
                 WHEN 6 then ‘Jun’
                 WHEN 7 then ‘Jul’
                 WHEN 8 then ‘Aug’
                 WHEN 9 then ‘Sep’
                 WHEN 10 then ‘Oct’
                 WHEN 11 then ‘Nov’
                 WHEN 12 then ‘Dec’
               END
                  + ‘ ‘ + CONVERT(char(4), YEAR(ActualDate)) PERSISTED NOT NULL
,FrenchMonthName AS CASE MONTH(ActualDate)
                 WHEN 1 THEN ‘janv’
                 WHEN 2 THEN ‘févr’
                 WHEN 3 THEN ‘mars’
                 WHEN 4 THEN ‘avr’
                 WHEN 5 THEN ‘mai’
                 WHEN 6 THEN ‘juin’
                 WHEN 7 THEN ‘juil’
                 WHEN 8 THEN ‘août’
                 WHEN 9 THEN ‘sept’
                 WHEN 10 THEN ‘oct’
                 WHEN 11 THEN ‘nov’
                 WHEN 12 THEN ‘déc’
               END
                  + ‘ ‘ + CONVERT(char(4), YEAR(ActualDate)) PERSISTED NOT NULL
–Many more columns following
);
GO

You will notice that I have used ugly long CASE statements for the MonthName columns. I do the same for the names of the days of the week. The reason is betrayed in the second example. DATENAME (or any kind of conversion that relies upon the language setting, such as CONVERT(char(3),ActualDate,100)) is non-deterministic, and therefore can’t be used in a persisted computed column (I do wish that CONVERT could take a Language setting, so that I could tell it to convert in English, French, etc, and make it deterministic). Why do I want them to persist? Well… I’m just more comfortable with them being persisted. After all, I could use a view for the whole thing at this stage, but I’m really not that comfortable with the table being generated on the fly when it comes to processing. The table is essentially read-only, after all.

As well as many computed columns like this, I will also have some that are not computed, such as a column to indicate if it’s a public holiday. This could be computed, at a push, as public holidays generally follow a system. Even Easter follows a formula that could be applied. But if the company takes a special day, or if government declares an extra day for some reason, then problems can start popping up. I find it convenient to have columns that can be updated directly (but which have defaults, of course).

One great thing about this method is that it can be populated very easily. The only field you insert data into is the ActualDate column. Generating a list of dates is as easy as using DATEADD() with a nums table, as I’ve written many times before, including this StackOverflow question. If you need more dates, just insert more.

As I mentioned before, a view could be used for this. It is very easy to generate a list of dates, and then all the other calculations could be done as other columns in the view. You could perform an OUTER JOIN into a table which lists public holidays and other special days. Analysis Services will happily handle this in much the same way. I just prefer to have it exist as a table, which I feel I have more control over.

This post has been part of T-SQL Tuesday, hosted this month by Adam Machanic. You should be able to see many other posts related to datetime mentioned as Trackbacks to Adam’s post.

This Post Has 2 Comments

phil nolan December 8, 2009 Reply

great article rob.

are there any perfomance downsides to this approach as you’re unable to index on the DateKey (which will be used in your fact and linked dimensions) as it’s a calulated column?
robfarley December 8, 2009 Reply

Hi Phil – These columns are all indexable, as they’re persisted. The functionality available from computed columns seems to improve with each new version, but indexing them if persisted has been around for a while now.

T-SQL Tuesday — A date dimension table with computed columns

This Post Has 2 Comments

Leave a Reply Cancel reply

LobsterPot Blogs

Related Blogs

Detecting changes

Presentations with impact!

Career risks?

Ageing code

Archive

Ready to Feel Good About Your Data? Let’s Start the Conversation.

Email

Call

Sitemap

Services