Next, next, next, next, next… you know the drill.
Except that when installing SQL, it’s simply not good enough. The defaults might not work for you, and that makes this post qualify for this month’s T-SQL Tuesday, hosted by Andy Yun (@sqlbek).
Most things are fine, but there is one page which you really shouldn’t ignore. And it’s not even obvious. But if you’re just clicking Next, next, next, then you might easily miss it.
When you reach this page:
please make sure you don’t just hit Next without looking at the Collation tab. It looks like this:
Now, I don’t particularly care what you decide to put here. But you shouldn’t just set it blindly. If you are installing a new server and you need it to be consistent with what you’ve used before, go and look at the collation setting on your old instance. This is ridiculously important.
You see, people are lazy. And when I say people, I mean developers. And when I say developers, I mean bad developers. (There was a thing recently that said that “Women’s Soccer” should now be called simply “Soccer”. I agree with this. There shouldn’t have to be a differentiation between a game played by a particular type of people, except perhaps “American Football”, which is obviously called “football” because they use their feet so much. Oh right, about developers. I hope as time passes, developers become better. But for now, when I say “Developers”, I mean bad developers.)
So when a developer creates a temporary table, they do something like:
1 |
CREATE TABLE #resultset (col1 varchar(20) NOT NULL, col2 int NOT NULL); |
And then they populate it with some data, and then they use it in another query. Like this:
1 2 3 |
SELECT t.* FROM dbo.SomeTable t JOIN #resultset r ON t.code = r.col1; |
This is code that has worked for years. But if you have ignored the collation setting and the instance collation is different to the database collation, you will get an error. The system won’t know whether two strings are supposed to be the same or not. Is ‘abc’ the same as ‘àbç’, if one tells you to ignore accents and the other says not to? The system can’t decide. It’s even more complex than that, because two strings might be identical, but it won’t know how to look them up if the alphabet orders letters differently. It can’t decide and gives an error. It goes in the too-hard basket.
Of course, a nice responsible developer will have created the temporary table like this, and then the problem never occurs:
1 |
CREATE TABLE #resultset (col1 varchar(20) <strong>COLLATE DATABASE_DEFAULT </strong>NOT NULL, col2 int NOT NULL); |
But let’s face it – this is rare. Most people write database code without thinking about the collation settings, and that’s a problem. Most of us are bad developers.
The error you get is the one described here: https://connect.microsoft.com/SQLServer/feedback/details/324910/collation-error-behaviour-option. This is a Connect item which I raised in January 2008 (7.5 years ago), which has 47 upvotes, and which was closed in 2011 as “Won’t Fix”. It was looking likely for a while (at least when using tempdb), but then they must’ve realised it wasn’t particularly important. Ultimately, it could give unexpected results if you’re not confident about which language you’re using, and it’s potentially better to give an explicit error than to let your query work but give the wrong results.
1 2 3 |
SELECT * FROM dbo.Sports WHERE Name = 'football'; |
1 2 |
Msg 468, Level 16, State 9, Line 1 Cannot resolve the collation conflict... ;) |
This Post Has 10 Comments
That connect proposal is really horrible. What bad language design.
The real problem is that temp tables have tempdb collation. That is never useful. They should have the collation of the database that creates the table. This behavior is so bad and broken that it borders on a bug.
It certainly is *not* best practice to specify COLLATE DATABASE_DEFAULT on all temp tables. It clutters the code. This should be done if necessary. The better solution is to ensure that all databases have the same sane collation. If that is not possible you might need COLLATE DATABASE_DEFAULT.
I’m sorry you don’t like the Connect item, Mark. The point of the Connect item is to have an option that means that a collation error can resolve with a warning.
There is no issue specifying a collation when creating a temporary table, and DATABASE_DEFAULT is better than nothing.
I did not realize it was your item. I should have elaborated: Settings that influence the valid T-SQL that you can write or its behavior are evil because it’s a trap waiting to hit. You can no longer use code on the web or publish code to the web because you might accidentally depend on some setting. That’s why the ANSI NULL setting is deprecated although it clearly makes the language better. Bad language design.
It’s action from a distance. It breaks stuff without you being able to audit for it.
This is like the VB language options which I’m sure the team regrets.
Also, this is a little like ON ERROR RESUME NEXT.
You already can’t execute code and just assume it’s going to work. A different collation is a common cause of that. The Connect item is trying to increase the amount of code that will simply work, although of course indexing strategies need to consider the collation settings carefully.
It also will increase the amount of code that runs but has wrong results and the amount of code that depends and requires a certain database setting. In fact what if you want to run different pieces of code with different settings? Often a database hosts multiple apps or clients.
One nasty aspect I did not mention before. The ticket breaks the fact that (a=b) is identical to (b=a).
Again, the real fix is to not make temp tables depend on internals of tempdb. We have a clean fix, there is no need to add bad language design.
All code that created columns without an explicit collation is dependent on database settings already. My suggestion in the Connect item was to have a way of avoiding the error, and a few different options were provided. And the real fix is to have people always specify the collation explicitly, so that even the database collation isn’t relevant, let alone the tempdb collation.
"The real problem is that temp tables have tempdb collation. That is never useful. They should have the collation of the database that creates the table."
Agree with Mark here. The default behaviour should work for the 99% of cases. If I ship an enterprise produce internationally, the first problem encountered is restoring our base db onto a server with a different collation. When the customer supplies the db server, this is always a problem (in our case this is 100% of the time). The default collation for them is directly dependent on their machine locale. They have other systems on their db servers. Our application operation is directly dependent on the collation we have built our app upon and regression tested against. Having to specify database_default everywhere is poor design.
Yes, I agree with that aspect too. But that connect item is not just about tempdb, but anywhere that collation errors occur. System tables use the default instance collation, and we should take care to avoid errors when dealing with them too.
I’m sure you’re already aware, but it’s worth noting that this behaviour changes when you use a DB that is set for partial containment. Once set it defaults to creating temporary tables in the database default collation.
So while the primary reason for containment is security, and it’s a sledgehammer approach to contain a DB for this reason the option is there.
Yes, Trevor, and I really wish they had’ve developed the "contained database" concept more.