When you select data out of a database table, it might seem as if the system will give you data in the order of the clustered index that is on that table (assuming it has one), but this isn’t quite right. I showed this in the fifth of my T-SQL tips at TechEd Australia this year.
If you don’t explicitly order data using the ORDER BY clause, the system makes no attempt to provide you with the data in any particular order. It will just give you the data in the fastest way possible. You might be lucky and get the data in a predictable order most of the time. In fact, on your system, the data might be returned in a predictable order over 99% of the time. Unfortunately, you can’t rely on that.
When the database engine is handling your query, it needs to get your data into RAM first. If it’s not already there, this means a trip to disk, which will generally be the slowest aspect of your query. With multiple processors and multiple disks being the norm these days, there is a good chance that one processor will head off to get the data from one disk, while another processor gets data from another. If your query isn’t explicitly stating the order for your data, whichever processor gets its data first is likely to determine which data appears first in your results. There are other reasons why your data might come back in an unexpected order, but this reason alone should convince you.
I’m sure reading this, you’re very much used to putting ORDER BY in your queries. But what happens if you’re using SQL Server Integration Services (SSIS)? When you set up a data flow and want to pull data out of a particular table or view, do you use the “Table or View” option in the data source? You might – it’s a lot easier than typing your query out if you choose the SQL Command option. Just don’t. Use the SQL Command option instead.
The problem is that SSIS can really take advantage of ordered data. If SSIS can’t guarantee that the data is ordered, it will assume it’s not, and this might hurt the performance of your package significantly. Some data flow transformations can’t even run on unordered data. You can tell a data flow source that the data is ordered. You can even tell it that it is, even if it’s not. Don’t do this. Only tell a data flow source that it’s ordered if you’re explicitly ordering it. If you tell SSIS that a data flow is ordered, and the data comes through in a different order (because you’re unlucky one time), you will get unexpected results. What I mean by ‘unexpected’ here is ‘wrong’.
It’s not hard, just write out your query, and use the ORDER BY clause. You know you should…