<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>
<channel>
	<title>
	Comments on: The SQL Query Optimizer – when Logical Order can get it wrong	</title>
	<atom:link href="https://lobsterpot.com.au/blog/2012/12/30/the-sql-query-optimizer-when-logical-order-can-get-it-wrong/feed/" rel="self" type="application/rss+xml" />
	<link>https://lobsterpot.com.au/blog/2012/12/30/the-sql-query-optimizer-when-logical-order-can-get-it-wrong/</link>
	<description></description>
	<lastBuildDate>Thu, 03 Jan 2013 16:48:02 +0000</lastBuildDate>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>
		By: Chris Adkin		</title>
		<link>https://lobsterpot.com.au/blog/2012/12/30/the-sql-query-optimizer-when-logical-order-can-get-it-wrong/#comment-2196</link>

		<dc:creator><![CDATA[Chris Adkin]]></dc:creator>
		<pubDate>Thu, 03 Jan 2013 16:48:02 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.lobsterpot.com.au/?p=3180#comment-2196</guid>

					<description><![CDATA[Rob,
You may be alluding to what the optimizer team refer to as the ascending key problem, whereby data gets added to a table in ascending key order. For large tables (opposite of the &#039;Small&#039; table you mentioned - I know), this can be mitigated via trace flag 2371. You may also want to look at connect item 676224; another popular database engine includes the ability for hints to be used on statements that specify:
1. That data in the relevant table(s) should be sampled in order to produce better plans.
2. How aggressive the sampling should be.
Refer to connect item 676224]]></description>
			<content:encoded><![CDATA[<p>Rob,<br />
You may be alluding to what the optimizer team refer to as the ascending key problem, whereby data gets added to a table in ascending key order. For large tables (opposite of the &#8216;Small&#8217; table you mentioned &#8211; I know), this can be mitigated via trace flag 2371. You may also want to look at connect item 676224; another popular database engine includes the ability for hints to be used on statements that specify:<br />
1. That data in the relevant table(s) should be sampled in order to produce better plans.<br />
2. How aggressive the sampling should be.<br />
Refer to connect item 676224</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Rob Farley		</title>
		<link>https://lobsterpot.com.au/blog/2012/12/30/the-sql-query-optimizer-when-logical-order-can-get-it-wrong/#comment-2195</link>

		<dc:creator><![CDATA[Rob Farley]]></dc:creator>
		<pubDate>Wed, 02 Jan 2013 05:09:46 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.lobsterpot.com.au/?p=3180#comment-2195</guid>

					<description><![CDATA[Ok Simon... how about in AdventureWorksDW, which has 60K records in dbo.FactInternetSales? I get it&#039;s still not huge, but it shows that it&#039;s easy to have a bad plan come out.

CREATE INDEX ix1 ON dbo.FactInternetSales(OrderDateKey) INCLUDE (UnitPrice);

CREATE INDEX ix2 ON dbo.FactInternetSales(UnitPrice) INCLUDE (OrderDateKey);

SELECT MIN(OrderDateKey)
FROM dbo.FactInternetSales
WHERE UnitPrice between 0 and 100;
--Prefers ix1. 20 reads.

SELECT MIN(OrderDateKey)
FROM dbo.FactInternetSales
WHERE UnitPrice between 600 and 700;
--Prefers ix2. 2 reads.

SELECT MIN(OrderDateKey)
FROM dbo.FactInternetSales WITH (INDEX(ix2))
WHERE UnitPrice between 0 and 100;
--Forced ix2. 150 reads.

Of course, with a correlation between the two fields, it could be possible to show an example of ix1 being particularly nasty as well.]]></description>
			<content:encoded><![CDATA[<p>Ok Simon&#8230; how about in AdventureWorksDW, which has 60K records in dbo.FactInternetSales? I get it&#8217;s still not huge, but it shows that it&#8217;s easy to have a bad plan come out.</p>
<p>CREATE INDEX ix1 ON dbo.FactInternetSales(OrderDateKey) INCLUDE (UnitPrice);</p>
<p>CREATE INDEX ix2 ON dbo.FactInternetSales(UnitPrice) INCLUDE (OrderDateKey);</p>
<p>SELECT MIN(OrderDateKey)<br />
FROM dbo.FactInternetSales<br />
WHERE UnitPrice between 0 and 100;<br />
&#8211;Prefers ix1. 20 reads.</p>
<p>SELECT MIN(OrderDateKey)<br />
FROM dbo.FactInternetSales<br />
WHERE UnitPrice between 600 and 700;<br />
&#8211;Prefers ix2. 2 reads.</p>
<p>SELECT MIN(OrderDateKey)<br />
FROM dbo.FactInternetSales WITH (INDEX(ix2))<br />
WHERE UnitPrice between 0 and 100;<br />
&#8211;Forced ix2. 150 reads.</p>
<p>Of course, with a correlation between the two fields, it could be possible to show an example of ix1 being particularly nasty as well.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Simon Sabin		</title>
		<link>https://lobsterpot.com.au/blog/2012/12/30/the-sql-query-optimizer-when-logical-order-can-get-it-wrong/#comment-2194</link>

		<dc:creator><![CDATA[Simon Sabin]]></dc:creator>
		<pubDate>Tue, 01 Jan 2013 15:51:53 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.lobsterpot.com.au/?p=3180#comment-2194</guid>

					<description><![CDATA[Having so few rows in the table makes this a little contrived. The indexes only have 4 leaf pages to read. All data is found on one page, so it&#039;s a question in both cases of scanning a single page for a value.
It would be helpful to show this with more data, where the impact is significant.]]></description>
			<content:encoded><![CDATA[<p>Having so few rows in the table makes this a little contrived. The indexes only have 4 leaf pages to read. All data is found on one page, so it&#8217;s a question in both cases of scanning a single page for a value.<br />
It would be helpful to show this with more data, where the impact is significant.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Rob Farley		</title>
		<link>https://lobsterpot.com.au/blog/2012/12/30/the-sql-query-optimizer-when-logical-order-can-get-it-wrong/#comment-2193</link>

		<dc:creator><![CDATA[Rob Farley]]></dc:creator>
		<pubDate>Mon, 31 Dec 2012 03:16:34 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.lobsterpot.com.au/?p=3180#comment-2193</guid>

					<description><![CDATA[Ian: Thanks]]></description>
			<content:encoded><![CDATA[<p>Ian: Thanks</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Rob Farley		</title>
		<link>https://lobsterpot.com.au/blog/2012/12/30/the-sql-query-optimizer-when-logical-order-can-get-it-wrong/#comment-2192</link>

		<dc:creator><![CDATA[Rob Farley]]></dc:creator>
		<pubDate>Mon, 31 Dec 2012 03:14:40 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.lobsterpot.com.au/?p=3180#comment-2192</guid>

					<description><![CDATA[Martin: Yes - if it thinks the range is wide enough and the selectivity low enough, it can decide to go from both sides. But in that scenario you can solve it easily with a composite index, because the predicate is an equality.]]></description>
			<content:encoded><![CDATA[<p>Martin: Yes &#8211; if it thinks the range is wide enough and the selectivity low enough, it can decide to go from both sides. But in that scenario you can solve it easily with a composite index, because the predicate is an equality.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Ian Yates		</title>
		<link>https://lobsterpot.com.au/blog/2012/12/30/the-sql-query-optimizer-when-logical-order-can-get-it-wrong/#comment-2191</link>

		<dc:creator><![CDATA[Ian Yates]]></dc:creator>
		<pubDate>Mon, 31 Dec 2012 03:09:40 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.lobsterpot.com.au/?p=3180#comment-2191</guid>

					<description><![CDATA[Great post! I really liked the clear explanations. I think I shall use them in real life when trying to explain some of this behaviour to others (or just point them to this blog).]]></description>
			<content:encoded><![CDATA[<p>Great post! I really liked the clear explanations. I think I shall use them in real life when trying to explain some of this behaviour to others (or just point them to this blog).</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Martin Smith		</title>
		<link>https://lobsterpot.com.au/blog/2012/12/30/the-sql-query-optimizer-when-logical-order-can-get-it-wrong/#comment-2190</link>

		<dc:creator><![CDATA[Martin Smith]]></dc:creator>
		<pubDate>Mon, 31 Dec 2012 00:37:06 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.lobsterpot.com.au/?p=3180#comment-2190</guid>

					<description><![CDATA[Actually, the link in my first post does show that sometimes SQL Server will generate such a plan, calculating the MIN and MAX separately (in that case it also has added lookups, which make things worse).]]></description>
			<content:encoded><![CDATA[<p>Actually, the link in my first post does show that sometimes SQL Server will generate such a plan, calculating the MIN and MAX separately (in that case it also has added lookups, which make things worse).</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Rob Farley		</title>
		<link>https://lobsterpot.com.au/blog/2012/12/30/the-sql-query-optimizer-when-logical-order-can-get-it-wrong/#comment-2189</link>

		<dc:creator><![CDATA[Rob Farley]]></dc:creator>
		<pubDate>Sun, 30 Dec 2012 23:45:07 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.lobsterpot.com.au/?p=3180#comment-2189</guid>

					<description><![CDATA[Your points are all valid, and again I&#039;ll say that designing indexes around business knowledge is important. Many business scenarios would consistently be much closer to the best case, and it would be foolish to settle for the &#034;least bad worst case&#034; alternative.
Obviously the queries can be forced into either plan, once the developers have considered their options.
And your query example doesn&#039;t work. If you need both MIN and MAX you&#039;d need to approach from both ends, and your equality predicate makes a simple composite index most effective.]]></description>
			<content:encoded><![CDATA[<p>Your points are all valid, and again I&#8217;ll say that designing indexes around business knowledge is important. Many business scenarios would consistently be much closer to the best case, and it would be foolish to settle for the &quot;least bad worst case&quot; alternative.<br />
Obviously the queries can be forced into either plan, once the developers have considered their options.<br />
And your query example doesn&#8217;t work. If you need both MIN and MAX you&#8217;d need to approach from both ends, and your equality predicate makes a simple composite index most effective.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Martin Smith		</title>
		<link>https://lobsterpot.com.au/blog/2012/12/30/the-sql-query-optimizer-when-logical-order-can-get-it-wrong/#comment-2188</link>

		<dc:creator><![CDATA[Martin Smith]]></dc:creator>
		<pubDate>Sun, 30 Dec 2012 17:46:28 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.lobsterpot.com.au/?p=3180#comment-2188</guid>

					<description><![CDATA[Just pointing it out as a potential issue to be considered. 
In the real world there may be all sorts of correlations that don&#039;t necessarily occur to one when writing queries.
SELECT MIN(OrderDate), MAX(OrderDate)
FROM Orders
WHERE ProductId = @ProductId
It is highly unlikely that ProductIds will be evenly distributed throughout the Orders table, as new products get launched and old ones get discontinued, for example.]]></description>
			<content:encoded><![CDATA[<p>Just pointing it out as a potential issue to be considered.<br />
In the real world there may be all sorts of correlations that don&#8217;t necessarily occur to one when writing queries.<br />
SELECT MIN(OrderDate), MAX(OrderDate)<br />
FROM Orders<br />
WHERE ProductId = @ProductId<br />
It is highly unlikely that ProductIds will be evenly distributed throughout the Orders table, as new products get launched and old ones get discontinued, for example.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Martin Smith		</title>
		<link>https://lobsterpot.com.au/blog/2012/12/30/the-sql-query-optimizer-when-logical-order-can-get-it-wrong/#comment-2187</link>

		<dc:creator><![CDATA[Martin Smith]]></dc:creator>
		<pubDate>Sun, 30 Dec 2012 16:53:27 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.lobsterpot.com.au/?p=3180#comment-2187</guid>

					<description><![CDATA[Hi Rob,
I&#039;m not saying that the scan plan is definitely worse (at least assuming perfect statistics), just that it can have more variable performance when the rows are not in fact exactly evenly distributed.
Say there are 1,000,000 rows and 1,000 match the seek predicate. Under the even distribution assumption, SQL Server will assume that 1,000 (1 million / 1 thousand) rows need to be scanned.
For the seek plan the best, worst, and estimated cases are all 1,000 rows.
For the scan plan the best, worst, and estimated cases are (1, 999,000, 1,000), and if the statistics are not perfect and in fact no rows match at all, then the real worst case would be 1 million rows.
If the predicate is made less selective so that 10,000 rows now match:
For the seek plan the best, worst, and estimated cases are all 10,000 rows.
For the scan plan the best, worst, and estimated cases are (1, 990,000, 100).]]></description>
			<content:encoded><![CDATA[<p>Hi Rob,<br />
I&#8217;m not saying that the scan plan is definitely worse (at least assuming perfect statistics), just that it can have more variable performance when the rows are not in fact exactly evenly distributed.<br />
Say there are 1,000,000 rows and 1,000 match the seek predicate. Under the even distribution assumption, SQL Server will assume that 1,000 (1 million / 1 thousand) rows need to be scanned.<br />
For the seek plan the best, worst, and estimated cases are all 1,000 rows.<br />
For the scan plan the best, worst, and estimated cases are (1, 999,000, 1,000), and if the statistics are not perfect and in fact no rows match at all, then the real worst case would be 1 million rows.<br />
If the predicate is made less selective so that 10,000 rows now match:<br />
For the seek plan the best, worst, and estimated cases are all 10,000 rows.<br />
For the scan plan the best, worst, and estimated cases are (1, 990,000, 100).</p>
]]></content:encoded>
		
			</item>
	</channel>
</rss>
