Covering, schmuvvering – when a covering index is actually rubbish

May 19, 2011

Take a look at this query plan.

[Image: query plan showing an Index Seek, with a thin arrow indicating a single row]

Yes, that arrow indicates a single row. This is an Index Seek, returning a single row. And yet it’s rubbish. That’s right – it’s rubbish!

In fact, I had to provide a hint for it to use this index. A table scan would’ve been better, and this is what happens without the index hint.

[Image: query plan showing a Clustered Index Scan, chosen without the hint]

Let’s look at the query. I can promise you there’s no bug here.
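The query isn’t reproduced in the text here, but it is quoted in comment 8 below, so it looked something like this (whether the original literal was 'Red' or N'Red' is exactly the question Kevin raises in that comment):

SELECT Name, ProductNumber
FROM Production.Product
WHERE DaysToManufacture < 4
AND ReorderPoint < 100
AND Color = 'Red';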

And the covering index is defined as follows:
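The exact definition isn’t preserved in the text either. Piecing it together from the clues in the post and comments (the name rf_ix_Covering appears in comments 11 and 14, the leading key is DaysToManufacture, the fillfactor is 30, and it covers the query; the precise INCLUDE list is an assumption), it would have been along these lines:

CREATE INDEX rf_ix_Covering ON Production.Product (DaysToManufacture)
INCLUDE (Name, ProductNumber, ReorderPoint, Color)
WITH (FILLFACTOR = 30);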

Try it yourself on AdventureWorks; you should see the same.

Of course, there’s a bit of trickery going on. For example, I purposely spaced out the NCIX using FILLFACTOR=30. But despite that, most DBAs would tell you that the first plan is better than the second. But they’d be wrong.

Let’s look at the tooltip of each of those Index operations. First, the Clustered Index Scan.

[Image: tooltip for the Clustered Index Scan]

There’s no real surprise here. We know there’s only one row being returned; I told you that before. There’s a predicate listed which is tested on every row, and Name and ProductNumber are the two fields being output. The Estimated Subtree Cost is 0.0127253.

How about that Seek then, which the Query Optimizer only chose when its hand was forced by a hint?

[Image: tooltip for the Index Seek]

You’ll probably notice immediately that the Estimated Subtree Cost is higher! 0.0141 instead of 0.0127. The reason is in the Predicate and Seek Predicate sections. Sure, it can do a Seek – it can seek on the first key column of the index, DaysToManufacture. But having found those records, the data is still very much in DaysToManufacture order. We can’t search on the other two predicates easily at all, and we’re left with them being handled as a Residual Predicate.

To use a phone-book example, this is like searching for people called “Rob F.” – I can easily find people whose last name starts with F, but would then have to go through all of them looking for people called Rob. That residual check could be painful. Actually, this particular example is more like saying “Find me anyone called Rob whose surname is < Z”, as the Seek Predicate involved returns most of the table.

Unfortunately, the query plan doesn’t show us how many rows go through the Residual Predicate check. If it did, we’d be able to see that it’s over 400 rows (in a table of 504 rows), and we might rethink our indexing strategy.
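One way to surface that number yourself, as mentioned in comment 22 below, is the undocumented trace flag 9130, which pulls residual predicates out into an explicit Filter operator so that the row count feeding it becomes visible in the plan. A minimal sketch, assuming the reconstructed index above:

SELECT Name, ProductNumber
FROM Production.Product WITH (INDEX (rf_ix_Covering))
WHERE DaysToManufacture < 4
AND ReorderPoint < 100
AND Color = 'Red'
OPTION (QUERYTRACEON 9130);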

In fact, I can easily make an index which causes the same query to perform a full Index Scan, plus Lookup, and which is half the cost of either of the plans we’ve seen so far.

[Image: query plan using a filtered index, showing an Index Scan plus Key Lookup]

Of course, now I’m really cheating, by using a Filtered Index which would only contain a single row. And I could’ve eliminated the Lookup by including the other columns, halving the cost yet again.
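The filtered index definitions aren’t shown in the post; a rough sketch of the kind of thing being described (the index names and the choice of key column are assumptions) would be:

CREATE INDEX rf_ix_Filtered ON Production.Product (DaysToManufacture)
WHERE DaysToManufacture < 4 AND ReorderPoint < 100 AND Color = N'Red';

CREATE INDEX rf_ix_FilteredCovering ON Production.Product (DaysToManufacture)
INCLUDE (Name, ProductNumber)
WHERE DaysToManufacture < 4 AND ReorderPoint < 100 AND Color = N'Red';

The first matches the Scan-plus-Lookup plan; the second includes the output columns and so eliminates the Lookup as well.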

[Image: query plan using a covering filtered index, with the Lookup eliminated]

Interestingly, the tooltips for both of these Filtered Index operations don’t show the predicates – the Query Optimizer knows that the index has already done the necessary filtering, and in this case, it doesn’t need to apply any further predicates, whether Seek or Residual.

[Image: tooltips for the filtered index operations, with no Seek Predicate or Residual Predicate shown]

So… just because an index is covering a query doesn’t mean it’s necessarily the right choice. The better question is “Is this index good for this query?”
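As the comment thread below explores (gbn’s comment 14 and the reply after it), an index with the equality predicate’s column as the leading key would let the Seek Predicate do most of the filtering (the equality on Color plus the range on DaysToManufacture), leaving only ReorderPoint as a residual. A sketch adapted from gbn’s suggestion (the name here is made up to avoid clashing with rf_ix_Covering):

CREATE INDEX rf_ix_CoveringEq ON Production.Product (Color, DaysToManufacture, ReorderPoint)
INCLUDE (Name, ProductNumber);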

(Incidentally, this is the kind of thing that I’d go through in my pre-conference seminar at SQLPASS if chosen, and in the talk on “Joins, SARGability and the Evils of Residualiciousness” if that one gets picked)

@rob_farley

This Post Has 22 Comments

  1. John Sansom

    Very interesting!
    It caught me out, damn it 🙂 It would seem then that covering is not always king.

  2. GrumpyOldDBA

    The problem with examples is that they are often not very representative of the real world. I doubt very much if I’d even glance at a singleton select, seek, scan or table scan. It’s interesting, I agree, and it would make a good interview question, but in the overall scheme of performance tuning of multiple-table, multiple-query procs, a difference of .0014 is unlikely to make much difference. The only point I’d perhaps make is that a clustered seek/scan returns the entire row, whereas the covering index will only return the columns in the index; on a wide table this might or might not be significant, and it might show in the cost of course.

  3. Rob Farley

    Colin,
    But that single-row returning Seek is hiding the truth. It’s closer to a scan than a seek really, because the Seek Predicate is so non-selective. It’s pulling back over 400 rows, applying a residual predicate to work out what’s left. Interestingly, a non-covering index may have shown this more obviously.
    I’m all for Covering Indexes that return a single row, but it’s important to recognise whether that row is being found by the Seek Predicate or the Residual Predicate.

  4. GrumpyOldDBA

    Oh yes, I do agree and you’re right. I have an example I use in presentations which shows as a clustered index seek but actually performs a full scan, but it only becomes obvious if you check the number of page reads. And yes, you’re right, most would say a seek is better than a scan, but if you saw either of these within a query plan involving a number of queries you’d probably not give them a second glance (compared to more complex queries). I enjoy these types of posts so much because they make you think about the basics, which is very important, so thank you.

  5. Rob Farley

    🙂 You’re welcome, Colin.

  6. Ranjith

    Good point … but I see it as obvious based on the number of rows in the table. If the same test is done on a different table with millions of records then the result would be different.

  7. Rob Farley

    Hi Ranjith – which test do you mean? The principles I’ve shown here will apply no matter how big the table.

  8. TheSQLGuru

    1) have you updated ALL stats on that table with FULL SCAN?
    2) IIRC, AdventureWorks uses unicode character storage.  Try your query like this:
    SELECT Name, ProductNumber
    FROM Production.Product
    WHERE DaysToManufacture < 4
    AND ReorderPoint < 100
    AND Color = N'Red';
    3) What are the estimated rowcounts for each test you did and how do they compare to actual rowcounts?
    Results from above?

  9. Rob Farley

    Hi Kevin,
    The estimated rowcounts are just fine. The issue is the ability to seek on more than the first predicate, which is not very selective – plus the fact that I’ve padded the index to make it larger than the CIX. I’m trying to demonstrate that just because a Seek operation returns a single row, that doesn’t mean that it’s efficient at all. It’s like there are two components to a Seek – the Seek part and the Residual part. If the Residual is doing most of the work, the Seek is not much different to a Scan.
    Rob

  10. Roshan Joe

    Rob,
    Really nice post.
    Got your point.
    One doubt: among the three predicates, if we had put the more selective column first in the index, the QO might have opted for an index seek, right?

  11. AndrewJacksonZA

    I bumped up the fill factor for the original rf_ix_Covering to 100 and got an Estimated Subtree Cost of 0.0669. What are the fill factors of the other indices that you’ve used as examples?

  12. Rob Farley

    Roshan & Andrew,
    Yes, if the index were more selective, or packed more tightly, then it could well have been chosen, and rightly so. I was contriving an example to show that indexes that look appealing, with Seeks and a low number of rows returned, could well be hiding a cost, and that if the Seek Predicate is not particularly selective, then the Seek could be little better than an Index Scan – as demonstrated by the fillfactor impact. I picked 30% thinking that, despite the index being non-clustered (and therefore having fewer columns at leaf level), this could make the range satisfied by the Seek Predicate require more reads than performing a CIX scan.
    So yes – it’s slightly contrived, but to demonstrate an important point about the cost of non-selective Seek Predicates, even on Covering Index Seeks that output a single row. Many people see an Index Seek with low output and no lookups as the goal of performance tuning, but it’s simply not the case.

  13. Rob Farley

    Oh, and Andrew – the other fillfactors were 100%, but as mentioned in the post, the filtered index had only one row, so fillfactor isn’t going to be particularly relevant. I did admit to "a bit of trickery", after all (but only to demonstrate a valid point).

  14. gbn

    What if the index matches all WHERE conditions?
    CREATE INDEX rf_ix_Covering ON Production.Product (DaysToManufacture, ReorderPoint, Color)
    INCLUDE (Name, ProductNumber, Size)
    WITH (FILLFACTOR=30)
    or this with equality column first?
    CREATE INDEX rf_ix_Covering ON Production.Product(Color, DaysToManufacture, ReorderPoint)
    INCLUDE (Name, ProductNumber, Size)
    WITH (FILLFACTOR=30)
    The index above isn’t covering in the sense that I understand because filter/key columns don’t match the query. You are filtering unsorted INCLUDE columns

  15. Rob Farley

    Hi gbn,
    Yes, there are benefits to having the equality predicate’s field as the leading key. But that’s pretty much it. Even though the covering index looks good according to the plan, the fact that the Seek Predicate is so non-selective is a problem. I know there are better indexes available (the best being a filtered index), but I’m showing that just because you have a Seek on a Covering Index, you don’t necessarily have an effective index.
    Rob

  16. gbn

    The optimiser has just found an index that matches roughly and decided to use it.
    It isn’t really covering for this query, more "convenient".
    So in that sense, yes, the seek is useless because it is a scan.

  17. Phil

    But you admit that you had to provide an index hint to get the optimiser to use this misleading index, so what exactly are we proving here? Yes, there are (artificial) situations where a single-row seek can require more effort than is apparent at first glance, but since the optimiser doesn’t actually choose such options unless you make it, I’m not sure how surprising the information here is.

  18. Rob Farley

    Hi Phil,
    There are plenty of situations where people try to force indexes to be used because they want a Seek that returns a single row, because they assume that it’s an ideal operation. That’s what I’m trying to challenge.
    Rob

  19. Rajesh

    Hi Rob,
    I have a table with a nonclustered index, say on (id, date), but when I am running the query
    select id, date from table where id = 1 and date = '2010-09-09'
    it always performs an index scan, while returning only one row.
    Why is this happening? My index is selective and only one row is returned, so why does it scan, and why does it involve 98 logical reads?
    Also, this index is not unique; both id and date have duplicate values, but this combination of date and id has only a single row.

  20. Rob Farley

    Can you send me an email with the execution plan in it? I’m sure there must be something else going on, as I see little reason in what you’ve written for it to scan rather than seek. You can reach me at hotmail.com, as rob_farley.

  21. Greg Low

    That NC index is a pretty weak index though. The index itself (on DaysToManufacture) isn’t selective in the first place, regardless of whether or not the included columns make it cover the query.
    I suppose the main issue is how well the QO discerns what’s going on.

  22. Rob Farley

    The NC index is designed to be weak – that’s why I had to use a hint to get it to be used in the first case. But the point about having an ineffective index holds. People frequently kill the impact of a composite index by using an inequality, or apply a function so the index key can’t be used at all – and if there’s a leading key which can be used, an Index Seek which produces a small number of rows is seen. It looks good, but isn’t.
    But the impact of this can be seen using trace flag 9130, or now, with later service packs on 2012, via the Number of Rows Read property: http://sqlblog.com/blogs/rob_farley/archive/2015/12/12/a-new-superpower-for-sql-query-tuners-number-of-rows-read.aspx
