Quite often, people have filtered indexes but find they’re not being used as often as they’d like. I was reminded of this recently when I read a good post by Kendra Little from brentozar.com about filtered indexes. In it, Kendra talks about how the WHERE clause of a filtered index allows an IN clause, but not an OR clause (to quote someone from Microsoft “We currently do not support the "OR" clause due to the optimizer matching difficulties for such clauses”). Going back a few years, Tim Chapman wrote a post about the pains of filtered indexes.
Anyway, both of these posts hint that filtered indexes aren’t always used. Tim addresses it directly, and Kendra mentions needing a hint to make sure the index is used. I thought I’d explore the question a little more. I’m confident that both Tim and Kendra know this information – they are two of the top index experts in the world. This post isn’t for them, but for those people who are trying to find out why their filtered indexes aren’t being used.
To be used, the filtered index must be able to satisfy the query.
This should be fairly obvious, but it goes a little deeper than you might think on first glance. Let’s explore Tim’s examples to show what I mean.
Tim creates a filtered index:
1 2 3 |
CREATE INDEX FIDX_SalesOrderDetail_ProductID ON Sales.SalesOrderDetail (ProductID) WHERE ProductID = 870; |
…and then shows that it’s used successfully for the query:
1 2 3 |
SELECT ProductID FROM Sales.SalesOrderDetail WHERE ProductID = 870; |
(The image here is from Tim’s blog post, and belongs to Microsoft)
No surprise here – if the system knows that ProductID is 870, then it can use an index which only includes rows that satisfy that.
Tim then tries to use a variable instead of 870 – although he still passes in the value of 870.
1 2 3 4 5 6 |
DECLARE @ProductID INT; SET @ProductID = 870; SELECT ProductID FROM Sales.SalesOrderDetail WHERE ProductID = @ProductID; |
(Image from Tim’s blog again)
No luck – the system doesn’t use the filtered index. He can’t even use a hint to force it to use it – SQL replies saying that it can’t create a plan for it using that index hint.
So what’s going on?
The problem is not with the Query Optimizer seeing that the value is going to be 870 – the problem is with the plan cache. You see, when SQL runs a query, it figures it won’t be in isolation and it puts it into the cache. But the version that goes into the cache is a general one, that doesn’t consider the values that are passed in. Because it needs a plan that will work regardless of what the parameter is set to, using the filtered index here would be inappropriate.
Tim shows one way around this, and ‘hints’ at another in an edit, although sadly you tend to find that in blog post edits, you can miss the key a little.
The way that Tim gets around this is to use Dynamic SQL, but I’m not a fan.
1 2 3 4 5 6 |
DECLARE @SQL NVARCHAR(MAX), @ProductID INT SET @ProductID = 870 SET @SQL = N'SELECT ProductID FROM Sales.SalesOrderDetail WHERE ProductID = ' + CAST(@ProductID AS VARCHAR(10)) EXECUTE sp_executesql @SQL |
Yeah – I’m not a fan of this. Go read my post on SQL Injection for why.
Tim does mention a better method in the ‘edit’ bit in his post, and I want to look at that. He says: “In many cases, a way to get around the local variable problem is to recompile the statement.” – but I want to make it very clear that the point is not actually to recompile the statement, but to use a statement that isn’t going to get put into the cache (which you do by using the OPTION (RECOMIPLE) hint, which makes it sound like you’re recompiling the statement).
When you use OPTION (RECOMPILE), the main impact is not that it recompiles, but that the query doesn’t get cached. Because it doesn’t get cached, it won’t find the query in the cache beforehand either. This means that it doesn’t need to consider the generalised version – it has the confidence to know that it doesn’t need to cater for future uses, so it can use the filtered index!
So the better option than using Dynamic SQL is to use OPTION (RECOMPILE).
So that’s one reason why your filtered index might not be used – but there’s another too:
Using the filtered index might be too much work.
Let’s consider Kendra’s example. She had a query that used IN. Her example was:
1 |
CREATE INDEX IX_Votes_filter ON dbo.Votes (PostId) WHERE (VoteTypeId IN (1,2)); |
But as I don’t have a Votes table, I’m going to use an equivalent with AdventureWorks:
1 |
CREATE INDEX FIDX_Product2Color ON Production.Product (ProductID) WHERE (Color IN ('Red', 'Black')); |
Kendra uses SELECT COUNT(*) queries to test it. Similarly to Kendra’s example, this works well when IN is used, and when OR is used, but when just one of the options is used, it needs a hint to use the index, and then it needs a lookup to satisfy it.
What Kendra doesn’t point out is why the filtered index needed the hint to be used in the single-option examples (which is fair enough – because her post was about OR v IN, not about why the filtered index wasn’t being used).
The reason why is because of the Lookup that’s needed. This is so expensive, it’s cheaper for the Query Optimiser to do a clustered index scan instead. It’s standard ‘tipping point’ stuff, but we don’t normally see this when we have COUNT(*), because COUNT(*) is just counting rows, not returning extra columns.
…except that there is a column that isn’t included in the filtered index, which our query needs – Color.
Subtly, even though our filtered index only contains rows that have the Color either Red or Black (or in Kendra’s example, VoteTypeID either 1 or 2), it doesn’t store which rows are Red and which rows are Black. We know that every row in the index is either Red or Black, so we can use this index as a starting point, but we would need to do a Lookup to get the actual Color value.
To fix this, we should INCLUDE the Color column in the filtered index.
1 |
CREATE INDEX FIDX_Product2Color2 ON Production.Product (ProductID) INCLUDE (Color) WHERE (Color IN ('Red', 'Black')); |
Now we don’t need to hint at the index to use, and we see a residual predicate being used to make sure that we only pull Red rows out of the index.
So we see two reasons here for filtered indexes not being used, and two ways to help encourage them to be used more often. The first is to consider using OPTION (RECOMPILE) to help persuade the Query Optimizer not to consider generalising the query across different parameter values, and the second is to INCLUDE the columns that are used in the filter, in case the database engine needs those values later in the query.
Do try to use filtered indexes, and be understanding about those times when they’re not used. There’s quite probably a simple explanation.
This Post Has 7 Comments
Well how can I register myselg on this site??
You can use parametrized dynamic SQL alternatively which is immune to injection.
Hi Marcel,
Yes (as per my post that I linked to above), but if you use parameters (with OPTION (RECOMPILE)) then the QO needs to come up with a plan that is general and is unlikely to use your filtered index.
Nice article, this is quite subtle but very useful knowledge that needs to be thought about.
If filtered indexes are susceptible to this problem then is it a good alternative idea to embed the filter clause in a view, index the view and then query on the view? Clearly the developer has to know how to make this work safely but this may well satisfy a lot of situations where the filtered index might be applied.
Oh Matt – I’m totally in favour of using views to help developers make good decisions about predicates they should use. 🙂
Pingback: Not Entirely Parameterized Dynamic SQL – Erik Darling Data
Pingback: Not Entirely Parameterized Dynamic SQL | Erik Darling Data