Ageing code

February 10, 2026

I’m pretty sure both spellings are fine, so don’t come at me for putting an ‘e’ in ‘ageing’. The interwebs tell me the version with the ‘e’ is preferred in England and Australia, so that’s what I’m going with.

Now, as much as I’d like to wax lyrical about how English ages and changes over time, and how people nowadays would’ve said “don’t @ me”, I’m going to fit into Pat’s brief slightly better and talk about code ageing – T-SQL code even!

I could write about how new features in T-SQL can help, such as the NULL-aware “IS [NOT] DISTINCT FROM”. If you have code that uses the non-sargable ISNULL function on a column, then that might be something to consider when you next review that code. But I’m going to talk about T-SQL that can age badly because of more sneaky reasons.
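As a rough sketch of what I mean (borrowing the made-up table and column names from the example later in this post, and assuming -1 never appears in the data), the old ISNULL pattern and the newer syntax look like this:

-- Old pattern: wrapping the column in ISNULL so that NULL matches a NULL parameter.
-- The function around Col2 makes the predicate non-sargable, so no Seek on Col2.
SELECT t.Col1
FROM SomeTable t
WHERE ISNULL(t.Col2, -1) = ISNULL(@p1, -1);

-- SQL Server 2022 onwards: NULL-aware comparison with the column left bare,
-- so an index on Col2 can still support a Seek.
SELECT t.Col1
FROM SomeTable t
WHERE t.Col2 IS NOT DISTINCT FROM @p1;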

There’s a truth that I think many coders don’t really appreciate, and that’s that a T-SQL query is rarely good by itself. Sure, there are queries that have strong potential to be good, but performance depends on a few other things too. And this can mean that a query that was good when it was written might have become bad over time, even though it’s the same query it always was.

Bad plans in the cache

One obvious example is when a bad plan has crept into the cache, and now every time that query runs, the bad plan gets chosen, and performance is now rubbish.

Plans show us how the Query Optimizer has decided to run a query. And I’m saying ‘decided’ to personify it a little. Of course it’s just maths, following a precise algorithm with heuristics to try to make the best choices, but at the end of the day, as much as we know that there’s no arbitrary decision-making going on, it can really feel like there’s some tiny person in there, choosing to punish us for every mistake.

One mistake that gives us a bad plan can be simply calling the query with a parameter that is rarely used but causes the query to run a very different way. That plan could get cached, and suddenly every subsequent call for that query (with more typical parameters) uses a less-than-ideal plan.

Imagine the query:

SELECT TOP (100) t.Col1
FROM SomeTable t
WHERE t.Col2 IS NOT DISTINCT FROM @p1
ORDER BY t.Col3 DESC;

There are two obvious ways that this query could run. One would be to apply a TopN Sort on the results of a Seek on Col2; the other would be to perform an ordered (backward) Scan of an index on Col3 with a residual Predicate on Col2. Which one is better? No idea. It depends on how selective the filter is. It also depends on whether particular indexes are available, although right now I’m going to assume there are covering indexes that could produce both plans, one on “(Col3) include (Col1, Col2)”, and one on “(Col2, Col3) include (Col1)”. And it depends on whether the SQL engine knows about the selectivity of different values (i.e., the statistics). All this is where the strength of a query is not just about how the query is written, but about indexes, statistics, the cardinality estimation model, and more.
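Just to make that concrete, the two covering indexes I’m imagining would look something like this (the index names are invented, of course):

CREATE INDEX ixSomeTable_Col3 ON SomeTable (Col3) INCLUDE (Col1, Col2);
CREATE INDEX ixSomeTable_Col2_Col3 ON SomeTable (Col2, Col3) INCLUDE (Col1);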

If the table has 100M rows and 10% of them have the value we’re looking for, then perhaps the best option is the ordered Scan. On average we might need to look through 1000 rows to find the top 100, if those values are mostly around the largest values of Col3.

But what if only 0.001% of the rows have the value we’re looking for? A total of 1000 rows in the whole table. Now we’d be better off using a Seek to find the 1000 rows. But that Sort might need a large memory grant.

And what if there are 10M of them, but they’re all around the smallest values of Col3… oh, the joys of query tuning.

Maybe the value is usually selective, but every so often it’s not – maybe when it runs at 4am it uses a value which isn’t selective, so I’m willing to take the hit on performance, but during the day I want it to use the Seek + Sort plan. And usually this is fine, and there’s a cached plan that does the Seek + Sort that gets reused every time. Terrific.

Then one night, there’s an index rebuild, and the next time the query runs, it’s with the overnight non-selective parameter. The compiler comes up with a plan which is good for that value, and it’s the ordered Scan plan, which gets cached. Then when users arrive in the morning, that’s the plan that gets used every time, and it’s slow. Agonisingly, it often (but not always!) gets fixed by restarting the SQL service, and let’s stop imagining how much pain organisations can get into when they start down these rabbit holes.

Could we force the Seek? Sure. Should we force the Seek? That’s a tougher question. Maybe we need the overnight query to run better too. Maybe we need two queries. If the parameter value NULL is the only one which is non-selective, then splitting it into two queries is easy – test the value for NULL and run a different query, as in the sketch below. Or maybe it could be fixed by using the OPTION (RECOMPILE) hint, but that’s not exactly free either.
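Here’s a rough sketch of both ideas, assuming NULL really is the only troublesome value:

IF @p1 IS NULL
BEGIN
    -- Non-selective case: this statement is only ever compiled for the NULL path,
    -- so its ordered Scan plan can't pollute the daytime workload.
    SELECT TOP (100) t.Col1
    FROM SomeTable t
    WHERE t.Col2 IS NULL
    ORDER BY t.Col3 DESC;
END
ELSE
BEGIN
    -- Typical, selective case: gets its own cached plan,
    -- which can happily stay as the Seek + TopN Sort.
    SELECT TOP (100) t.Col1
    FROM SomeTable t
    WHERE t.Col2 = @p1
    ORDER BY t.Col3 DESC;
END

-- Or pay the compilation cost on every execution instead:
SELECT TOP (100) t.Col1
FROM SomeTable t
WHERE t.Col2 IS NOT DISTINCT FROM @p1
ORDER BY t.Col3 DESC
OPTION (RECOMPILE);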

Inside knowledge is great, but perhaps we didn’t have that knowledge when we wrote the query, and the code has aged badly because we didn’t anticipate the future.

There are a bunch of other changes that can cause a perfectly adequate query to age poorly. Some quick examples:

More data than expected

It doesn’t have to be a ‘bad plan’ scenario for a query to be less good than when it was written, because conditions can simply change. The system was so popular that the amount of data grew, and now some process takes a lot longer than before. The scan was fine, but now the table is a lot larger, and the values that we thought would be selective aren’t (or vice versa).

Or maybe because the volume of data has increased, the number of physical reads is higher. Perhaps adding RAM is a simple solution, but maybe queries that were fast enough with full scans need to be rewritten.

Better (!) estimates

Maybe the version has been upgraded, and the cardinality estimates are improved. The Optimizer used to think that 20 rows would be the result of a filter, but it was always wildly wrong, with 20,000 rows being more typical. Now, with a later version of SQL Server, the estimate is correct, but because of this, there’s a join that’s using a Hash Match instead of a Nested Loop. Way fewer logical reads, but now it needs a larger memory grant to run the query, and system performance is hurting.
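If a query has regressed like that after an upgrade, one stopgap (a hedged suggestion of mine, not part of the scenario above, and the second table name is invented) is to pin just that query to the old estimator using a USE HINT, available from SQL Server 2016 SP1 onwards:

SELECT o.Col1, d.Col3
FROM SomeTable o
JOIN SomeOtherTable d ON d.Col3 = o.Col3
WHERE o.Col2 = @p1
OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));  -- old CE for this query only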

Concurrency

Speaking of memory, maybe the concurrency requirements have increased. Maybe there are 100x as many users as expected trying to use the system at once, and those memory grants are causing a bottleneck. Maybe queries need to be tuned to discourage large memory grants.
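One knob worth knowing about here (again, a hedged suggestion rather than anything from the scenario above): more recent versions of SQL Server let you cap an individual query’s grant with the MAX_GRANT_PERCENT hint.

SELECT TOP (100) t.Col1
FROM SomeTable t
WHERE t.Col2 = @p1
ORDER BY t.Col3 DESC
OPTION (MAX_GRANT_PERCENT = 10);  -- cap the memory grant at 10% of what would otherwise be available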

Index changes

Maybe partitions have been introduced, and the indexes haven’t been reviewed.

Or maybe a well-meaning DBA or external consultant (ahem!) has been brought in and a new index has been added to reduce the reads. Except that now some other query has started to use that index, and although the Query Optimizer has good reason for choosing it, that query is slower than it was before – or the query we’d tuned to reduce the memory grant has returned to its old patterns.
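If you want to check whether an index is being leaned on in ways you didn’t expect, one (hedged) starting point is the index usage DMV – this sketch just pulls the seek/scan/update counters for the indexes on a single table, assuming it lives in the dbo schema:

SELECT i.name AS index_name,
       us.user_seeks,
       us.user_scans,
       us.user_lookups,
       us.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
    ON us.object_id = i.object_id
   AND us.index_id = i.index_id
   AND us.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID('dbo.SomeTable');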

So… what should we do?

Naturally what we need is a crystal ball, so we can write our queries (and our indexes, partitions, statistics, hints, whatever) according to how we expect things to look once they’ve changed. Typically if you write with scale in mind, you won’t be impacted too much by scale.

But also, understand that there will be some things that you can’t anticipate so easily. You can’t necessarily predict that a particular index will be created, and I’m definitely not suggesting you force the use of a particular index from day 1 of your query, because the change might be about the shape of the data – something that was marvellously selective might no longer be, for example.

The general feel of T-SQL code might not age as obviously as, say, .Net application code, with techniques that have been superseded over the years. Or a UI that has aged like a trendy haircut. A lot has changed in the database world, but a T-SQL query now looks roughly the same as it did in the days of SQL Server 2005. But you should totally expect that a query you wrote in a small dev environment might behave quite differently in production as the size and shape of the data change.

So don’t beat yourself up about it – just schedule a code review, or at least a good tuning session. There are bound to be some things you can fix up, and maybe those fixes won’t age nearly so badly, because you have a better idea about how things will look in the future.

@robfarley.com@bluesky (previously @rob_farley@twitter)
