Ignoring best-practice

January 10, 2023

“If best-practice was always worth following, they’d take the option away.”

I don’t know who said this – certainly it’s something that I’ve said before. It’s not entirely true, but the gist of it is there. We see the impact of it in things like how the installer for SQL Server now defaults to multiple tempdb files. It used to be a single file, and “good advice” recommended increasing the number of files to roughly the number of processors.

There are things that Oracle DBAs can configure that SQL Server DBAs can’t. For example, SQL Server pages are 8kb in size. In Oracle they are 4kb on 32-bit systems and 16kb on IA64-bit systems, but can be configured to be up to a massive 256MB! That sounds like it could be useful, but I feel like I’d want to know the environment pretty well before I configured something like that. I’m sure it wouldn’t be right for a lot of my customers. As a consultant in the Microsoft space, I don’t really mind not being able to configure everything.

Raul Gonzalez (@SQLDoubleG) asks us about the times we choose to deliberately go against best-practice. And I want to try to avoid the glib response of “it’s best practice to weigh up the best-practice advice”. Instead, I want to recommend that you should try to understand the reasons behind the best-practice advice.

Probably the biggest examples I think of are around heaps, clustered index keys, and fragmentation.

Heaps

There’s a lot of excellent advice against heaps. And certainly heaps can be problematic. In a heap, there are no key columns to help locate a row – a row’s address is found by using the RowID value, made up of the file id, page id, and offset. Compared to an identity value as commonly used in a clustered index key, it’s definitely wide, and this gets used to identify the row in every non-clustered index. So there’s potential for these non-clustered indexes to take more space. Furthermore, when rows are deleted from a heap, the space they took doesn’t get cleaned up. And when rows are updated and don’t fit in the page they were in, a forwarding record is used so that the RowID doesn’t have to be updated in all the non-clustered indexes.

For these reasons, many developers avoid using heaps.

But if you know the RowID of the row, a lookup can be performed much quicker than a Key Lookup, which involves seeking through the data. A Key Lookup needs to do a binary search on the top level page of the clustered index, and then move to the next level down and do another binary search there, and keep doing this until it reaches the leaf level, where another binary search is needed. Only then can the SQL engine see the data that was needed from the lookup. A RID Lookup can go straight to the data, although it might find a forwarding record there and have to go to a different one. But both of these involve already knowing which file, page, and offset the data is in. It’s much faster.

Of course, ideally your non-clustered indexes include all the columns that are needed, so then lookups aren’t needed and the discussion about whether Key Lookups or RID Lookups are better becomes moot.

If the data is mostly insert-only, then the problems with deletes and updates are also moot.

And let me assure you, inserting into heaps is quicker than inserting into a clustered index. Even if your clustered index is on an increasing key, the location for that insert needs to be found each time, and there can be contention on the page being inserted into. Heaps don’t suffer from this, and inserting can end up being much quicker.

In general, I’m much less opposed to heaps than many people, and often find myself defending them to developers who want to put a clustered index on everything. That said, I’ve also found very large heaps with very little data, when developers haven’t understood how heaps work.

Clustered Index Keys

We’ve already mentioned how clustered indexes can help avoid some of the problems with heaps, but I’m rarely in favour of the standard clustered index key choice – that of an ever-increasing identity column. The arguments people make for this are typically around avoiding page-splits, and the smallness of the data type. But I see tables where the clustered index is on a unique ID column and nothing ever refers to it.

If you are never searching for data using the clustered index key (user seeks, not just lookups), and you’re never needing to order the data by that key, then there is no need to have the clustered key on it. There are bound to be columns that you do need to search on, in which case there will be non-clustered indexes on those columns. Non-clustered indexes that suffer from page splits during inserts. And if you are always going to need that data, then you’re including it in every non-clustered index anyway – so why not make that the clustered index? If it’s unique, you can make it the primary key. If it’s not unique, then have a non-unique clustered index instead. Or use that identity column as the second (or third or whatever) column in a composite primary key if you really feel like you need it.

Even if you need an identity column, it’s worth asking yourself whether that should really be the clustered index. By opting for something else, you might be able to avoid contention on inserts, and reduce the number of non-clustered indexes.

Fragmentation

There are two types of fragmentation. There’s the kind of fragmentation where the next page in the index is not on the same extent, and there’s the kind of fragmentation where pages are emptier than they could be. Typically, people report fragmentation using the former, but think they’re referring to the latter. And to “fix” fragmentation, people reorganise or rebuild their indexes.

But let’s think about this for a minute…

Ideally, the active part of your database is in RAM. That way, physical reads are fewer, and a logical reads are enough to get to the data you need. Reading more extents than you need is a physical read issue though – once your data is in RAM you can move from page to page on different extents just as easily as you can you move from page to page on the same extent. RAM doesn’t care, so why should you?

As for pages that have free space in them… well, what are you hoping to achieve by reducing that fragmentation? If you squeeze your data into fewer pages, then you’re going to need to split them sooner. Plus the work needed to rebuild indexes is a lot. What’s the impact on your log files? And therefore on your recovery times if you have a problem?

Overall

I’m not saying you should throw out all your clustered indexes, and adopt heaps that never get rebuilt. I’m saying that you should seek to understand the reasons why people make these recommendations. If you get back to the reasons why these recommendations turned up in the first place, you’ll see they were designed to avoid particular scenarios that were common. If you’re avoiding those scenarios in other ways, or you’re seeing different kinds of issues because of the practices you’re adopting, then look into what’s going on and don’t be afraid to adopt different practices to the standard advice that’s out there.

No one is going to criticise you for knowing the system better, although this doesn’t necessarily make arguments any easier to win. Sometimes adopting a standard ‘best-practice’ might be politically easier even if you can see issues, but if you can see why a particular best-practice isn’t best-for-you, then hopefully your deeper knowledge of the system might help you find a better way of doing it.

@rob_farley@twitter