T-SQL Tuesday – Query Cost

March 9, 2010

In SQL Server, the cost of a particular plans is based largely on I/O, which makes this post a good candidate for this month’s T-SQL Tuesday, hosted by Mike Walsh who, like me, walks a straight path.

In considering I/O – the movement of data generally In and Out of disk and memory – my thoughts come very much to query cost. If I run set statistics io on, I can see the reads and writes, both logical and physical, of the queries I run, and I can use this information to gauge the impact of my query on the system.

I can also see that the Estimated Subtree Cost as indicated in the Execution Plan seems to grow with the number of logical reads. This makes sense, as the number of logical reads required by a query can really make a query costly. A logical read implies that the read might only involve examining a page of data that is already in RAM, but it should always be remembered that if the required page of data is not in RAM already, it will need to be pulled in off disk first, being shown as a physical read.

The standard line with measuring cost in terms of I/O comes down to the impact of a physical read on the system. When a physical read occurs, this involves getting data off the actual disk(s), which I liken to the process of picking a track on an old vinyl record player (the records were vinyl, not the record players). Of course, the disk controller lines up the tracks far quicker than I ever could, but the principle still applies. Compared to shifting bits in RAM, it’s incredibly slow. Much less so with Solid State Disks, but that’s another matter entirely.

But the impact of I/O is certainly not the only factor on query cost. I remember some time ago having a discussion about string splitting with a friend of mine. I had seen some posts discussing the different ways of splitting strings, with the conclusion being that CLR functions are the best, as SQL Server generally isn’t very good at string manipulation, but it led to an interesting discussion.

Take one of the standard approaches to splitting strings, such as using a table of numbers. For the purposes of the experiment, I have a table called dbo.nums(num int primary key), populating to at least 50.

with        
table_of_strings (string) as        
(        
select 'This is one string'  union all        
select 'And here is another' union all        
select 'a third...'          union all        
select '...and a fourth'        
)        
select substring(s.string, n.num, charindex(' ',substring(s.string, n.num, len(s.string)) + ' ')) as word        
from         
table_of_strings s        
join         
dbo.nums n         
on  n.num <= len(s.string)        
and substring(' ' + s.string, n.num, 1) = ' '        
;

with

table_of_strings (string) as

(

select 'This is one string' union all

select 'And here is another' union all

select 'a third...' union all

select '...and a fourth'

)

select substring(s.string, n.num, charindex(' ',substring(s.string, n.num, len(s.string)) + ' ')) as word

from

table_of_strings s

join

dbo.nums n

on n.num <= len(s.string)

and substring(' ' + s.string, n.num, 1) = ' '

;

There are obviously other methods that can be used – this is just one. But now compare it to the following:

declare @qry nvarchar(max);     
with        
table_of_strings (string) as        
(        
select 'This is one string'  union all        
select 'And here is another' union all        
select 'a third...'          union all        
select '...and a fourth'        
)        
select @qry = stuff((select replace(' ' + replace(string,'''',''''''),' ',''' as word union all select ''')        
from table_of_strings        
for xml path('')), 1, 20, '') + ''' as word'        
;     
exec sp_executesql @qry;

declare @qry nvarchar(max);

with

table_of_strings (string) as

(

select 'This is one string' union all

select 'And here is another' union all

select 'a third...' union all

select '...and a fourth'

)

select @qry = stuff((select replace(' ' + replace(string,'''',''''''),' ',''' as word union all select ''')

from table_of_strings

for xml path('')), 1, 20, '') + ''' as word'

;

exec sp_executesql @qry;

Here I’m constructing a query using dynamic SQL to split strings, making a query which performs a UNION ALL of all my queries. And when I look at the cost of a batch that runs both these methods (by asking Management Studio to show me the Plans), I find that the second one is incredibly cheap. In fact, if I turn on statistics io, I can see that there is no I/O at all for the second method, whereas the first requires 12 logical reads – presumably to get at my table of numbers.

Clearly my dynamic SQL method of splitting strings is fantastically fast, and this is completely true. It is fantastic – a thing of fantasy. It’s actually really awful.

You see, as I mentioned earlier, SQL Server isn’t particularly good at string manipulation. So much so that creating this piece of dynamic SQL is very costly, won’t scale, and should never be implemented in a real environment. You can have a look at it to verify what I’m saying, but if you try to do this on a large set of data, you will very quickly find out just how poor it is.

To test it, created a table called table_of_strings (string varchar(1000)) and populated it with those four strings. I then doubled its size ten times using:

insert dbo.table_of_strings        
select * from dbo.table_of_strings         
go 10

insert dbo.table_of_strings

select * from dbo.table_of_strings

go 10

Using the table of numbers to split the strings involved over 12,000 logical reads, but the elapsed time showed 289 milliseconds to provide the 13312 rows of results.

The dynamic SQL method used a total of just 26 logical reads, but took 34 seconds to run on my (old) machine.

I’m not suggesting you ignore the I/O-based cost of execution plans, but you should most definitely consider more than just that – particularly as Solid State storage becomes more and more commonplace.

This Post Has 6 Comments

AaronBertrand March 9, 2010 Reply

Bummer! I was going to do a "Bad habits to kick : ignoring I/O" post for tomorrow, but my ideas were pretty much identical to yours. Kudos.
Rob Farley March 9, 2010 Reply

You should still write it though.
Armando Prato March 9, 2010 Reply

I’d give SET STATISTICS TIME ON a shout out too. I’ve found my own fair share of instances where a query that appears to run acceptably time-wise and I/O wise may be over utilizing CPU.
AaronBertrand March 9, 2010 Reply

Yup, you’re really right. I abandoned my original idea though, and published a more broad "bad habits" type of I/O post.
Mladen Prajdic March 10, 2010 Reply

I’d run profiler to get the correct reads for your second method.
i’ve seen STATS IO get totally screwed up when doing string manipulations.
i have a query where the STATS IO show 30k reads while profiler shows 500k reads.
Rob Farley March 10, 2010 Reply

Yes, I demonstrated that in my sqlbits.com talk, where a Compute Scalar had lots of hidden reads. stats io showed just 2 reads, when there were many in Profiler.
But the problem with this query is the intense CPU needed. There are some hidden I/O costs, but that’s a small impact compared to the CPU.