Waiting, waiting…

December 10, 2013

“It just runs slow these days”

I’m sure you’ve heard this, or even said it, about a computer that’s a few years old. We remember the days when the computer was new and seemed to just fly, but that was then, and this is now. Change happens; things erode and become slower. Cars, people, computers. I can accept that cars get slower: they lose horsepower over time as the precision components wear and become less precise. I also know that my youth is a thing of the past. But electronics? What happens there?

Well, in my experience, computers don’t get slower. They just feel slower. I see two main reasons, and neither of them is ageing hardware.

Your computer might even be slower than it was yesterday. In the world of databases, we might be investigating why the server is slower than it was five minutes ago. Again, it’s probably not because of ageing hardware.

One possible reason is that we’re simply asking systems to do more. If we’re comparing our laptops to when we bought them, we’re probably refreshing webpages more frequently (often in the background) and have installed too many utilities (hopefully not running in the background, but you never know), so the system has more to get done in a given minute than it did when it was new. With a database server, the amount of data has probably grown, there may be more VLFs (virtual log files) in the transaction log to deal with, and there are more users pushing more transactions. These are not things you want to uninstall like that annoying browser search bar on your aunt’s ageing computer, but they can be a perfectly valid reason for things to be slower. Hopefully you are tuning your system so that it can scale, and you’re happy with the amount of extra work being done, even if it does mean that some processes take a little longer than they once did.

This problem can be summarised as the system having to wait for resources to become free so that it can get its stuff done.

Another reason for slowness is that the system is having to wait for other things, things you don’t want it waiting for. An increase in busyness will cause slowness through waiting, but you can easily argue that this is ‘acceptable’. It’s much more of a problem if the system is slower without actually achieving any more than it was before.

Waits are the topic of this month’s T-SQL Tuesday, hosted by Robert Davis (@sqlsoldier). Go and have a look at his post to see what other people have written about on this topic.

In the SQL Server world, this kind of problem is identified by looking at wait stats. SQL Server records what tasks have been waiting for, and you can see these figures by querying sys.dm_os_wait_stats. The values are cumulative, counting from when the instance last started (or from when the stats were last cleared), so querying the view in isolation isn’t as useful as taking snapshots of it and comparing them. If you want to store copies of it over time, you might do something along the lines of:

--A schema for monitoring data can be useful
--(create schema must be the only statement in its batch, hence the go)
create schema monitoring;
go
--Create an empty table that has the structure of sys.dm_os_wait_stats
select top (0) *
into monitoring.waits
from sys.dm_os_wait_stats;
--Add a column to record when each set of stats was collected
alter table monitoring.waits
add snapshot_time datetime default sysdatetime();
go
--Run this section regularly; the default fills in snapshot_time
insert monitoring.waits (wait_type, waiting_tasks_count, wait_time_ms, max_wait_time_ms, signal_wait_time_ms)
select wait_type, waiting_tasks_count, wait_time_ms, max_wait_time_ms, signal_wait_time_ms
from sys.dm_os_wait_stats;
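One way to run the snapshot insert regularly is a SQL Server Agent job. The sketch below uses the msdb job procedures (sp_add_job, sp_add_jobstep, sp_add_schedule, sp_attach_schedule, sp_add_jobserver). The job name, the schedule name, the 15-minute interval and the DBAUtility database (standing in for wherever you created the monitoring schema) are all hypothetical, so adjust them to suit.

```sql
use msdb;
go
--Hypothetical job name
exec dbo.sp_add_job
    @job_name = N'Monitoring - snapshot wait stats';

--The step runs the snapshot insert; DBAUtility is an assumed database name
exec dbo.sp_add_jobstep
    @job_name = N'Monitoring - snapshot wait stats',
    @step_name = N'Insert wait stats snapshot',
    @subsystem = N'TSQL',
    @database_name = N'DBAUtility',
    @command = N'insert monitoring.waits (wait_type, waiting_tasks_count, wait_time_ms, max_wait_time_ms, signal_wait_time_ms)
select wait_type, waiting_tasks_count, wait_time_ms, max_wait_time_ms, signal_wait_time_ms
from sys.dm_os_wait_stats;';

--Every 15 minutes, all day, every day (the interval is an assumption)
exec dbo.sp_add_schedule
    @schedule_name = N'Every 15 minutes',
    @freq_type = 4,              --daily
    @freq_interval = 1,          --every 1 day
    @freq_subday_type = 4,       --repeat in units of minutes
    @freq_subday_interval = 15;  --every 15 minutes

exec dbo.sp_attach_schedule
    @job_name = N'Monitoring - snapshot wait stats',
    @schedule_name = N'Every 15 minutes';

--Target the local server so that the job actually runs
exec dbo.sp_add_jobserver
    @job_name = N'Monitoring - snapshot wait stats';
```

The shorter the interval, the finer the picture you get, but the faster monitoring.waits grows.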


Regularly collecting snapshots of sys.dm_os_wait_stats like this gives you a picture of what has occurred over time. You can easily pull this data into a report or into Excel, or look at the most recent interval using a query such as:

with numbered as (
    select *,
        wait_time_ms - lead(wait_time_ms) over (partition by wait_type order by snapshot_time desc) as diff_wait_time,
        waiting_tasks_count - lead(waiting_tasks_count) over (partition by wait_type order by snapshot_time desc) as diff_wait_count,
        1000 * datediff(second, lead(snapshot_time) over (partition by wait_type order by snapshot_time desc), snapshot_time) as diff_ms,
        row_number() over (partition by wait_type order by snapshot_time desc) as rownum
    from monitoring.waits
)
select wait_type, snapshot_time, diff_wait_count, diff_wait_time, diff_ms
from numbered
where rownum = 1
order by diff_wait_time desc, wait_type;


This query compares the wait time and waiting task count for each wait type (which is frustratingly stored as a string) with the previous snapshot, using the LEAD function that was introduced in SQL Server 2012. It uses LEAD rather than LAG because the window is ordered by snapshot_time DESC, not ASC, so the next row in the window is the older snapshot. Using ROW_NUMBER(), we can easily pick out the latest snapshot by filtering to rownum = 1, but if you just want to chart the deltas, the contents of the CTE will be enough.

Make sure you keep an eye on the amount of data you’re storing, of course, and be careful of the impact of someone inadvertently clearing the stats, or of the instance restarting, since either one resets the counters. Because the query works with deltas, you can add a filter that ignores any delta spanning a period during which the stats were cleared.
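A minimal sketch of both ideas follows, assuming the monitoring.waits table above and assuming that all the rows written by one run of the snapshot insert share the same snapshot_time. The counters reset when the instance restarts or when someone runs DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR), so any interval in which at least one counter went backwards is treated as suspect and dropped entirely. This is a heuristic rather than a guarantee: an interval spanning a reset can still yield positive but misleading deltas for some wait types. The 30-day retention period is an arbitrary assumption.

```sql
--Assumes every row from one run of the snapshot insert shares a snapshot_time
with deltas as (
    select wait_type, snapshot_time,
        wait_time_ms - lead(wait_time_ms) over (partition by wait_type order by snapshot_time desc) as diff_wait_time,
        waiting_tasks_count - lead(waiting_tasks_count) over (partition by wait_type order by snapshot_time desc) as diff_wait_count
    from monitoring.waits
),
suspect as (
    --Intervals where at least one counter went backwards (a restart or a clear)
    select distinct snapshot_time
    from deltas
    where diff_wait_time < 0 or diff_wait_count < 0
)
select d.wait_type, d.snapshot_time, d.diff_wait_count, d.diff_wait_time
from deltas d
where d.diff_wait_time is not null --the oldest snapshot has nothing to compare with
  and not exists (select * from suspect s where s.snapshot_time = d.snapshot_time)
order by d.snapshot_time, d.diff_wait_time desc;

--Trim old snapshots; 30 days is an arbitrary retention period
delete monitoring.waits
where snapshot_time < dateadd(day, -30, sysdatetime());
```

Dropping the whole interval, rather than just the rows with negative deltas, avoids charting the misleading positive deltas that the other wait types would show across the same reset.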

This post is not going to go into all the different wait types to tell you which ones are worth worrying about and which ones are worth ignoring. But what I would suggest to you is that you track what’s going on with your environment and keep an eye out for things that seem unusual. When troubleshooting, you will find any history invaluable.

This Post Has 3 Comments

Robert L Davis December 14, 2013

Thanks for participating in T-SQL Tuesday #49!
murali December 24, 2013

Hey Rob,
Thanks for the code. Do you mind helping with the same for SQL Server 2008 or 2008 R2? We do not have the LEAD or LAG functions.
Thanks in advance.
Regards
Murali
Rob Farley January 27, 2014

Sorry – forgot to get back to this after Christmas.
Use this code to avoid using 2012 features:
with numbered as (
select *, row_number() over (partition by wait_type order by snapshot_time desc) as rownum
from monitoring.waits
)
select w1.wait_type, w1.snapshot_time,
w1.waiting_tasks_count - w2.waiting_tasks_count as diff_wait_count,
w1.wait_time_ms - w2.wait_time_ms as diff_wait_time,
1000 * datediff(second, w2.snapshot_time, w1.snapshot_time) as diff_ms
from numbered w1
join numbered w2 on w2.wait_type = w1.wait_type
where w1.rownum = 1 and w2.rownum = 2
order by diff_wait_time desc, wait_type;


LobsterPot Blogs
