Little Bobby Tables’ mother says you should always sanitise your data input. Except that I think she’s wrong. The SQL Injection aspect is for another post, where I’ll show you why I think SQL Injection is the same kind of attack as many other attacks, such as the old buffer overflow, but here I want to have a bit of a whinge about the way that some people sanitise data input, and even have a whinge about people who insist on using stored procedures for SSRS reports.
Let me say that again, in case you missed it the first time:
I want to have a whinge about people who insist on using stored procedures for SSRS reports.
Let’s look at the data input sanitisation aspect – except that I’m going to call it ‘parameter validation’. I’m talking about code that looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
create procedure dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) as begin /* First check that @eomdate is a valid date */ if isdate(@eomdate) != 1 begin select 'Please enter a valid date' as ErrorMessage; return; end /* Then check that time has passed since @eomdate */ if datediff(day,@eomdate,sysdatetime()) < 5 begin select 'Sorry - EOM is not complete yet' as ErrorMessage; return; end /* If those checks have succeeded, return the data */ select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales from Sales.SalesOrderHeader where OrderDate >= dateadd(month,-1,@eomdate) and OrderDate < @eomdate group by SalesPersonID order by SalesPersonID; end |
Notice that the code checks that a date has been entered. Seriously??!! This must only be to check for NULL values being passed in, because anything else would have to be a valid datetime to avoid an error.
The other check is maybe fair enough, but I still don’t like it.
The two problems I have with this stored procedure are the result sets and the small fact that the stored procedure even exists in the first place. But let’s consider the first one of these problems for starters. I’ll get to the second one in a moment.
If you read Jes Borland (@grrl_geek)’s recent post about returning multiple result sets in Reporting Services, you’ll be aware that Reporting Services doesn’t support multiple results sets from a single query. And when it says ‘single query’, it includes ‘stored procedure call’. It’ll only handle the first result set that comes back. But that’s okay – we have RETURN statements, so our stored procedure will only ever return a single result set. Sometimes that result set might contain a single field called ErrorMessage, but it’s still only one result set.
Except that it’s not okay, because Reporting Services needs to know what fields to expect. Your report needs to hook into your fields, so SSRS needs to have a way to get that information. For stored procs, it uses an option called FMTONLY.
When Reporting Services tries to figure out what fields are going to be returned by a query (or stored procedure call), it doesn’t want to have to run the whole thing. That could take ages. (Maybe it’s seen some of the stored procedures I’ve had to deal with over the years!)
So it turns on FMTONLY before it makes the call (and turns it off again afterwards). FMTONLY is designed to be able to figure out the shape of the output, without actually running the contents. It’s very useful, you might think.
1 2 3 |
set fmtonly on exec dbo.GetMonthSummaryPerSalesPerson '20030401'; set fmtonly off |
Without the FMTONLY lines, this stored procedure returns a result set that has three columns and fourteen rows. But with FMTONLY turned on, those rows don’t come back.
But what I do get back hurts Reporting Services.
It doesn’t run the stored procedure at all. It just looks for anything that could be returned and pushes out a result set in that shape. Despite the fact that I’ve made sure that the logic will only ever return a single result set, the FMTONLY option kills me by returning three of them.
It would have been much better to push these checks down into the query itself.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
alter procedure dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) as begin select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales from Sales.SalesOrderHeader where /* Make sure that @eomdate is valid */ isdate(@eomdate) = 1 /* And that it's sufficiently past */ and datediff(day,@eomdate,sysdatetime()) >= 5 /* And now use it in the filter as appropriate */ and OrderDate >= dateadd(month,-1,@eomdate) and OrderDate < @eomdate group by SalesPersonID order by SalesPersonID; end |
Now if we run it with FMTONLY turned on, we get the single result set back. But let’s consider the execution plan when we pass in an invalid date.
First let’s look at one that returns data. I’ve got a semi-useful index in place on OrderDate, which includes the SalesPersonID and TotalDue fields. It does the job, despite a hefty Sort operation.
…compared to one that uses a future date:
You might notice that the estimated costs are similar – the Index Seek is still 28%, the Sort is still 71%. But the size of that arrow coming out of the Index Seek is a whole bunch smaller.
The coolest thing here is what’s going on with that Index Seek. Let’s look at some of the properties of it.
Glance down it with me… Estimated CPU cost of 0.0005728, 387 estimated rows, estimated subtree cost of 0.0044385, ForceSeek false, Number of Executions 0.
That’s right – it doesn’t run. So much for reading plans right-to-left…
The key is the Filter on the left of it. It has a Startup Expression Predicate in it, which means that it doesn’t call anything further down the plan (to the right) if the predicate evaluates to false.
Using this method, we can make sure that our stored procedure contains a single query, and therefore avoid any problems with multiple result sets. If we wanted, we could always use UNION ALL to make sure that we can return an appropriate error message.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
alter procedure dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) as begin select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales, /*Placeholder: */ '' as ErrorMessage from Sales.SalesOrderHeader where /* Make sure that @eomdate is valid */ isdate(@eomdate) = 1 /* And that it's sufficiently past */ and datediff(day,@eomdate,sysdatetime()) >= 5 /* And now use it in the filter as appropriate */ and OrderDate >= dateadd(month,-1,@eomdate) and OrderDate < @eomdate group by SalesPersonID /* Now include the error messages */ union all select 0, 0, 0, 'Please enter a valid date' as ErrorMessage where isdate(@eomdate) != 1 union all select 0, 0, 0, 'Sorry - EOM is not complete yet' as ErrorMessage where datediff(day,@eomdate,sysdatetime()) < 5 order by SalesPersonID; end |
But still I don’t like it, because it’s now a stored procedure with a single query. And I don’t like stored procedures that should be functions.
That’s right – I think this should be a function, and SSRS should call the function. And I apologise to those of you who are now planning a bonfire for me. Guy Fawkes’ night has already passed this year, so I think you miss out. (And I’m not going to remind you about when the PASS Summit is in 2012.)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
create function dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) returns table as return ( select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales, '' as ErrorMessage from Sales.SalesOrderHeader where /* Make sure that @eomdate is valid */ isdate(@eomdate) = 1 /* And that it's sufficiently past */ and datediff(day,@eomdate,sysdatetime()) >= 5 /* And now use it in the filter as appropriate */ and OrderDate >= dateadd(month,-1,@eomdate) and OrderDate < @eomdate group by SalesPersonID union all select 0, 0, 0, 'Please enter a valid date' as ErrorMessage where isdate(@eomdate) != 1 union all select 0, 0, 0, 'Sorry - EOM is not complete yet' as ErrorMessage where datediff(day,@eomdate,sysdatetime()) < 5 ); |
We’ve had to lose the ORDER BY – but that’s fine, as that’s a client thing anyway. We can have our reports leverage this stored query still, but we’re recognising that it’s a query, not a procedure. A procedure is designed to DO stuff, not just return data. We even get entries in sys.columns that confirm what the shape of the result set actually is, which makes sense, because a table-valued function is the right mechanism to return data.
And we get so much more flexibility with this.
If you haven’t seen the simplification stuff that I’ve preached on before, jump over to http://bit.ly/SimpleRob and watch the video of when I broke a microphone and nearly fell off the stage in Wales. You’ll see the impact of being able to have a simplifiable query. You can also read the procedural functions post I wrote recently, if you didn’t follow the link from a few paragraphs ago.
So if we want the list of SalesPeople that made any kind of sales in a given month, we can do something like:
1 2 3 |
select SalesPersonID from dbo.GetMonthSummaryPerSalesPerson(@eomonth) order by SalesPersonID; |
This doesn’t need to look up the TotalDue field, which makes a simpler plan.
1 2 3 4 |
select * from dbo.GetMonthSummaryPerSalesPerson(@eomonth) where SalesPersonID is not null order by SalesPersonID; |
This one can avoid having to do the work on the rows that don’t have a SalesPersonID value, pushing the predicate into the Index Seek rather than filtering the results that come back to the report.
If we had joins involved, we might see some of those being simplified out. We also get the ability to include query hints in individual reports. We shift from having a single-use stored procedure to having a reusable stored query – and isn’t that one of the main points of modularisation?
Stored procedures in Reporting Services are just a bit limited for my liking. They’re useful in plenty of ways, but if you insist on using stored procedures all the time rather that queries that use functions – that’s rubbish.
This Post Has 10 Comments
It’s nice to see a discussion over at http://sqlblog.com/blogs/merrill_aldrich/archive/2011/11/07/t-sql-tuesday-24-ode-to-composable-code.aspx, where Jamie & Merrill discuss the value of TVFs too.
Good one. I’ll try it next time.
Typically, I wouldn’t return errors in the same sproc which returns the data set, but it is a fair point. What gets me thinking is how we can stop building stuff like usp_rs_<reportname> for each report and substitute with something more reusable. Especially considering that in a typical data mart we end up writing similar queries for similar reports.
Would’ve loved to see a little argument mentioning how bad it is to write _all_ of the code in SSRS, though…I still have to convince all sorts of devs/dbas that writing all the code in SSRS is rubbish too…
Hi Boyan,
Yes, I wouldn’t be recommending joining functions to other functions.
Using TVFs instead of procedures can be really neat. Extra columns can be added without breaking reports, and using the principles shown in http://bit.ly/SimpleRob, you can make joins redundant and far better performance than less-than-ideal procedures.
Rob
I do have some good news though, they’re considering Multiple sets returned for 2008.
Oh.
http://connect.microsoft.com/SQLServer/feedback/details/308493/allow-multiple-datasets-in-one-reporting-service-sql-query
(A connect request I put in in 2007 asking for it to be considered ;))
Hi Rob,
What I like to see is that a database object is called from RS. I don’t care what type of object it is. I prefer sprocs though. My main interest is that I don’t want embedded T-SQL in reports. Every time code gets embedded somewhere outside the database, another minefield is left for anyone that ever needs to refactor or work on the database, as they have no visibility on the code being executed.
I do often get frustrated with some of the limitations of the SQL Server datasources in RS and I often wish I could find the query option that says "just run with me on this" when it wants to have a parsing whinge. Most of those issues I can get around by using an OLEDB datasource instead of a SQL Server one.
The valid point has been made at the end of the article. Inline UDF should be the first choice whenever possible including reports. But saying always use UDF is wrong as well. There is no tool that is the best choice in every situation. So "Everything in…" is really a bad way to begin the sentence. Stored procedures have their place but should not be used always, nor should UDFs. I’ve seen many complex reports where sprocs were the only choice.
It is a shame that the author didn’t use raiserror rather than the select statement in his example for handling error conditions. After all raiserror is meant to be used for well, raising errors. The point would be equally valid.
You are suggesting a reasonable workaround, of course, but would it not be better to fix the root cause? FMT_ONLY should be deprecated, as it leads to issues like the one you described. Instead of using it, SSRS should allow us to execute whatever we want and let us specify the parameters we want to call it with.
I ran into a similar issue when I was dragging a stored procedure onto Linq pad. The procedure was storing intermediate results in a temp table, and FMT_ONLY choked on it. So I wrote my own little tool to replace the one that comes with VS 2010 – it just works, and it took me like half an hour to develop. Instead of using the unreliable FMT_ONLY, I just use a string to get the parameters to invoke the module with – it is that easy to get it right.
I would disagree with you Rob, only because I don’t think that a function should be used (conceptually) to return a resultset, it should be mapping input to output. While that *may* be the case, it also might not be for these reporting situations, where you might have no input at all, you might have input of a date range and return a huge dataset, etc. I do see your point about the multiple results returned from FMTONLY, and the validation in the WHERE is a definite improvement.
FMT_ONLY is the problem, and SSRS should either get rid of it or make it optional. However one workaround that I find prettier is to put a dummy select at the beginning of the proc. IF 1=2 SELECT Null as [Column1], Null as [Column2]…etc. It will never get called but trick FMT_ONLY into thinking it knows the return values.
In similar vein, SSIS likes to use SET Rowcount 1 to get its metadata… or poison the plan cache, depending on how it feels. http://piers7.blogspot.com.au/2009/06/nasty-ssis-2008-issue-with-table-or.html.