The dangers of Dynamic SQL and how to avoid them

October 11, 2022

That’s “Dynamic SQL”, with a capital D, because I feel like it’s a thing in its own right and deserves to be in Title Case. Steve Jones (@way0utwest) has invited us to write about it. To explain the types of things we do with it, what we think of it, the problems – that’s the gist, and here’s my post…

The concept of Dynamic SQL is to create a SQL statement programmatically, and then execute that. As opposed to just writing and executing a query. My friend Erland Sommarskog is the master of Dynamic SQL and has written extensively about it over many years – the original version goes back to at least 2001 if not earlier, even though his current site says it’s a lot newer than that. Fun fact – ‘Sommarskog’ means ‘summer forest’ (in Swedish), but that’s not nearly the most interesting thing about Erland. If you ever get the chance to spend time with it, then do so. I recommend singing karaoke with him, particularly songs by David Bowie. If you really want to get deep into Dynamic SQL, read what Erland has to say about it.

One of the best things that he writes about is dynamic search conditions, and in particular, the concept of including or excluding blocks, rather than using OR clauses that cater for null-valued parameters. It’s excellent reading and I wish all database developers were across it.

And being the responsible expert that he is, Erland demonstrates heavily the importance of using parameters for user-entered information, rather than embedding that stuff into what gets executed. Because that’s how you avoid code that is vulnerable to SQL Injection.

I’ve written before about what I consider the golden rule with SQL Injection. And that is, to understand that DATA should not be executed like CODE. A parameter is fine – that’s not part of what gets executed – but you do not embed user-entered values into queries. You never create a WHERE clause like “WHERE t.col1 = 3”, if that value 3 is being passed in. Instead, you use “WHERE t.col1 = @param”. Because you can’t assume that “3” is safe to run.

But there are things in queries that can’t be parameterised. Table names, linked server names, column names, for example. Erland’s methods of either including or excluding a whole predicate is good. Really good. But I appreciate that developers don’t what to have to test for each possible table being passed in, just to avoid using the parameter in the query.

What I do is to not use the user’s parameter in the query directly, but to get it out of sys.objects.

You see, if someone passes in a parameter called ‘users’, then I can look that name up in sys.objects. Not just to confirm that it’s there, but to make sure I’m using it in a consistent way. But because the name in sys.objects might not match exactly (because of brackets, or maybe they’ve provided the schema as well), I don’t search on name. I use the object_id() function and use that value to look up what’s in sys.objects. Side note: The object_name() function is a useful alternative to actually querying sys.objects, but I prefer to use sys.objects so that I can keep my user-supplied parameter completely out of my SELECT clause.

The user might pass in ‘users’, but they might also pass in ‘[users]’, or ‘dbo.users’, or ‘[dbo].[users]’, or ‘dbo.[users]’ – you get the idea. But all of these values give the same result when passed into object_id(), which then works just fine to filter sys.objects.

So my code looks like this, and works regardless of whether or not the user has passed in a schema, or used brackets. And if someone passes in a value which isn’t recognised as a legitimate object name, it won’t create my statement, because the object_id() function will produce null and no row in sys.objects will match. Plus I can restrict it to tables (and not views or whatever else) by filtering on o.type.

And if I want to handle a column name being passed in too, then that works too, except that there’s no column_id() function, so I have to hope the user is consistent in the way they pass that in. But still, if they don’t pass in an actual column name, no query is produced.

I use quotename() everywhere because I want this to work even if someone has put spaces into a name. Or used a keyword. It’s just better that way.

If someone passes in a value for TOP, I can use that as a parameter. I don’t need that to be written in directly. Even a conversion style can be a parameter.

One of the few places that a number might actually be required would be in a type length – the ‘8’ in ‘char(8)’. But in that situation, I can force the parameter that’s being passed in to be an integer, and then use it without fear. Everything string value that could be passed into a query is either a keyword (of which there are a limited list), an operator (also a limited list), or I can find it in a Catalog View like sys.columns, sys.objects, sys.servers, etc.

This approach means I don’t have to worry about executing a value which wasn’t supposed to be executed, and my Dynamic SQL isn’t susceptible to SQL Injection attacks. It’s that golden rule I mentioned before.

Don’t just jump enthusiastically into Dynamic SQL because of what you’ve read today. Be cautious. It’s easy to make code that is overly complex, and which becomes vulnerable. But don’t get caught up counting apostrophes to try to make your Dynamic SQL safe – it’s much easier to simply avoid putting anything from a user into the executable code.

@rob_farley@twitter

This Post Has One Comment

  1. Gerard Jaryczewski

    It’s easy to make overly complex code – that’s right! If you are a beginner, it’s easy, because you don’t know how to code it easier, you are still learning. If you are an expert, it’s also easy, because you know all of these fancy operators, functions, and tricks. And it’s easy especially if you never meet someone like Rob Farley, who shows you how to do it less complex and more readable.

Leave a Reply

LobsterPot Blogs

Blog posts by Rob Farley and other LobsterPot Solutions team members.

Search

Related Blogs