I previously posted about a problem where it seemed that changing the case of the word "BY" in a SELECT query was causing it to run much much faster. Now I've hit the same thing again, where basically almost any change I make to how the query is executed (so that it still performs the same function) causes the performance to jump from a dismal 7 or 8 seconds to instantaneous. It's a very simple query of the form: SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0 which was running fine until a moment ago, when it suddently started running hopelessly slowly. If change anything in the query to lowercase (or the Min to uppercase), it runs fine again. Last time someone suggested something about a bad plan being cached, and after a bit of research I found the commands DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE. Sure enough, after running these, the query started running fine again. The question is a) why is this happening? Is it a bug in my code, or in SQL server? b) is it worth detecting it and fixing it automatically? I.e, should I put some code in that notices that a query is running far too slowly, then runs "DBCC DROPCLEANBUFFERS; DBCC FREEPROCCACHE"? Or will that cause other problems? Thanks
[quoted text, click to view] Thanks for that...amazingly enough it turned that that was exactly my problem, although I'm using ad-hoc queries rather than stored procs. I did some more testing, and it turned out that it was because it was executing the same query twice, the first time with an atypical parameter value, and the second time with a more typical one, that the query was running so slowly. That is, executing SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0, with @0 = 999 followed by SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0, with @0 = 123 Caused the second query to run absurdly slowly, because in the first case only very few rows in the table had MyKey = 999 whereas almost every row had MyKey = 123. After doing a DBCC FREEPROCCACHE and swapping the queries around, they both ran fine. In the end I ended up de-parameterizing the query just for this case, but now I'm worried - how can I be sure that my other queries won't suffer from the same problem? Should I never use parameters because of this possibility?
[quoted text, click to view] Uri Dimant wrote: > > In the end I ended up de-parameterizing the query just for this case, > > but now I'm worried - how can I be sure that my other queries won't > > suffer from the same problem? Should I never use parameters because of > > this possibility? > > > An ability using parameters is very powerful , don't afraid using parameters > , just test it carefuly >
Sure, except that the content of the database is out of my control - this particular scenario (where nearly all the records matched a particular key, but the query was first run against a different key) could easily arrise in a production environment. More to the point, I've seen no evidence that I'm getting any performance benefits from using parameterized queries. I suspect I will at least add a configuration option to avoid parameterized queries (relatively straightforward, as I have a layer of code that handles query parameters) if I see a problem like this again.
[quoted text, click to view] wizofaus@hotmail.com wrote: > Uri Dimant wrote: >>> In the end I ended up de-parameterizing the query just for this >>> case,
"de-parameterizing"? You mean changing to dynamic sql and leaving yourself vulnerable to sql injection?? [quoted text, click to view] >>> but now I'm worried - how can I be sure that my other queries won't >>> suffer from the same problem? Should I never use parameters >>> because of this possibility? >>> >> An ability using parameters is very powerful , don't afraid using >> parameters , just test it carefuly >> > Sure, except that the content of the database is out of my control - > this particular scenario (where nearly all the records matched a > particular key, but the query was first run against a different key) > could easily arrise in a production environment. More to the point, > I've seen no evidence that I'm getting any performance benefits from > using parameterized queries. > I suspect I will at least add a configuration option to avoid > parameterized queries (relatively straightforward, as I have a layer > of > code that handles query parameters) if I see a problem like this > again.
This is a ridiculous overreaction. Problems due to parameter-sniffing are too rare to justify eliminating the benefits of using parameters. Talk about "throwing the baby out with te bath water". The article showed two, no three, ways to alleviate the problems caused by parameter sniffing and still use parameters. So what do you do? ignore the article's advice and "de-parameterize" your query... -- Microsoft MVP - ASP/ASP.NET Please reply to the newsgroup. This email account is my spam trap so I don't check it very often. If you must reply off-line, then remove the "NO SPAM"
Hi http://blogs.msdn.com/khen1234/archive/2005/06/02/424228.aspx [quoted text, click to view] <wizofaus@hotmail.com> wrote in message news:1168494297.719888.324130@77g2000hsv.googlegroups.com... >I previously posted about a problem where it seemed that changing the > case of the word "BY" in a SELECT query was causing it to run much much > faster. > > Now I've hit the same thing again, where basically almost any change I > make to how the query is executed (so that it still performs the same > function) causes the performance to jump from a dismal 7 or 8 seconds > to instantaneous. It's a very simple query of the form: > > SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0 > > which was running fine until a moment ago, when it suddently started > running hopelessly slowly. If change anything in the query to > lowercase (or the Min to uppercase), it runs fine again. Last time > someone suggested something about a bad plan being cached, and after a > bit of research I found the commands DBCC DROPCLEANBUFFERS and DBCC > FREEPROCCACHE. Sure enough, after running these, the query started > running fine again. The question is > > a) why is this happening? Is it a bug in my code, or in SQL server? > b) is it worth detecting it and fixing it automatically? I.e, should I > put some code in that notices that a query is running far too slowly, > then runs "DBCC DROPCLEANBUFFERS; DBCC FREEPROCCACHE"? Or will that > cause other problems? > > Thanks >
be EXTREMELY wary when using DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE on a production machine. They clear all cached SPs, queries and plans. The instance is bound to run under extreme stress for a considerable amount of time. [quoted text, click to view] wizof...@hotmail.com wrote: > I previously posted about a problem where it seemed that changing the > case of the word "BY" in a SELECT query was causing it to run much much > faster. > > Now I've hit the same thing again, where basically almost any change I > make to how the query is executed (so that it still performs the same > function) causes the performance to jump from a dismal 7 or 8 seconds > to instantaneous. It's a very simple query of the form: > > SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0 > > which was running fine until a moment ago, when it suddently started > running hopelessly slowly. If change anything in the query to > lowercase (or the Min to uppercase), it runs fine again. Last time > someone suggested something about a bad plan being cached, and after a > bit of research I found the commands DBCC DROPCLEANBUFFERS and DBCC > FREEPROCCACHE. Sure enough, after running these, the query started > running fine again. The question is > > a) why is this happening? Is it a bug in my code, or in SQL server? > b) is it worth detecting it and fixing it automatically? I.e, should I > put some code in that notices that a query is running far too slowly, > then runs "DBCC DROPCLEANBUFFERS; DBCC FREEPROCCACHE"? Or will that > cause other problems? > > Thanks
[quoted text, click to view] > In the end I ended up de-parameterizing the query just for this case, > but now I'm worried - how can I be sure that my other queries won't > suffer from the same problem? Should I never use parameters because of > this possibility? >
An ability using parameters is very powerful , don't afraid using parameters , just test it carefuly [quoted text, click to view] <wizofaus@hotmail.com> wrote in message news:1168509843.760497.149900@i56g2000hsf.googlegroups.com... > Uri Dimant wrote: >> Hi >> http://blogs.msdn.com/khen1234/archive/2005/06/02/424228.aspx >> > Thanks for that...amazingly enough it turned that that was exactly my > problem, although I'm using ad-hoc queries rather than stored procs. I > did some more testing, and it turned out that it was because it was > executing the same query twice, the first time with an atypical > parameter value, and the second time with a more typical one, that the > query was running so slowly. That is, executing > > SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0, with @0 = 999 > followed by > SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0, with @0 = 123 > > Caused the second query to run absurdly slowly, because in the first > case only very few rows in the table had MyKey = 999 whereas almost > every row had MyKey = 123. After doing a DBCC FREEPROCCACHE and > swapping the queries around, they both ran fine. > > In the end I ended up de-parameterizing the query just for this case, > but now I'm worried - how can I be sure that my other queries won't > suffer from the same problem? Should I never use parameters because of > this possibility? >
[quoted text, click to view] swaroop.a...@gmail.com wrote: > be EXTREMELY wary when using DBCC DROPCLEANBUFFERS and DBCC > FREEPROCCACHE on a production machine. They clear all cached SPs, > queries and plans. The instance is bound to run under extreme stress > for a considerable amount of time. >
The instance? You mean it affects all databases? In this case, I determined I'd have to do it before every single query call, so obviously that's not practical.
[quoted text, click to view] > a) why is this happening? Is it a bug in my code, or in SQL server? > b) is it worth detecting it and fixing it automatically? I.e, should I > put some code in that notices that a query is running far too slowly, > then runs "DBCC DROPCLEANBUFFERS; DBCC FREEPROCCACHE"? Or will that > cause other problems?
Pardon me but Why did you not say which version of SQL Server you are running on? And why not mention the version of MDAC you have installed (since you posted to microsoft.public.data.ado)? And why not mention something about the Primary Keys & Indexes of the table you are querying - I take it you have an index on MyKey? And have you checked the documentation, whitepapers, MSDN for details on query performance? And have you checked the query plan in Query Analyser? Cheers Stephen Howe
[quoted text, click to view] Erland Sommarskog wrote: > When you set a database to forced parameterisation, SQL Server will > auto-parameterise all statements (with some exceptions documented in > Books Online); normally it only auto-parameterise very simple queries. > In the case of Mgmt Studio it's reallly a go-faster switch. > > So dismissing caching of parameterised queries can be a serious mistake. > But it is certainly true that there are situations where parameter > sniffing can be a problem. If it is possible for you to tell in the > appliocation "this is an odd value that needs a special plan", then > you can modify the query text by adding a redudant condition like > "AND 1 = 1". Actually as you have found, changing "BY" to "by" or even > adding extra spaces help. This is because the lookup in the cache > is done on a hash without first collapsing spaces or parsing the > query text. >
Thanks...one of the most helpful replies I've had on usenet for some time now! The problem is that it's pretty hard for me to know that a value is "odd". In this case, like I said the query in this case is SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0 Where @0 is actually a value from a list that is obtained from elsewhere (not the database). It loops through this list, calling the same query for each one. Now, I suppose I could first do SELECT Count(*) FROM MyTable WHERE MyKey = @0 and determine if the number was very low, and if so, de-parameterize it or add a space or whatever, but then this second query would potentially suffer from the same problem. I suppose another alternative is to build another query first SELECT MyKey, Min(MyValue) FROM MyTable GROUP BY MyKey then use this to obtain the minimum value for each key, but there's only so much time I can spend rewriting queries to side-step quirky performance issues (the application has many many ad-hoc queries, nearly all parameterized on the same key, so they are all prone to the same problem). BTW, this is under SQL server 2000. I've yet to determine if this particular problem does actually exist under 2005, which is what we're using for the production environment. Will definitely do that before I waste too much more time on it.
(wizofaus@hotmail.com) writes: [quoted text, click to view] > Sure, except that the content of the database is out of my control - > this particular scenario (where nearly all the records matched a > particular key, but the query was first run against a different key) > could easily arrise in a production environment. More to the point, > I've seen no evidence that I'm getting any performance benefits from > using parameterized queries. > I suspect I will at least add a configuration option to avoid > parameterized queries (relatively straightforward, as I have a layer of > code that handles query parameters) if I see a problem like this again.
Which version of SQL Server are you using? Here is a test that you can try to see that you can do to actually test the benefit of parameterised queries. First create this database: CREATE DATABASE many_sps go USE many_sps go DECLARE @sql nvarchar(4000), @x int SELECT @x = 1000 WHILE @x > 0 BEGIN SELECT @sql = 'CREATE PROCEDURE abc_' + ltrim(str(@x)) + '_sp @orderid int AS SELECT O.OrderID, O.OrderDate, O.CustomerID, C.CompanyName, Prodcnt = OD.cnt, Totalsum = OD.total FROM Northwind..Orders O JOIN Northwind..Customers C ON O.CustomerID = C.CustomerID JOIN (SELECT OrderID, cnt = COUNT(*), total = SUM(Quantity * UnitPrice) FROM Northwind..[Order Details] GROUP BY OrderID) AS OD ON OD.OrderID = O.OrderID WHERE O.OrderID = @orderid' EXEC(@sql) SELECT @x = @x - 1 END (Don't worry if you don't have Northwind on your server, you are not going to run these procedures.) Then use F7 to get the Summary page, and navigate to the Stored Procedures node for many_sps. Select some 200 procedures, right-click and select Script As Create To New Query Window. Go for a cup of coffee - this will take some time depending on your hardware. When the operation has completed (or you have gotten tired of waiting and killed SSMS), issue this command: ALTER DATABASE db SET PARAMETERIZATION FORCED Redo the scripting operation. It will now complete in five seconds. The reason for this is that SQL Server Management Studio does not use parameterised queries. For every procedure it scripts, Mgmt Studio issues around five queries. All these queries makes it to the cache that explodes, and all these queries are compiled. When you set a database to forced parameterisation, SQL Server will auto-parameterise all statements (with some exceptions documented in Books Online); normally it only auto-parameterise very simple queries. In the case of Mgmt Studio it's reallly a go-faster switch. So dismissing caching of parameterised queries can be a serious mistake. But it is certainly true that there are situations where parameter sniffing can be a problem. If it is possible for you to tell in the appliocation "this is an odd value that needs a special plan", then you can modify the query text by adding a redudant condition like "AND 1 = 1". Actually as you have found, changing "BY" to "by" or even adding extra spaces help. This is because the lookup in the cache is done on a hash without first collapsing spaces or parsing the query text. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at
Erland Sommarskog (esquel@sommarskog.se) writes: [quoted text, click to view] > When the operation has completed (or you have gotten tired of waiting > and killed SSMS), issue this command: > > ALTER DATABASE db SET PARAMETERIZATION FORCED > > Redo the scripting operation. It will now complete in five seconds.
By the way, this is something important for your application as well. Say that a DBA finds out that your app is thrashing the cache by not using parameterised queries, and sets the database to forced parameterisation, you will get back the behaviour you have now. A bettery remedy is to add OPTION (RECOMPILE) at the end of sensitive queries. This forces a statement recompile, and the query will not be put in cache. This means that you can still use parameterised queries and get the other benefits of it. (Protection for SQL injection and repsecting the user's regional settings.) You also avoid thrashing the cache. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at
(wizofaus@hotmail.com) writes: [quoted text, click to view] > Thanks...one of the most helpful replies I've had on usenet for some > time now! > > The problem is that it's pretty hard for me to know that a value is > "odd".
I can understand that this is not always simple. I didn't say this, in hope it would be. :-) However, I think I have a cure for you: [quoted text, click to view] > In this case, like I said the query in this case is > > SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0 > > Where @0 is actually a value from a list that is obtained from > elsewhere (not the database). It loops through this list, calling the > same query for each one.
Stop! Don't do that! The problems with query plans aside, this is an ineffecient use of SQL Server. Get all data at once with: SELECT t.MyKey, Min(t.MyValue) FROM MyTable t JOIN list_to_table(@list) f ON t.MyKey = f.value GROUP BY t.MyKey Where list_to_table is a table-valued function that transform the list to a table. I have a whole bunch of such functions on http://www.sommarskog.se/arrays-in-sql.html. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at
Stephen, You forget in the list, the version of the framework that is used. It is to AdoNet as well you know. :-) Cor "Stephen Howe" <stephenPOINThoweATtns-globalPOINTcom> schreef in bericht news:%23kg1j%23YNHHA.992@TK2MSFTNGP04.phx.gbl... [quoted text, click to view] >> a) why is this happening? Is it a bug in my code, or in SQL server? >> b) is it worth detecting it and fixing it automatically? I.e, should I >> put some code in that notices that a query is running far too slowly, >> then runs "DBCC DROPCLEANBUFFERS; DBCC FREEPROCCACHE"? Or will that >> cause other problems? > > Pardon me but > > Why did you not say which version of SQL Server you are running on? > And why not mention the version of MDAC you have installed (since you > posted > to microsoft.public.data.ado)? > And why not mention something about the Primary Keys & Indexes of the > table > you are querying - I take it you have an index on MyKey? > And have you checked the documentation, whitepapers, MSDN for details on > query performance? > And have you checked the query plan in Query Analyser? > > Cheers > > Stephen Howe > >
[quoted text, click to view] Erland Sommarskog wrote: > (wizofaus@hotmail.com) writes: > > Thanks...one of the most helpful replies I've had on usenet for some > > time now! > > > > The problem is that it's pretty hard for me to know that a value is > > "odd". > > I can understand that this is not always simple. I didn't say this, in > hope it would be. :-) > > However, I think I have a cure for you: > > > In this case, like I said the query in this case is > > > > SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0 > > > > Where @0 is actually a value from a list that is obtained from > > elsewhere (not the database). It loops through this list, calling the > > same query for each one. > > Stop! Don't do that! The problems with query plans aside, this is an > ineffecient use of SQL Server. Get all data at once with: > > SELECT t.MyKey, Min(t.MyValue) > FROM MyTable t > JOIN list_to_table(@list) f ON t.MyKey = f.value > GROUP BY t.MyKey > > Where list_to_table is a table-valued function that transform the list > to a table. I have a whole bunch of such functions on > http://www.sommarskog.se/arrays-in-sql.html. > I agree this sort of thing would be preferable, and more efficient, but as it is, providing I don't get the bad plan problem, it's efficient enough as it is, and I'm wary of making too many big changes at this point. That sort of thing I'd prefer to leave for the next version. The thing is, I just tried exporting all the records to a CSV file, creating a clean database with the same tables/indices, re-importing all the records, and the same problem DOESN'T happen: I can run that query with the "odd" value first, then with the common value, and both queries run fast. So something else has happened to my database that surely I must be able to reset somehow?
[quoted text, click to view] Erland Sommarskog wrote: > (wizofaus@hotmail.com) writes: > > I agree this sort of thing would be preferable, and more efficient, but > > as it is, providing I don't get the bad plan problem, it's efficient > > enough as it is, and I'm wary of making too many big changes at this > > point. That sort of thing I'd prefer to leave for the next version. > > But keep in mind that the solution you have now will not scale well. If > the data in production is ten times larger than you have expected, you > will get ten times more execution time, even with the good plan.
Sure. I definitely plan on doing some query optimization and consolidation for the next version. Your routines may well come in handy, so thanks. [quoted text, click to view] > > > The thing is, I just tried exporting all the records to a CSV file, > > creating a clean database with the same tables/indices, re-importing > > all the records, and the same problem DOESN'T happen: I can run that > > query with the "odd" value first, then with the common value, and both > > queries run fast. So something else has happened to my database that > > surely I must be able to reset somehow? > > I eavesdropped a discussion at PASS in Seattle last year, when a guy > had done extensive tests, and he could repeat a scenario that depending > on which order he loaded the same data, he would get different plans, > good or bad. I presume that part of the answer lies what exactly is in > the statistics. Normally, statistics are only samples, and if the > statistics does not well reflect the data distribution, your plans > will not always be the best. >
Well I found another solution - reindex the table. I ran dbcc dbreindex('mytable', ' ', 90) on the initial database, and now the problem is gone away. My main concern was that if we did see this problem occuring in productoion databases, how could we fix it, other than changing the code, and at least now I have such a solution, and it's a bit less drastic than exporting and re-importing all the data (which potentially could have taken hours). I'm not sure whether 90 is the best parameter value here, that was just from the MSDN article.
(wizofaus@hotmail.com) writes: [quoted text, click to view] > I agree this sort of thing would be preferable, and more efficient, but > as it is, providing I don't get the bad plan problem, it's efficient > enough as it is, and I'm wary of making too many big changes at this > point. That sort of thing I'd prefer to leave for the next version.
But keep in mind that the solution you have now will not scale well. If the data in production is ten times larger than you have expected, you will get ten times more execution time, even with the good plan. [quoted text, click to view] > The thing is, I just tried exporting all the records to a CSV file, > creating a clean database with the same tables/indices, re-importing > all the records, and the same problem DOESN'T happen: I can run that > query with the "odd" value first, then with the common value, and both > queries run fast. So something else has happened to my database that > surely I must be able to reset somehow?
I eavesdropped a discussion at PASS in Seattle last year, when a guy had done extensive tests, and he could repeat a scenario that depending on which order he loaded the same data, he would get different plans, good or bad. I presume that part of the answer lies what exactly is in the statistics. Normally, statistics are only samples, and if the statistics does not well reflect the data distribution, your plans will not always be the best. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at
(wizofaus@hotmail.com) writes: [quoted text, click to view] > Well I found another solution - reindex the table. > > I ran > > dbcc dbreindex('mytable', ' ', 90) on the initial database, and now the > problem is gone away. My main concern was that if we did see this > problem occuring in productoion databases, how could we fix it, other > than changing the code, and at least now I have such a solution, and > it's a bit less drastic than exporting and re-importing all the data > (which potentially could have taken hours).
It's recommended to run a maintenance job to reindex the table with some frequency. The main reason for this is to prevent defragmentation. A side effect of this is that statistics are updated with fullscan, that is all rows are considered. That gives the optimizer more accurate information. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at
[quoted text, click to view] On 13 Jan 2007 18:15:11 -0800, wizofaus@hotmail.com wrote: >Well I found another solution - reindex the table. > >I ran > >dbcc dbreindex('mytable', ' ', 90) on the initial database, and now the >problem is gone away. My main concern was that if we did see this >problem occuring in productoion databases, how could we fix it, other >than changing the code, and at least now I have such a solution, and >it's a bit less drastic than exporting and re-importing all the data >(which potentially could have taken hours).
How large is your table? Maybe a simple "update statistics" would also fix things? J.
[quoted text, click to view] JXStern wrote: > On 13 Jan 2007 18:15:11 -0800, wizofaus@hotmail.com wrote: > > >Well I found another solution - reindex the table. > > > >I ran > > > >dbcc dbreindex('mytable', ' ', 90) on the initial database, and now the > >problem is gone away. My main concern was that if we did see this > >problem occuring in productoion databases, how could we fix it, other > >than changing the code, and at least now I have such a solution, and > >it's a bit less drastic than exporting and re-importing all the data > >(which potentially could have taken hours). > > How large is your table?
'bout 2 million records, 9 columns. [quoted text, click to view] > > Maybe a simple "update statistics" would also fix things? >
Quite possibly - unfortunately I can't re-create the problem now to test it! Will be the first thing I try if I see the same problem again, though.
[quoted text, click to view] On 15 Jan 2007 13:36:21 -0800, wizofaus@hotmail.com wrote: >> How large is your table? > >'bout 2 million records, 9 columns. >> >> Maybe a simple "update statistics" would also fix things? >> >Quite possibly - unfortunately I can't re-create the problem now to >test it! >Will be the first thing I try if I see the same problem again, though.
Could probably reproduce it - insert a 1,000,000 rows = 1, then a handfull of rows numbered 2-10. Query it for =1, it will scan, then query it for =2. Or something like that. J.
[quoted text, click to view] JXStern wrote: > On 15 Jan 2007 13:36:21 -0800, wizofaus@hotmail.com wrote: > > >> How large is your table? > > > >'bout 2 million records, 9 columns. > >> > >> Maybe a simple "update statistics" would also fix things? > >> > >Quite possibly - unfortunately I can't re-create the problem now to > >test it! > >Will be the first thing I try if I see the same problem again, though. > > Could probably reproduce it - insert a 1,000,000 rows = 1, then a > handfull of rows numbered 2-10. Query it for =1, it will scan, then > query it for =2. Or something like that. >
Not sure, but now I have another issue which does seem to point towards parameters being a problem. If I run a query with quite a large number of parameters which basically SELECT Count(*) FROM MyTable WHERE Key1 IN (@0, @1, @2...@12) AND Key2 IN (@13, @14, @15...@20) it takes over 2 seconds, but when I substitute the last 8 parameters (13-20) with their values, i.e. SELECT Count(*) FROM MyTable WHERE Key1 IN (@0, @1, @2...@12) AND Key2 IN ('value1', 'value2', 'value3', 'value4'...'value8') the query takes 15 milliseconds! (actually it took slightly longer - about 400msec - on the first run - the previous query always takes ~2 seconds no matter how often I run it). Curiously, the fewer parameters I substitute for the Key2 filter, the slower it runs (I can't improve on the 15 msec by doing the same for they Key1 filter), in a more or less linear relationship. I've tried all the previously mentioned "tricks", including reindexing the table, but no luck. So it seems maybe SQL server has problems optimizing if there are too many parameters. FWIW, I did try putting the parameter values in a temporary table and even using a join, but the total time is significantly longer than 15ms. Again, the parameter values are under my control, so there's no risk of SQL injection, so if there's another good reason NOT to do my own parameter substitution in this case, I'd be interested in hearing it.
[quoted text, click to view] wizofaus@hotmail.com wrote: > Not sure, but now I have another issue which does seem to point > towards parameters being a problem. > > If I run a query with quite a large number of parameters which > basically > > SELECT Count(*) FROM MyTable WHERE Key1 IN (@0, @1, @2...@12) AND Key2 > IN (@13, @14, @15...@20) > > it takes over 2 seconds, but when I substitute the last 8 parameters > (13-20) with their values, i.e. > > SELECT Count(*) FROM MyTable WHERE Key1 IN (@0, @1, @2...@12) AND Key2 > IN ('value1', 'value2', 'value3', 'value4'...'value8') > > the query takes 15 milliseconds! (actually it took slightly longer - > about 400msec - on the first run - the previous query always takes ~2 > seconds no matter how often I run it).
Now you've got me really intrigued. Based on everything I've ever read, IN comparisons are supposed to be non-sargable, and therefore non-optimizable. Someone from the SQL Server groups please correct me if I am wrong. Oh wait, maybe you have a sufficient number of values to cause the query engine to use a temp table and join, in which case optimization can occur. I suggest you use SQL Profiler to determne the difference in the execution plans for each method. If the fast query plan involves an index that is not used in the slow query plan, you can use an index hint to force the query engine to use that index ( http://www.sql-server-performance.com/transact_sql_where.asp) [quoted text, click to view] > > Curiously, the fewer parameters I substitute for the Key2 filter, the > slower it runs (I can't improve on the 15 msec by doing the same for > they Key1 filter), in a more or less linear relationship. > > I've tried all the previously mentioned "tricks", including reindexing > the table, but no luck.
What about the trick that involves using local variables in your batch, instead of directly using the parameters: declare @tmp1, @tmp2, etc. set @tmp1=@parm1 etc. SELECT ...(@tmp1, ...) Personally, I would do this in a stored procedure, but you have ignored previous advice to use stored procedures. [quoted text, click to view] > > So it seems maybe SQL server has problems optimizing if there are too > many parameters. > FWIW, I did try putting the parameter values in a temporary table and > even using a join, but the total time is significantly longer than > 15ms.
ISTR reading somewhere that with a sufficient number of values, that the query engine does this anyways behind the scenes. [quoted text, click to view] > > Again, the parameter values are under my control, so there's no risk > of SQL injection,
I always cringe when I hear somebody say this. Unless those values are hard-coded into your application code, you have to be getting those values from somewhere. If a user was involved at any point in the process that generates those values, then you need to at least entertain the possibility that some funny business may have occurred. Look in these articles for "secondary sql injection". It is always a mistake to assume that your user base is too ignorant to take advantage of these techniques: http://mvp.unixwiz.net/techtips/sql-injection.html http://www.sqlsecurity.com/DesktopDefault.aspx?tabid=23 http://www.nextgenss.com/papers/advanced_sql_injection.pdf http://www.nextgenss.com/papers/more_advanced_sql_injection.pdf The bottom line may turn out to be that you need to choose between secure and fast. -- Microsoft MVP - ASP/ASP.NET Please reply to the newsgroup. This email account is my spam trap so I don't check it very often. If you must reply off-line, then remove the "NO SPAM"
[quoted text, click to view] Bob Barrows [MVP] wrote: > wizofaus@hotmail.com wrote: > > Not sure, but now I have another issue which does seem to point > > towards parameters being a problem. > > > > If I run a query with quite a large number of parameters which > > basically > > > > SELECT Count(*) FROM MyTable WHERE Key1 IN (@0, @1, @2...@12) AND Key2 > > IN (@13, @14, @15...@20) > > > > it takes over 2 seconds, but when I substitute the last 8 parameters > > (13-20) with their values, i.e. > > > > SELECT Count(*) FROM MyTable WHERE Key1 IN (@0, @1, @2...@12) AND Key2 > > IN ('value1', 'value2', 'value3', 'value4'...'value8') > > > > the query takes 15 milliseconds! (actually it took slightly longer - > > about 400msec - on the first run - the previous query always takes ~2 > > seconds no matter how often I run it). > > Now you've got me really intrigued. Based on everything I've ever read, IN > comparisons are supposed to be non-sargable, and therefore non-optimizable. > Someone from the SQL Server groups please correct me if I am wrong. Oh wait, > maybe you have a sufficient number of values to cause the query engine to > use a temp table and join, in which case optimization can occur.
But that's the thing - I tried doing that explicitly myself, and it's considerably slower. [quoted text, click to view] > > I suggest you use SQL Profiler to determne the difference in the execution > plans for each method. If the fast query plan involves an index that is not > used in the slow query plan, you can use an index hint to force the query > engine to use that index > ( http://www.sql-server-performance.com/transact_sql_where.asp) > > > > > Curiously, the fewer parameters I substitute for the Key2 filter, the > > slower it runs (I can't improve on the 15 msec by doing the same for > > they Key1 filter), in a more or less linear relationship. > > > > I've tried all the previously mentioned "tricks", including reindexing > > the table, but no luck. > > What about the trick that involves using local variables in your batch, > instead of directly using the parameters: > > declare @tmp1, @tmp2, etc. > set @tmp1=@parm1 > etc. > SELECT ...(@tmp1, ...) > > Personally, I would do this in a stored procedure, but you have ignored > previous advice to use stored procedures. Actually I tried stored procs as well, and it didn't seem to be helping. Also, can you write a stored proc to take a variable number of parameters? [quoted text, click to view] > > > > > So it seems maybe SQL server has problems optimizing if there are too > > many parameters. > > FWIW, I did try putting the parameter values in a temporary table and > > even using a join, but the total time is significantly longer than > > 15ms. > > ISTR reading somewhere that with a sufficient number of values, that the > query engine does this anyways behind the scenes. > > > > > Again, the parameter values are under my control, so there's no risk > > of SQL injection, > > I always cringe when I hear somebody say this. Unless those values are > hard-coded into your application code, you have to be getting those values > from somewhere. If a user was involved at any point in the process that > generates those values, then you need to at least entertain the possibility > that some funny business may have occurred.
Perhaps, but in this case they are auto-generated - the user has no control over what the actual values are (only how many there are). At any rate, if the only thing to take into consideration here is performance vs security, then I'm afraid performance does win. No-one will use the application if it's a dog. And as it happens, the database doesn't hold particularly sensitive (or irreplaceable) data anyway.
Bob Barrows [MVP] (reb01501@NOyahoo.SPAMcom) writes: [quoted text, click to view] > Now you've got me really intrigued. Based on everything I've ever read, > IN comparisons are supposed to be non-sargable, and therefore > non-optimizable. Someone from the SQL Server groups please correct me if > I am wrong.
coi IN (val1, val2, ...) is just a shortcut for col1 = val1 OR col2 = val2 OR ... and it's perfectly possible for the optimizer to work with IN to produce a good plan. The main problem is that for many values, the optimization time can exceed the real execution time by far. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at
[quoted text, click to view] wizofaus@hotmail.com wrote: > > JXStern wrote: > > On 15 Jan 2007 13:36:21 -0800, wizofaus@hotmail.com wrote: > > > > >> How large is your table? > > > > > >'bout 2 million records, 9 columns. > > >> > > >> Maybe a simple "update statistics" would also fix things? > > >> > > >Quite possibly - unfortunately I can't re-create the problem now to > > >test it! > > >Will be the first thing I try if I see the same problem again, though. > > > > Could probably reproduce it - insert a 1,000,000 rows = 1, then a > > handfull of rows numbered 2-10. Query it for =1, it will scan, then > > query it for =2. Or something like that. > > > Not sure, but now I have another issue which does seem to point towards > parameters being a problem. > > If I run a query with quite a large number of parameters which > basically > > SELECT Count(*) FROM MyTable WHERE Key1 IN (@0, @1, @2...@12) AND Key2 > IN (@13, @14, @15...@20) > > it takes over 2 seconds, but when I substitute the last 8 parameters > (13-20) with their values, i.e. > > SELECT Count(*) FROM MyTable WHERE Key1 IN (@0, @1, @2...@12) AND Key2 > IN ('value1', 'value2', 'value3', 'value4'...'value8') > > the query takes 15 milliseconds! (actually it took slightly longer - > about 400msec - on the first run - the previous query always takes ~2 > seconds no matter how often I run it).
Indeed, that is because with literals will really compile the statement based on the actual values. The optimizer will build a kind of binary tree. It will also remove any duplicates (when applicable). So the execution phase will be very fast. On the other hand, the compilation phase is relatively expensive. And if you have hundreds of values, the optimizer start to choke. [quoted text, click to view] > Curiously, the fewer parameters I substitute for the Key2 filter, the > slower it runs (I can't improve on the 15 msec by doing the same for > they Key1 filter), in a more or less linear relationship.
Explanation: see above [quoted text, click to view] > I've tried all the previously mentioned "tricks", including reindexing > the table, but no luck. > > So it seems maybe SQL server has problems optimizing if there are too > many parameters. > FWIW, I did try putting the parameter values in a temporary table and > even using a join, but the total time is significantly longer than > 15ms.
It is hard to optimizer a scenario like yours. You might try something like this: SELECT SUM(cnt) FROM ( SELECT COUNT(*) AS cnt FROM MyTable WHERE Key1=@0 UNION ALL SELECT COUNT(*) AS cnt FROM MyTable WHERE Key1=@1 UNION ALL ... ) AS T Although the optimizer might automatically come up with a query plan that reflects this strategy, I doubt that it actually will. Please let me know if it actually increases your query performance. HTH,
Don't see what you're looking for? Try a search.
|