Groups | Blog | Home
all groups > dotnet ado.net > january 2007 >

dotnet ado.net : Bizarre slow query problem (again)


wizofaus NO[at]SPAM hotmail.com
1/10/2007 9:44:57 PM
I previously posted about a problem where it seemed that changing the
case of the word "BY" in a SELECT query was causing it to run much much
faster.

Now I've hit the same thing again, where basically almost any change I
make to how the query is executed (so that it still performs the same
function) causes the performance to jump from a dismal 7 or 8 seconds
to instantaneous. It's a very simple query of the form:

SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0

which was running fine until a moment ago, when it suddently started
running hopelessly slowly. If change anything in the query to
lowercase (or the Min to uppercase), it runs fine again. Last time
someone suggested something about a bad plan being cached, and after a
bit of research I found the commands DBCC DROPCLEANBUFFERS and DBCC
FREEPROCCACHE. Sure enough, after running these, the query started
running fine again. The question is

a) why is this happening? Is it a bug in my code, or in SQL server?
b) is it worth detecting it and fixing it automatically? I.e, should I
put some code in that notices that a query is running far too slowly,
then runs "DBCC DROPCLEANBUFFERS; DBCC FREEPROCCACHE"? Or will that
cause other problems?

Thanks
wizofaus NO[at]SPAM hotmail.com
1/11/2007 2:04:03 AM
[quoted text, click to view]
Thanks for that...amazingly enough it turned that that was exactly my
problem, although I'm using ad-hoc queries rather than stored procs. I
did some more testing, and it turned out that it was because it was
executing the same query twice, the first time with an atypical
parameter value, and the second time with a more typical one, that the
query was running so slowly. That is, executing

SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0, with @0 = 999
followed by
SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0, with @0 = 123

Caused the second query to run absurdly slowly, because in the first
case only very few rows in the table had MyKey = 999 whereas almost
every row had MyKey = 123. After doing a DBCC FREEPROCCACHE and
swapping the queries around, they both ran fine.

In the end I ended up de-parameterizing the query just for this case,
but now I'm worried - how can I be sure that my other queries won't
suffer from the same problem? Should I never use parameters because of
this possibility?
wizofaus NO[at]SPAM hotmail.com
1/11/2007 3:47:40 AM

[quoted text, click to view]
Sure, except that the content of the database is out of my control -
this particular scenario (where nearly all the records matched a
particular key, but the query was first run against a different key)
could easily arrise in a production environment. More to the point,
I've seen no evidence that I'm getting any performance benefits from
using parameterized queries.
I suspect I will at least add a configuration option to avoid
parameterized queries (relatively straightforward, as I have a layer of
code that handles query parameters) if I see a problem like this again.
Bob Barrows [MVP]
1/11/2007 7:12:42 AM
[quoted text, click to view]

"de-parameterizing"? You mean changing to dynamic sql and leaving yourself
vulnerable to sql injection??

[quoted text, click to view]

This is a ridiculous overreaction. Problems due to parameter-sniffing are
too rare to justify eliminating the benefits of using parameters. Talk about
"throwing the baby out with te bath water".

The article showed two, no three, ways to alleviate the problems caused by
parameter sniffing and still use parameters. So what do you do? ignore the
article's advice and "de-parameterize" your query...

--
Microsoft MVP - ASP/ASP.NET
Please reply to the newsgroup. This email account is my spam trap so I
don't check it very often. If you must reply off-line, then remove the
"NO SPAM"

Uri Dimant
1/11/2007 10:22:09 AM
Hi
http://blogs.msdn.com/khen1234/archive/2005/06/02/424228.aspx



[quoted text, click to view]

swaroop.atre NO[at]SPAM gmail.com
1/11/2007 12:20:17 PM
be EXTREMELY wary when using DBCC DROPCLEANBUFFERS and DBCC
FREEPROCCACHE on a production machine. They clear all cached SPs,
queries and plans. The instance is bound to run under extreme stress
for a considerable amount of time.





[quoted text, click to view]
Uri Dimant
1/11/2007 12:40:21 PM
[quoted text, click to view]
An ability using parameters is very powerful , don't afraid using parameters
, just test it carefuly




[quoted text, click to view]

wizofaus NO[at]SPAM hotmail.com
1/11/2007 1:27:08 PM

[quoted text, click to view]
The instance? You mean it affects all databases?
In this case, I determined I'd have to do it before every single query
call, so obviously that's not practical.
Stephen Howe
1/11/2007 2:49:02 PM
[quoted text, click to view]

Pardon me but

Why did you not say which version of SQL Server you are running on?
And why not mention the version of MDAC you have installed (since you posted
to microsoft.public.data.ado)?
And why not mention something about the Primary Keys & Indexes of the table
you are querying - I take it you have an index on MyKey?
And have you checked the documentation, whitepapers, MSDN for details on
query performance?
And have you checked the query plan in Query Analyser?

Cheers

Stephen Howe

wizofaus NO[at]SPAM hotmail.com
1/11/2007 7:18:29 PM

[quoted text, click to view]
Thanks...one of the most helpful replies I've had on usenet for some
time now!

The problem is that it's pretty hard for me to know that a value is
"odd". In this case, like I said the query in this case is

SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0

Where @0 is actually a value from a list that is obtained from
elsewhere (not the database). It loops through this list, calling the
same query for each one.

Now, I suppose I could first do

SELECT Count(*) FROM MyTable WHERE MyKey = @0

and determine if the number was very low, and if so, de-parameterize it
or add a space or whatever, but then this second query would
potentially suffer from the same problem.

I suppose another alternative is to build another query first

SELECT MyKey, Min(MyValue) FROM MyTable GROUP BY MyKey

then use this to obtain the minimum value for each key, but there's
only so much time I can spend rewriting queries to side-step quirky
performance issues (the application has many many ad-hoc queries,
nearly all parameterized on the same key, so they are all prone to the
same problem).

BTW, this is under SQL server 2000. I've yet to determine if this
particular problem does actually exist under 2005, which is what we're
using for the production environment. Will definitely do that before I
waste too much more time on it.
Erland Sommarskog
1/11/2007 10:24:51 PM
(wizofaus@hotmail.com) writes:
[quoted text, click to view]

Which version of SQL Server are you using? Here is a test that you can
try to see that you can do to actually test the benefit of
parameterised queries. First create this database:

CREATE DATABASE many_sps
go
USE many_sps
go
DECLARE @sql nvarchar(4000),
@x int
SELECT @x = 1000
WHILE @x > 0
BEGIN
SELECT @sql = 'CREATE PROCEDURE abc_' + ltrim(str(@x)) +
'_sp @orderid int AS
SELECT O.OrderID, O.OrderDate, O.CustomerID, C.CompanyName,
Prodcnt = OD.cnt, Totalsum = OD.total
FROM Northwind..Orders O
JOIN Northwind..Customers C ON O.CustomerID = C.CustomerID
JOIN (SELECT OrderID, cnt = COUNT(*), total = SUM(Quantity * UnitPrice)
FROM Northwind..[Order Details]
GROUP BY OrderID) AS OD ON OD.OrderID = O.OrderID
WHERE O.OrderID = @orderid'
EXEC(@sql)
SELECT @x = @x - 1
END

(Don't worry if you don't have Northwind on your server, you are not going
to run these procedures.)

Then use F7 to get the Summary page, and navigate to the Stored Procedures
node for many_sps. Select some 200 procedures, right-click and select
Script As Create To New Query Window. Go for a cup of coffee - this will
take some time depending on your hardware.

When the operation has completed (or you have gotten tired of waiting
and killed SSMS), issue this command:

ALTER DATABASE db SET PARAMETERIZATION FORCED

Redo the scripting operation. It will now complete in five seconds.

The reason for this is that SQL Server Management Studio does not use
parameterised queries. For every procedure it scripts, Mgmt Studio
issues around five queries. All these queries makes it to the
cache that explodes, and all these queries are compiled.

When you set a database to forced parameterisation, SQL Server will
auto-parameterise all statements (with some exceptions documented in
Books Online); normally it only auto-parameterise very simple queries.
In the case of Mgmt Studio it's reallly a go-faster switch.

So dismissing caching of parameterised queries can be a serious mistake.
But it is certainly true that there are situations where parameter
sniffing can be a problem. If it is possible for you to tell in the
appliocation "this is an odd value that needs a special plan", then
you can modify the query text by adding a redudant condition like
"AND 1 = 1". Actually as you have found, changing "BY" to "by" or even
adding extra spaces help. This is because the lookup in the cache
is done on a hash without first collapsing spaces or parsing the
query text.



--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
Erland Sommarskog
1/11/2007 10:46:56 PM
Erland Sommarskog (esquel@sommarskog.se) writes:
[quoted text, click to view]

By the way, this is something important for your application as well.
Say that a DBA finds out that your app is thrashing the cache by not
using parameterised queries, and sets the database to forced
parameterisation, you will get back the behaviour you have now.

A bettery remedy is to add OPTION (RECOMPILE) at the end of sensitive
queries. This forces a statement recompile, and the query will not be
put in cache. This means that you can still use parameterised queries
and get the other benefits of it. (Protection for SQL injection and
repsecting the user's regional settings.) You also avoid thrashing
the cache.


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
Erland Sommarskog
1/12/2007 9:56:57 PM
(wizofaus@hotmail.com) writes:
[quoted text, click to view]

I can understand that this is not always simple. I didn't say this, in
hope it would be. :-)

However, I think I have a cure for you:

[quoted text, click to view]

Stop! Don't do that! The problems with query plans aside, this is an
ineffecient use of SQL Server. Get all data at once with:

SELECT t.MyKey, Min(t.MyValue)
FROM MyTable t
JOIN list_to_table(@list) f ON t.MyKey = f.value
GROUP BY t.MyKey

Where list_to_table is a table-valued function that transform the list
to a table. I have a whole bunch of such functions on
http://www.sommarskog.se/arrays-in-sql.html.


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
Cor Ligthert [MVP]
1/13/2007 3:15:45 PM
Stephen,

You forget in the list, the version of the framework that is used. It is to
AdoNet as well you know.

:-)


Cor

"Stephen Howe" <stephenPOINThoweATtns-globalPOINTcom> schreef in bericht
news:%23kg1j%23YNHHA.992@TK2MSFTNGP04.phx.gbl...
[quoted text, click to view]

wizofaus NO[at]SPAM hotmail.com
1/13/2007 3:24:01 PM

[quoted text, click to view]
I agree this sort of thing would be preferable, and more efficient, but
as it is, providing I don't get the bad plan problem, it's efficient
enough as it is, and I'm wary of making too many big changes at this
point. That sort of thing I'd prefer to leave for the next version.

The thing is, I just tried exporting all the records to a CSV file,
creating a clean database with the same tables/indices, re-importing
all the records, and the same problem DOESN'T happen: I can run that
query with the "odd" value first, then with the common value, and both
queries run fast. So something else has happened to my database that
surely I must be able to reset somehow?
wizofaus NO[at]SPAM hotmail.com
1/13/2007 6:15:11 PM

[quoted text, click to view]

Sure. I definitely plan on doing some query optimization and
consolidation for the next version. Your routines may well come in
handy, so thanks.
[quoted text, click to view]
Well I found another solution - reindex the table.

I ran

dbcc dbreindex('mytable', ' ', 90) on the initial database, and now the
problem is gone away. My main concern was that if we did see this
problem occuring in productoion databases, how could we fix it, other
than changing the code, and at least now I have such a solution, and
it's a bit less drastic than exporting and re-importing all the data
(which potentially could have taken hours).

I'm not sure whether 90 is the best parameter value here, that was just
from the MSDN article.
Erland Sommarskog
1/13/2007 11:45:00 PM
(wizofaus@hotmail.com) writes:
[quoted text, click to view]

But keep in mind that the solution you have now will not scale well. If
the data in production is ten times larger than you have expected, you
will get ten times more execution time, even with the good plan.

[quoted text, click to view]

I eavesdropped a discussion at PASS in Seattle last year, when a guy
had done extensive tests, and he could repeat a scenario that depending
on which order he loaded the same data, he would get different plans,
good or bad. I presume that part of the answer lies what exactly is in
the statistics. Normally, statistics are only samples, and if the
statistics does not well reflect the data distribution, your plans
will not always be the best.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
Erland Sommarskog
1/14/2007 10:32:59 AM
(wizofaus@hotmail.com) writes:
[quoted text, click to view]

It's recommended to run a maintenance job to reindex the table with some
frequency. The main reason for this is to prevent defragmentation. A side
effect of this is that statistics are updated with fullscan, that is all
rows are considered. That gives the optimizer more accurate information.


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
JXStern
1/15/2007 11:47:44 AM
[quoted text, click to view]

How large is your table?

Maybe a simple "update statistics" would also fix things?

J.
wizofaus NO[at]SPAM hotmail.com
1/15/2007 1:36:21 PM

[quoted text, click to view]

'bout 2 million records, 9 columns.
[quoted text, click to view]
Quite possibly - unfortunately I can't re-create the problem now to
test it!
Will be the first thing I try if I see the same problem again, though.
JXStern
1/15/2007 3:22:38 PM
[quoted text, click to view]

Could probably reproduce it - insert a 1,000,000 rows = 1, then a
handfull of rows numbered 2-10. Query it for =1, it will scan, then
query it for =2. Or something like that.

J.
wizofaus NO[at]SPAM hotmail.com
1/17/2007 8:59:47 PM

[quoted text, click to view]
Not sure, but now I have another issue which does seem to point towards
parameters being a problem.

If I run a query with quite a large number of parameters which
basically

SELECT Count(*) FROM MyTable WHERE Key1 IN (@0, @1, @2...@12) AND Key2
IN (@13, @14, @15...@20)

it takes over 2 seconds, but when I substitute the last 8 parameters
(13-20) with their values, i.e.

SELECT Count(*) FROM MyTable WHERE Key1 IN (@0, @1, @2...@12) AND Key2
IN ('value1', 'value2', 'value3', 'value4'...'value8')

the query takes 15 milliseconds! (actually it took slightly longer -
about 400msec - on the first run - the previous query always takes ~2
seconds no matter how often I run it).

Curiously, the fewer parameters I substitute for the Key2 filter, the
slower it runs (I can't improve on the 15 msec by doing the same for
they Key1 filter), in a more or less linear relationship.

I've tried all the previously mentioned "tricks", including reindexing
the table, but no luck.

So it seems maybe SQL server has problems optimizing if there are too
many parameters.
FWIW, I did try putting the parameter values in a temporary table and
even using a join, but the total time is significantly longer than
15ms.

Again, the parameter values are under my control, so there's no risk of
SQL injection, so if there's another good reason NOT to do my own
parameter substitution in this case, I'd be interested in hearing it.
Bob Barrows [MVP]
1/18/2007 7:20:01 AM
[quoted text, click to view]

Now you've got me really intrigued. Based on everything I've ever read, IN
comparisons are supposed to be non-sargable, and therefore non-optimizable.
Someone from the SQL Server groups please correct me if I am wrong. Oh wait,
maybe you have a sufficient number of values to cause the query engine to
use a temp table and join, in which case optimization can occur.

I suggest you use SQL Profiler to determne the difference in the execution
plans for each method. If the fast query plan involves an index that is not
used in the slow query plan, you can use an index hint to force the query
engine to use that index
(http://www.sql-server-performance.com/transact_sql_where.asp)

[quoted text, click to view]

What about the trick that involves using local variables in your batch,
instead of directly using the parameters:

declare @tmp1, @tmp2, etc.
set @tmp1=@parm1
etc.
SELECT ...(@tmp1, ...)

Personally, I would do this in a stored procedure, but you have ignored
previous advice to use stored procedures.

[quoted text, click to view]

ISTR reading somewhere that with a sufficient number of values, that the
query engine does this anyways behind the scenes.

[quoted text, click to view]

I always cringe when I hear somebody say this. Unless those values are
hard-coded into your application code, you have to be getting those values
from somewhere. If a user was involved at any point in the process that
generates those values, then you need to at least entertain the possibility
that some funny business may have occurred. Look in these articles for
"secondary sql injection". It is always a mistake to assume that your user
base is too ignorant to take advantage of these techniques:
http://mvp.unixwiz.net/techtips/sql-injection.html
http://www.sqlsecurity.com/DesktopDefault.aspx?tabid=23
http://www.nextgenss.com/papers/advanced_sql_injection.pdf
http://www.nextgenss.com/papers/more_advanced_sql_injection.pdf

The bottom line may turn out to be that you need to choose between secure
and fast.

--
Microsoft MVP - ASP/ASP.NET
Please reply to the newsgroup. This email account is my spam trap so I
don't check it very often. If you must reply off-line, then remove the
"NO SPAM"

wizofaus NO[at]SPAM hotmail.com
1/18/2007 12:42:34 PM

[quoted text, click to view]

But that's the thing - I tried doing that explicitly myself, and it's
considerably slower.
[quoted text, click to view]

Actually I tried stored procs as well, and it didn't seem to be
helping. Also, can you write a stored proc to take a variable number
of parameters?
[quoted text, click to view]

Perhaps, but in this case they are auto-generated - the user has no
control over what the actual values are (only how many there are).

At any rate, if the only thing to take into consideration here is
performance vs security, then I'm afraid performance does win. No-one
will use the application if it's a dog. And as it happens, the
database doesn't hold particularly sensitive (or irreplaceable) data
anyway.
Erland Sommarskog
1/18/2007 3:04:54 PM
Bob Barrows [MVP] (reb01501@NOyahoo.SPAMcom) writes:
[quoted text, click to view]

coi IN (val1, val2, ...)

is just a shortcut for

col1 = val1 OR col2 = val2 OR ...

and it's perfectly possible for the optimizer to work with IN to produce a
good plan. The main problem is that for many values, the optimization time
can exceed the real execution time by far.


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
Gert-Jan Strik
1/18/2007 8:35:10 PM
[quoted text, click to view]

Indeed, that is because with literals will really compile the statement
based on the actual values. The optimizer will build a kind of binary
tree. It will also remove any duplicates (when applicable). So the
execution phase will be very fast.

On the other hand, the compilation phase is relatively expensive. And if
you have hundreds of values, the optimizer start to choke.

[quoted text, click to view]

Explanation: see above

[quoted text, click to view]

It is hard to optimizer a scenario like yours. You might try something
like this:

SELECT SUM(cnt)
FROM (
SELECT COUNT(*) AS cnt FROM MyTable WHERE Key1=@0
UNION ALL
SELECT COUNT(*) AS cnt FROM MyTable WHERE Key1=@1
UNION ALL
...
) AS T

Although the optimizer might automatically come up with a query plan
that reflects this strategy, I doubt that it actually will.

Please let me know if it actually increases your query performance.

HTH,
AddThis Social Bookmark Button