[posted and mailed, please reply in news]
gary b (bogachkov@hotmail.com) writes:
[quoted text, click to view] > SELECT store_groups_id
> FROM store_groups
> WHERE store_groups_id IS NOT NULL
> AND type = ?
> ORDER BY group_name
>
> It takes a significantly longer time to run (the time it takes for
> executeQuery() to return ) than if I use
>
> SELECT store_groups_id
> FROM store_groups
> WHERE store_groups_id IS NOT NULL
> AND type = 'M'
> ORDER BY group_name
>...
> This only happens when the table in question is large - I am seeing
> this behaviour for a table with > 1,000,000 records. It doesn't make
> sense to me why a parameterized query would run SLOWER than a
> completely ad-hoc query when it is supposed to be more efficient.
It would have helped if you had included the CREATE TABLE and CREATE INDEX
statements for your table. Now, I will have to guess that your table has
a non-clustered index on type and that index does not include both of
store_groups_id and groups_name, and also at least one these colunms is
absent from the clustered index.
When it is said that prepared queries are more effecient, this is because
the query plan for these are cached and can be reused. This means that
if you resubmit the query you are saved the compilation phase.
Ad-hoc statements can be cached too, and also auto-parameterized, so they
would be very similar to prepared queries. But auto-parameterization
does not always happen, and this seems to be such a case.
When SQL Server builds a query plan, it looks ar the values in the queries
and also at the statistics saved for the table. Now, if a value is a
constant, it knows that this is the value that applies, and no other.
If the statement is parameterized, SQL Server still looks at the current
parameter value as a hint, but it has to consider the possibility that
next time, the value passed might be different.
For this query it seems that SQL Server has a choices between two
alternatives:
1) Scan the data pages of the table.
2) Use the non-clustered index on type.
For a value of type with few rows, using the index is much faster. However,
say that 40% of the rows has ' ' in type. Using the index to retrieve
these rows would be very expensive - a lot more expensive than scanning
the table, because each data page would be read more than once.
SQL Server tends to be conservative here. I've more often seen it choose
table scan incorrectly than incorrectly seeking a non-clustered index.
Had the index included all columns in the query, this would have changed
the scene completely, since SQL Server would have no reason to access
the data pages. As I mentioned the clustered index also has importance.
This is because the pointer to the data page is actually the value of
the clustered-index key. So the columns of the clustered index is present
in all non-clustered indexes.
[quoted text, click to view] > Is this a sql-server specific problem or something that would pertain
> to other databases as well? I there something about the coorect use of
> bind parameters that I overall don't understand?
The behaviour is cetainly related to the architecture of SQL Server. I
don't have experience of other engines, but I would expect most engines
can give you surprises, although which surprises you get may differ
from engine to engine.
[quoted text, click to view] > If I can provide some hints in Java then this would be great..
> otherwise, do I need to turn/off certain settings on the database
> itself?
The best in this case may be to create a covering index for this
particular query. You can also use an index hint to force use of
the non-clustered index, but if you actually pass a value with many
occurrances your application might grind to an actual standstill.
--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
Books Online for SQL Server SP3 at