Groups | Blog | Home
all groups > sql server full text search > may 2006 >

sql server full text search : Full text indexing and execution plans


Peter van Wilrijk
5/24/2006 12:00:00 AM
Hi there,

We've got an ads website with about 170.000 ads divided in categories and
subcategories.

eg.

Ford Taunus, ... = 3-5 (on wheels - cars)
Trek mountain bike, ... = 3-19 (on wheels - bycicles)
etc.


I've got a search query in which the webuser can choose ...

1) to search by keywords on all ads
2) to search by keywords on ads of a certain category (ADS_HR)
3) to search by keywords on ads of a certain category and 1 or more
subcategories (ADS_OR)

I've got some examples here

1)
SELECT * FROM ADS, ADSFULL
where ads.ads_adid = adsfull.fads_adid
AND (ADS_ADID IN (SELECT U.FADS_ADID FROM ADSFULL U WHERE CONTAINS(*,
'fiets') ) ) )
AND (ADS_HR = 3)
AND (ADS_OR in (19))
ORDER BY ads_adid DESC

2)
SELECT * FROM ADS, ADSFULL
where ads.ads_adid = adsfull.fads_adid
AND (ADS_ADID IN (SELECT U.FADS_ADID FROM ADSFULL U WHERE (CONTAINS(*,
'fiets') ) ) )
AND (ADS_HR = 3)
AND (ADS_OR in (19, 20))
ORDER BY ads_adid DESC

3)
SELECT * FROM ADS, ADSFULL
where ads.ads_adid = adsfull.fads_adid
AND (ADS_ADID IN (SELECT U.FADS_ADID FROM ADSFULL U WHERE (CONTAINS(*,
'fiets') ) ) )
AND (ADS_HR = 3)
AND (ADS_OR in (18))
ORDER BY ads_adid DESC

4)
SELECT * FROM ADS, ADSFULL
where ads.ads_adid = adsfull.fads_adid
AND (ADS_ADID IN (SELECT U.FADS_ADID FROM ADSFULL U WHERE (CONTAINS(*,
'fiets') ) ) )
AND (ADS_HR = 3)
AND (ADS_OR in (18, 19))
ORDER BY ads_adid DESC


At once query 1) and 2) started using another execution plan, implying
execution takes about 1m40 while 3) and 4) kept there original execution
plan, implying execution takes less than 1 second.

I'm VB developer, no SQL SERVER expert, no dba, but ...
I guess it has to do something with the statistics SQL SERVER is storing in
order to decide what indexes to use when executing a query.
I guess this means SQL Server might change execution plan when the number of
"Ford Taunus" in 3-5 changes from 3% to 7%.

Auto-update statistics is enabled in our database.

In this case it seems SQL Server makes a wrong decision implying our website
gets into performance trouble.

So the problem depends on which category and subcategory the user selects
What could I do?
I've put a visual copy of the execution plans on
http://www.hoveseliga.be/executionplan2.gif
I've the impression it goes wrong when SQL SERVER starts using the index
IX_RUBRIEK

CREATE INDEX [IX_RUBRIEK] ON [dbo].[ADS]([ADS_HR], [ADS_OR]) ON [PRIMARY]

What could I do?
Can and should I force SQL Server to not use this index, can I force to
always use the full text index first with a query/index hint? I've read
index hints aren't good practice.
Have I got alternatives?


All suggestions appreciated.
Sorry for errors in english language.

Thanks
Kind Regards
Peter Van Wilrijk.



Hilary Cotter
5/24/2006 12:00:00 AM
I spent some time in Genk, Belgium is a beautiful country.

To answer your questions. I think you want to do something like this. I am
hoping and praying you will not get the lazy spool that I do.

The function is courtesy of SQL MVP Erland Sommarskog.

CREATE FUNCTION iter_intlist_to_table (@list ntext)
RETURNS @tbl TABLE (listpos int IDENTITY(1, 1) NOT NULL,
number int NOT NULL) AS
BEGIN
DECLARE @pos int,
@textpos int,
@chunklen smallint,
@str nvarchar(4000),
@tmpstr nvarchar(4000),
@leftover nvarchar(4000)

SET @textpos = 1
SET @leftover = ''
WHILE @textpos <= datalength(@list) / 2
BEGIN
SET @chunklen = 4000 - datalength(@leftover) / 2
SET @tmpstr = ltrim(@leftover + substring(@list, @textpos,
@chunklen))
SET @textpos = @textpos + @chunklen

SET @pos = charindex(' ', @tmpstr)
WHILE @pos > 0
BEGIN
SET @str = substring(@tmpstr, 1, @pos - 1)
INSERT @tbl (number) VALUES(convert(int, @str))
SET @tmpstr = ltrim(substring(@tmpstr, @pos + 1, len(@tmpstr)))
SET @pos = charindex(' ', @tmpstr)
END

SET @leftover = @tmpstr
END

IF ltrim(rtrim(@leftover)) <> ''
INSERT @tbl (number) VALUES(convert(int, @leftover))

RETURN
END
--execution is SearchMe 'fiets','10 20 30',2
CREATE procedure SearchMe (@phrase varchar(200), @qualifiers varchar(200),
@ADS_HR int)
as
declare @searchphrase varchar(200)
declare @holding table(ADS_OR int not null primary key)
insert into @holding
select number from dbo.iter_intlist_to_table (@Qualifiers)
select @searchphrase=char(34)+@phrase +char(34)
SELECT * FROM ADS
join ADSFULL on ADSFULL.fads_adid=ADS.ADS_ADID and ADS_HR = @ADS_HR
join @holding H on H.ADS_OR =ADS.ADS_OR
join (select [key] from containstable(ADSFULL,charcol, @searchphrase)) as t
on t.[key]=ADS.ADS_ADID
ORDER BY ads_adid DESC


--
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com



[quoted text, click to view]

Peter Van Wilrijk
5/30/2006 2:05:40 PM
Thanks a lot Hilary,

[quoted text, click to view]

Works great. I guess performance can only boost by replacing "IN (3, 4,
19)" by JOINING with a temporary table storing those values.

[quoted text, click to view]

Containstable and [key] is new to me.
I checked BOL and interprete that [key] refers to the field(s) linked to
PK_ADSFULL, the unique index I specified in ...

sp_fulltext_table 'adsfull', 'create', 'FTADS', 'PK_ADSFULL'

Here I've got a little problem since PK_ADSFULL indexes the identity
column FADS_ID, while in fact (something you couldn't see in me first
post) ADSFULL.FADS_ADID refers to ADS.ADS_ADID. Sorry, I forgot to add
the scripts below.

I guess this means I can't use CONTAINSTABLE here??

Anyway, I replaced your CONTAINSTABLE ...

join (select [key] from containstable(ADSFULL, *, @searchphrase)) as t
on t.[key]=ADS.ADS_ADID

by ...

join (SELECT FADS_ADID FROM ADSFULL WHERE (CONTAINS(*, @searchphrase)))
as t on t.[fads_adid]=ADS.ADS_ADID

I guess this construction is all right as well??

At the end ... all that matters is ... the construction executes
performant now, undependending the list of subcategories I pass.

Thanks a lot,
Peter.

Thanks as well to Hugo Kornelis who (among other great tips about
statistics, parameter sniffing, ...) already suggested in
microsoft.public.nl.sqlserver to replace

WHERE (ADS_ADID IN (SELECT FADS_ADID FROM ADSFULL WHERE (CONTAINS(*,
'"fiets"') ) ) )

by

FROM ADS JOIN ADSFULL ON ads_adid = fads_adid
WHERE CONTAINS(ADSFULL.*, '"fiets"')

changing execution plan and boosting performance as well
....
something you did here as well while rewriting my query using ...

join (select ...)

Muchas Gracias.

----
CREATE TABLE [dbo].[ADS] (
[ADS_ADID] [int] IDENTITY (1, 1) NOT NULL ,
[ADS_HR] [tinyint] NOT NULL ,
[ADS_OR] [tinyint] NOT NULL ,
...
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[ADSFULL] (
[FADS_ID] [int] IDENTITY (1, 1) NOT NULL ,
[FADS_ADID] [int] NOT NULL ,
[FADS_TITLE] [nvarchar] (50) COLLATE
...
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[ADS] WITH NOCHECK ADD
CONSTRAINT [PK_ADS] PRIMARY KEY CLUSTERED
(
[ADS_ADID]
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[ADSFULL] WITH NOCHECK ADD
CONSTRAINT [PK_ADSFULL] PRIMARY KEY CLUSTERED
(
[FADS_ID]
) ON [PRIMARY]
GO

exec sp_fulltext_table N'[dbo].[ADSFULL]', N'create', N'FTADS',
N'PK_ADSFULL'
GO

exec sp_fulltext_column N'[dbo].[ADSFULL]', N'FADS_TITLE', N'add', 1033
GO

exec sp_fulltext_column N'[dbo].[ADSFULL]', N'FADS_TEXT', N'add', 1033
GO
Peter van Wilrijk
5/31/2006 12:00:00 AM
Hi Hilary,

In addition ... I guess ... by moving contains to the join section ... you
can force SQL Server
to make a selection on keyword(s) before handling any conditions in a WHERE
clause, right. That's probably why you did it, isn't it?

[quoted text, click to view]

Kind regards
Peter Van Wilrijk

Hilary Cotter
6/1/2006 9:36:04 PM
Use containstable instead of contains for better performance.

--
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com



[quoted text, click to view]

Hilary Cotter
6/1/2006 9:37:57 PM
well you can't really. You have to wait for mssearch to return the results
before they can be trimmed further.

--
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com



[quoted text, click to view]

AddThis Social Bookmark Button