Groups | Blog | Home
all groups > sql server full text search > august 2004 >

sql server full text search : Too slow when I have to join to CONTAINSTABLE


needadohelp NO[at]SPAM hotmail.com
8/19/2004 1:51:53 AM
I have about 15 million rows in one table with about 8 gigs of text
that I have a full text index set up for.

When I need to search everything, this works well:

SELECT [Key], Rank FROM CONTAINSTABLE (Items, Content, 'searchfor',
200) ORDER BY RANK DESC

The key is the identity column (ItemID) on the indexed table (Items).

The big problem is that, of course, the majority of the time I need to
search only a sub-set of the data. The items are split into about 200
Categories. So if I want to search only in a single category, I have
to do something like this:

SELECT TOP 200 ItemID, KEY_TBL.RANK
FROM Items WITH (NOLOCK)
INNER JOIN CONTAINSTABLE (Items, Content, 'searchfor') AS KEY_TBL ON
Items.ItemID = KEY_TBL.[KEY]
WHERE CategoryID = @CategoryID
ORDER BY KEY_TBL.RANK DESC

The problem is that I cannot use the top_n_by_rank argument in this
context because that may filter all (or just too many) of the results
in the Category. The result is that CONTAINSTABLE may return (for
example) 250,000 ItemIDs to be joined back to the Items table,
compared with the CategoryID, and sorted by rank. This is just way to
slow.

I know the problem must crop up time and time again… so is there
solution?

Thanks,
Hilary Cotter
8/19/2004 7:42:13 AM
Regretably the best way to handle this is by partitioning your data, ie you
will have to have 200 child tables, one for each category. Then do some
conditional operation in your search to search on the child table specific
to the category.

--
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html


[quoted text, click to view]

needadohelp NO[at]SPAM hotmail.com
8/19/2004 2:06:10 PM
But then, of course, I'd have a problem searching across Categories.

[quoted text, click to view]
Hilary Cotter
8/20/2004 8:57:23 AM
Depending on your users search habits, or requirements you may be able to
then have another table for all categories. So you would have 201 tables,
probably on seperate catalogs as each catalog has 15 (IIRC) threads
dedicated to indexing. I think there is a limit to the number of catalogs
per database, but I can't recall that off the top of my head.

If a users is searching on a specific category the search would be directed
to one of these seperate table/catalog. If the user is searching on all
categories the search would be directed to a single consolidated table. You
could also use a union query, but this would turn out to be expensive.

--
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html


[quoted text, click to view]

needadohelp NO[at]SPAM hotmail.com
8/20/2004 9:03:55 PM
That's pretty scary too. It might be ok if the data was fairy static,
but about 15,000 Items are added daily and as it is the Incremental
Update runs constantly.

Here's what I'm currently considering:

I could, for each item, include a category Marker right in the text -
so if the Item is in CategoryID 50, the text would include the word
"CatMarker0050". Then I would have to

1) Make sure the Marker is added when adding Items.
2) Make sure the Marker is stripped before displaying text.
3) Make sure the Marker is updated if the Item is moved.
4) Alter all Category based Full Text queries to require that the
Marker is found in the text.

SELECT [Key], Rank FROM CONTAINSTABLE (Items, Content, 'searchfor AND
CatMarker0050',200)
ORDER BY RANK DESC


What do you think?

~ James

[quoted text, click to view]
Hilary Cotter
8/21/2004 10:35:36 AM
that's an excellent idea.

The problems with this is that your ranking is probably going to be dampened
somewhat by the additional search term , but it should be consistent.

For display purposes you should either have two columns one in the original
format and one with the marked format, or have a different table containing
the marked content which is queried, but the join is against the table with
the unmarked content.

Here is an example of such a query.

http://groups.google.com/groups?selm=uzlWtk6ZEHA.3476%40tk2msftngp13.phx.gbl&output=gplain

[quoted text, click to view]

AddThis Social Bookmark Button