Groups | Blog | Home
all groups > sql server full text search > august 2006 >

sql server full text search : Sql Server 2005 Profiler cpu utilization inaccurate + search question


gsr_boost NO[at]SPAM yahoo.com
8/31/2006 1:16:42 AM
Hi All,

We are currently having some performance issues with a query that is
utilizing a "Like" search with wildcards.

Background:
We are running
SQL Server 2005 64Bit (sp1)
Windows 2003 Standard 64Bit edition (sp1)
PowerEdge DUAL Xeon EMT64 machine
16GB ram
CX300 SAN with 10 x 140GB 15KRPM disks (RAID 10) for the db file
Raid 1 volume for the logs

Details are as follows:

We have a "user" table with approx 5 million rows.
We currently have searches that look like this:

select lUserID, FNAME, LNAME, EMAIL from user
where FName like '%brad%' or LName like '%benefit%' OR EMAIL like
'%brad%'

This query has all the indexes setup nicely but I think we are now
approaching a level of records where it's not appropriate to perform
this wildcard query anymore.
The biggest problem is the proceeding wildcard percent. This is
required as some users have two first names eg "Hang Chai", but need to
be found when searching simply on "Chai"

What we thought we would try was a full text index.

So we created a varchar (500) field "MERGEDDATA" with "FNAME SLNAME
SEMAIL" in it and then used this field as input for a full text index.
(lUserID as primary id for the index)

The query looks like this:

Select lApplicantID, FNAME,LNAME,EMAIL from user
where Contains(MERGEDDATA,'"John Stock*"')

It appears there is a massive increase in performance. The Query
returns much faster, and profiler reports that the cpu has reduced 16
fold times (from 1600, to 160)

But if we search for:

Select lApplicantID, FNAME,LNAME,EMAIL from user
where Contains(MERGEDDATA,'"John S*"')

We get an error "Too many full-text columns or the full-text query is
too complex to be executed."

However if we search for :

Select lApplicantID, FNAME,LNAME,EMAIL from user
where Contains(MERGEDDATA,'"John W*"')

We don't get the problem, and the search returns quickly with low
cpu/reads.
We have managed to establish that it only occurs on the letters "S" and
"M"
We have also noted that the last name starts with S and M slightly more
times than with other letters in our users table. eg Last names
starting with S = 330,000 rows, last names starting with W = 310,000
rows

Is there a way to get around this problem as our users often want to
search by firstname and first letter of last name.

We have also noticed in our travels that profiler may not be reporting
accurate CPU for FTS queries. Does profiler in sql 2005 report accurate
CPU use when running FTS queries. (As we know this runs under the
MSFTESQL.exe rather than sqlservr.exe)
The reason we are doubting this is that in a different query, we have
seen a query duration in excess of 40 seconds, but the reads and cpu
were very low (perhaps 10 reads and 5 cpu). Normally a 40 second query
would rack up 1000's of CPU

Any feedback would be helpful

Brad
Hilary Cotter
8/31/2006 10:11:42 PM
run a reorganize and see if this helps. Don't expect query optimizer to know
about SQL FTS as its a remote query.
--
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com



[quoted text, click to view]

AddThis Social Bookmark Button