Groups | Blog | Home
all groups > sql server full text search > october 2004 >

sql server full text search : Full Text Search Performance


Ali Salem
10/20/2004 7:09:06 AM
Hello,

I have full text search enabled on two large tables, the first one has 3
million records (FT index column length is around 512-1025 char), and the
other one has around 1.6 records (FT index column length is around 50-120
char)

The simplest query that uses the smiplest form of FT on the first table FT
index search needs 23-30 seconds to complete and from 10-15 seconds on the
seconds table as it the indexed column is smaller.

SQL server is deployed on 4 P servers with 8 GB of memory and attached to a
very powerful SAN system. Memory usage on the server is not very high so I
believe there is enough memory for the mssearch service to use.

I just wonder if this is the normal throughput of the Full Text search of
SQL Server. If not, I will appreciate any tips and hints that might be the
reason of the system.
--
Hilary Cotter
10/20/2004 11:50:39 AM
That depends. SQL FTS performance is most sensitive to the number of rows
you are returning. You should limit your results set as much as possible - I
think you will find that the practical limit for most applications is around
100-200 rows.

Limit it by using the the top_n_by_rank operator in ContainsTable or
FreeTextTable, ie

USE Northwind
GO
SELECT FT_TBL.Description,
FT_TBL.CategoryName,
KEY_TBL.RANK
FROM Categories AS FT_TBL INNER JOIN
CONTAINSTABLE (Categories, Description,
'test',200
) AS KEY_TBL
ON FT_TBL.CategoryID = KEY_TBL.[KEY]
ORDER BY KEY_TBL.RANK DESC
[quoted text, click to view]

Ali Salem
10/21/2004 4:29:03 AM

thanks for your reply.

I cannot limit the search in this way, as there are other parameters in my
query that can affect the returned result set and that are ourside the FTS.

I am asking for any figures about FTS performance? What should I expect from
it? Is this behavior am getting normal for the data size or, I am having
something wrong

[quoted text, click to view]
John Kane
10/22/2004 11:33:37 AM
Ali Salem,
Could you provide the full output of the following SQL script as it is very
helpful in troubleshooting SQL FTS issue and understanding your environment
and issues!

use <your_database_name_here>
go
SELECT @@language
SELECT @@version
sp_configure 'default full-text language'
EXEC sp_help_fulltext_catalogs
EXEC sp_help_fulltext_tables
EXEC sp_help_fulltext_columns
EXEC sp_help <your_FT-enable_table_name_here>
SELECT FULLTEXTSERVICEPROPERTY('ResourceUsage')
go

Additionally, where is your FT Catalog located as by default it is created
under the \FTDATA directory where you have SQL Server installed as well as
where your FT-enabled database files (*.mdf, *.ndf, * *.ldf) are located
relative to the location of your FT Catalog folder? If your 'ResourceUsage
is set to 3, you should increase it to 5 (dedicated) via sp_fulltext_service
'resource_usage' <value>, where <value> is 5.

Thanks,
John





[quoted text, click to view]

Hilary Cotter
10/22/2004 12:56:31 PM
No, this is highly abnormal. Performance should be sub second. However I
need to know what your queries look like, ie how many search arguments, how
many rows are returned, and what language you are querying in.


[quoted text, click to view]

Ali Salem
10/22/2004 11:53:02 PM

Thanks.
- We are using enlgihs language.
- Returned number of rows should be very large (thousands, in some cases it
can be 100, 000). this from the Full-Text Seach, other SQL filters will be
applied as well, but from mssearch this is what will be returned. Please note
that the table size is around 3million records.
- The smiplest query such as SELECT * FROM TABLE1 WHERE CONTAINS(COL1,
'ABCD') need around 30 seconds complete.

I do appreciate any help.

thank you

[quoted text, click to view]
Ali Salem
10/23/2004 1:43:01 AM
Below are the result of the script you requested me to run it:

My FT files are stored on SAN storage. So IO should be performing well.
Regarding the ResourceUsage, it is 3. Will not raising it to 5 harm the sql
server performance! I dont want to make the full text search fast by slowing
down SQL Server itself.

I have splitted the result into two posts so that I can post it here.

Thank you for you help




----------------------------------------------------------------------------------------------------------------------------

----
us_english

(1 row(s) affected)








----------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------------------------------------

--------
Microsoft SQL Server 2000 - 8.00.534 (Intel X86)
Nov 19 2001 13:23:50
Copyright (c) 1988-2000 Microsoft Corporation


Enterprise Edition on Windows NT 5.0 (Build 2195: Service Pack 3)


(1 row(s) affected)

name minimum maximum config_value
run_value
----------------------------------- ----------- ----------- ------------
-----------
default full-text language 0 2147483647 1033 1033

ftcatid NAME


PATH





STATUS NUMBER_FULLTEXT_TABLES
-------
--------------------------------------------------------------------------------------------------------------------

------------
---------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------------------------------------

--------------------- ----------- ----------------------
5 ICSubject


L:\Microsoft SQL Server\MSSQL\FTDATA





0 1
6 Subject


L:\Microsoft SQL Server\MSSQL\FTDATA





0 1
7 Names


L:\Microsoft SQL Server\MSSQL\FTDATA





0 1

(3 row(s) affected)

TABLE_OWNER


TABLE_NAME


FULLTEXT_KEY_INDEX_NAME


FULLTEXT_KEY_COLID FULLTEXT_INDEX_ACTIVE FULLTEXT_CATALOG_NAME



----------------------------------------------------------------------------------------------------------------------------

----
-----------------------------------------------------------------------------------------------------------------------

---------
------------------------------------------------------------------------------------------------------------------

-------------- ------------------ ---------------------
--------------------------------------------------------------------

------------------------------------------------------------
dbo


IC_SUBJECTS


PK_IC_SUBJECTS


1 1 ICSubject
dbo


MAIN


PK_INOUT_MAIN


1 1 Subject
dbo


PERSONS


PK_PERSONS


1 1 Names

(3 row(s) affected)

TABLE_OWNER


TABLE_ID TABLE_NAME


FULLTEXT_COLUMN_NAME


FULLTEXT_COLID FULLTEXT_BLOBTP_COLNAME
Ali Salem
10/23/2004 1:45:02 AM
The Rest of the script result, one more post is required
Column_name


Type


Computed Length Prec Scale Nullable
TrimTrailingBlanks

FixedLenNullInSource Collation



----------------------------------------------------------------------------------------------------------------------------

----
-----------------------------------------------------------------------------------------------------------------------

--------- ----------------------------------- ----------- ----- -----
----------------------------------- ------------------

----------------- -----------------------------------
----------------------------------------------------------------------

----------------------------------------------------------
DOCID


int


no 4 10 0 no
(n/a)

(n/a) NULL
DESCRIPTION


varchar


no 255 yes
no

no Arabic_CI_AS
DOCDATE


datetime


no 8 yes
(n/a)

(n/a) NULL
RDOCDATE


datetime


no 8 yes
(n/a)

(n/a) NULL
EXT_PARTY_ID


int


no 4 10 0 yes
(n/a)

(n/a) NULL
EXT_PARTY_REP_ID


int


no 4 10 0 yes
(n/a)

(n/a) NULL
SUBCAT_ID


decimal


no 9 18 0 yes
(n/a)

(n/a) NULL
STATUS_ID


int


no 4 10 0 yes
(n/a)

(n/a) NULL
CONFID_ID


int


no 4 10 0 yes
(n/a)

(n/a) NULL
REFERENCE_NO


varchar


no 50 yes
no

no Arabic_CI_AS
FORWARD_TO


decimal


no 9 18 0 yes
(n/a)

(n/a) NULL
FORWARD_DATE


datetime


no 8 yes
(n/a)

(n/a) NULL
TO_REMIND


decimal


no 9 18 0 yes
Ali Salem
10/23/2004 1:45:03 AM
Last post of the script result.




index_name


index_description



index_keys






----------------------------------------------------------------------------------------------------------------------------

----
-----------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------- --------------------------------

----------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------------
IX_Index1


nonclustered located on SECONDARY2



DOCID, STATUS_ID, CABID, YEAR,

CONFID_ID, DELIVERY_PLACE, PRIORITY_NO,
IX_Index2


nonclustered located on SECONDARY4



INOUTDOCNO, CABID, DOC_ORIGN_SOURCE,

YEAR, DOCDATE, OUTBOUND_REQUEST
IX_Index3


nonclustered located on SECONDARY2



GROUPID
IX_Index4


nonclustered located on SECONDARY4



YEAR, EXT_DOCNO, RDOCDATE_HJ,

EXT_PARTY_ID, DIWAN_REP_ID, EXT_PARTY_REP_ID
IX_Index5


nonclustered located on SECONDARY2



REQUEST_NO
IX_Index6


nonclustered located on SECONDARY4



STATUS_ID, CABID
PK_MAIN


clustered, unique, primary key located on SECONDARY2



DOCNO


constraint_type


constraint_name


delete_action update_action status_enabled
status_for_replication constraint_keys






----------------------------------------------------------------------------------------------------------------------------

----------------------
-----------------------------------------------------------------------------------------------------

--------------------------- ------------- ------------- --------------
---------------------- ------------------------------

----------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------
DEFAULT on column OUTBOUND_REQUEST


DF_MAIN_OUTBOUND_REQUEST


(n/a) (n/a) (n/a) (n/a)
(0)
FOREIGN KEY


FK_MAIN_EMPLOYEES


No Action No Action Enabled
Not_For_Replication USER_ID







REFERENCES dbo.EMPLOYEES (EMP_NO)
FOREIGN KEY


FK_MAIN_EMPLOYEES1


No Action No Action Enabled
Not_For_Replication WRITER_ID






John Kane
10/23/2004 9:47:55 AM
Ali,
First of all, thank you for providing this info as it is most helpful! You
are using SQL Server 2000 SP2 on Windows 2000 Server SP3 and your default
language is US_English (1033). However, all of your char, varchar and
nvarchar columns using the Arabic_CI_AS collation with the FT-enabled column
"Language for Word Breaker" set to Neutral. Can I assume that these column
contain Arabic text?

Yes, I can understand why you want the Resource_Usage level to remain at 3,
but you might try setting to 5, as you can always set it lower if it is
affecting your other SQL Server processing. Although, this bump in the
MSSearch resource usage might not be significant. Note, it affect how much
memory MSSearch can use up to a max if 512MB RAM, if available as well as
the number of concurrent connections allowed.

What is the drive letter for your SAN storage? Could you also run the
following SQL query and post it's results?

sp_helpdb <full_text_enabled_database_name>

I'm not sure if the use of Arabic text is causing the poor query performance
(even with the Neutral wordbreaker), but language is a consideration with
FTS query performance issues.

Thanks again,
John







[quoted text, click to view]
Ali Salem
10/24/2004 1:33:02 AM

Thanks a lot for you prompt answers. Below is the script result you
requested. The database is on drive E on the production, the fulll text
catalogs are on drive L, which is also on the SAN.

Please finde below my comments on your points:
1) The Arabic language columns, Yes, these columns contains arabic text.
However, we are using neutral for FT. Arabic word breaker can follow the same
rules as english one, and I am not sure this might cause i problem in
performance, I have some doubts about this. the only think that i think is
important here is that arabic uses unicode here, so text size is twice! Also,
I am not getting arabic seach slower than english in a reasonable way, I can
understand a loss of 5-10% in performance for such a thing but not to get a
system that is 50 times slower for the simplest query that does not have much
linguistic as this is not being apllie don arabic by sql FT. The query that
should perform in sub second, is taking 30 or 40 seconds this is too much to
the point that I cannot understand it.

2) The drive letter that is used by the database is E, The FT catalgo are on
another SAN Partition and that is frive E. I think this should make
performance even better.


3) Resource Usage, I can try to set to level 5, but how much difference this
will show, I mean can it be the reason for such a poor query performance! or
it should improve the performance by 10% or so?

Also, I have read something about configuring SQL to use a fixed memory on
the server and leave at max 512 MB for mssearch, does setting the resource
usage flag to 5 substitutes the fixed sql server memory idea? or it is
something that i need to do in adition to that?

Thanks a lot for you help and support. i really appreciate it.

thank you,
Ali Salem
name
db_size owner

dbid created status







compatibility_level
--------------------------------------------------------------------------------------------------------------------------------
-------------
--------------------------------------------------------------------------------------------------------------------------------
------ -----------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------
testdb
148399.88 MB
tstdmn\sysadmin
19 Oct 14 2004
Status=ONLINE, Updateability=READ_WRITE, UserAccess=MULTI_USER,
Recovery=BULK_LOGGED, Version=539, Collation=Arabic_CI_AS, SQLSortOrder=0,
IsTornPageDetectionEnabled, IsAutoCreateStatistics, IsAutoUpdateStatistics,
IsFullTextEnabled



80


name
fileid filename



filegroup
size
maxsize growth usage
--------------------------------------------------------------------------------------------------------------------------------
------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------
------------------ ------------------ ------------------ ---------
testdbSYS
1
E:\E_DataBase\testdbSYS_Data.MDF


PRIMARY

10112 KB Unlimited 10% data only
testdb_Log
2
E:\E_DataBase\testdb_Log.LDF


NULL

Ali Salem
10/24/2004 7:01:07 AM

Hello, Do you have any comment to add to my problwm
thank you

[quoted text, click to view]
John Kane
10/24/2004 9:56:25 AM
You're welcome, Ali Salem,
Without actually testing English vs. Arabic & using the Neutral wordbreaker
(for both cases) with a 1.6+ million row table and comparing the FTS query
performance, I would tend to agree that Arabic text alone *should* not by
itself account for this poor query performance. However, I would tend to
think that this poor performance is related to the size of the table as it
is over 1 million rows where the both the FTI and FTS performance starts to
bog down. See SQL Server 2000 BOL title "SQL Full-text Search
Recommendations" for more information on this.

Your FT Catalogs are actually located on the L drive (L:\Microsoft SQL
Server\MSSQL\FTDATA\) the same drive that SQL Server system database and
tempdb are located also increasing the Resource Usage to 5, most likely will
only give you a 5% boost, but still it is worth a try and given your
machine's memory configuration (8 GB of memory) you should be able to ensure
that SQL Server has enough memory (fixed memory) and allow the MSSearch to
use at least 512Mb (max) of available RAM. What you read is correct, and no,
setting the resource_usage to 5 does not substitute for setting SQL Server
memory to fixed.

For large FT Catalogs, I generally recommend using the Top_N_Rank parameter
to limit the resultset, but you stated that you had "other parameters in
[your] query that can affect the returned result set and that are outside
the FTS. What are these other parameters? Are they other columns in this
FT-enable table or are they joins with other tables? If so, can you test
using CONTAINSTABLE and use the Top_N_Rank value set to 2000 (but not more
that this "special" number)? Depending upon your answers there may be other
alternative methods to add the "other parameters" to the RANK value and
still get better performance, but you most likely will have to use
CONTAINSTABLE and the Top_N_Rank value to get better FTS performance over
CONTAINS with FT-enabled tables of greater than 1 million rows.

Regards,
John



[quoted text, click to view]
----------------------------------------------------------------------------
----------------------------------------------------------------------------
----------------------------------------------------------------------------
----------------------------------------------------------------------------
----------------------------------------------------------------------------
----------------------------------------------------------------------------
---------------------------------------------------------------------- -----
--------------
[quoted text, click to view]
----------------------------------------------------------------------------
----------------------------------------------------------------------------
----------------------------------
[quoted text, click to view]
Ali Salem
10/25/2004 1:31:03 AM
Thanks again and alot

My parameters are a mix of other columns in the same table and form other
tables.
So such a solution might not be so possible although I know a queries can be
a lot faster.

do you think it will help If I chaned the FT files to another drive other
than L, also L is a SAN drive?

What I really need to know, is this a normal query performance of such a
table size, i mean should the query takes more than 30 seconds to complete,
even when the server resources usage is not high! I mean is this is the
normal performance for a table with multi million (also they are just 3 now)
rows tables to take such a period to return. Because all the available
suggestions are way to increase performance slightly, i mean all this talk
about resource usage, memory and drives...etc. As the query is slow even when
the server is almost idle.

The only thing that I think might be reasonable is that Top_n_rank, but this
is a work around the real problem, which is FT search is slow for large
table, very slow. Am I correct in this, or there is something hidden in my
case.

Thank you very much for you time and effort.

Ali

[quoted text, click to view]
Hilary Cotter
10/25/2004 7:23:44 AM
your problem is with the number of rows you are returning. There is little
you can do to help with this.

--
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html


[quoted text, click to view]

John Kane
10/25/2004 10:05:36 AM
You're welcome, Ali,
Due to the size of the FT-enable table, using CONTAINSTABLE (or
FREETEXTTABLE) with Top_N_Rank (set to 2000) most likely is your best
solution, unless you can obtain and upgrade to SQL Server 2005 (currently in
Beta2 release for MSDN subscribers) as the new version of SQL Server has
much improved the performance & scalability of FT Indexing and FT Search.

It might help to change the FT Catalog folder and files to another drive as
it depends upon how the L:\ drive is formatted (RAID5 or RAID10 ?) and move
the FT Catalog folder and files to another drive (Raid0 or Raid10, if
possible) that does not contain your database files via the procedures in KB
article Q40867 (Q240867) "INF: How to Move, Copy, and Backup Full-Text
Catalog Folders and Files" at
http://support.microsoft.com/default.aspx?scid=kb;EN-US;240867

No, this is not normal query for a table of this size, but the number of
rows returned without using containstable and top_n_rank may be a factor as
well. Yes, you are correct that Top_N_Rank is a workaround for large FT
enable tables, see KB articles 240833 (Q240833) "FIX: Full-Text Search
Performance Improved via Support for TOP" at
http://support.microsoft.com//default.aspx?scid=kb;EN-US;240833 and 303459
(Q303459) "INF: How to Improve the Performance of FTS Queries for Large
Tables" at http://support.microsoft.com/?id=303459 as this too might be
helpful for you.

FTS performance tuning with SQL Server 2000 is more of an "art" than a
science and takes a careful balance of hardware and software configurations
along with SQL Sever and OS considerations as well as an understanding of
the text to be searched.

Regards,
John


[quoted text, click to view]
Hilary Cotter
10/25/2004 11:05:00 AM
Try setting sp_fulltext_service 'resource_usage' 5

This primarily affects indexing.

You might want to consider partitioning. Partitioning is where you break up
the table you are searching into multiple child tables. If your query
involves a data range you can have a table for each range, ie a year, or
monthly tables. Then have some conditional statements which redirects the
query to the pertinent table.
This helps when you are doing queries like this


select * from tableName where contains(*,'test') and datecol > 2004-01-01



[quoted text, click to view]

AddThis Social Bookmark Button