
sql server full text search : Multi Language Recommendations


Peter Sedman
11/25/2003 11:41:42 AM
Hi,

SQL Server 2000, Windows 2K

I'm building a simple document management system. The system needs to be
able to perform full text searches on documents uploaded into the database.

Documents can be uploaded in any language. I want to take advantage of
the noise word lists provided by SQL Server for each supported language.

Is it best to create a single table to store the documents and have a column
for each language (supported by fts) to store the document content or to
create a table for each language?

Are there any other recommendations for setting up fts with multiple
languages?

Thanks,
Peter

John Kane
11/25/2003 2:44:06 PM
Peter,
Without knowing more about your requirements and other factors (number of
languages and total number of rows), I'd normally say it is best to create a
single table to store the documents, with one column per language (supported
by FTS) to hold the document content. If you mix languages in a single
column, you must use the Neutral "Language for Word Breaker", which breaks
words on the white space between them rather than by language-specific rules.
With the Neutral word breaker you also lose some SQL FTS functionality,
specifically searching on INFLECTIONAL variations of words.
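
A minimal sketch of the single-table layout described above, using the SQL Server 2000 full-text stored procedures. The table, column, and catalog names are hypothetical, and the three languages (LCIDs 1033, 1031, 1036) are only an illustration; substitute the ones you actually need.

```sql
-- Hypothetical table: one row per document, one text column per language.
CREATE TABLE dbo.Documents (
    DocId     int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Documents PRIMARY KEY,  -- FTS needs a unique, single-column, non-null key
    Title     nvarchar(255) NOT NULL,
    ContentEn ntext NULL,   -- English text
    ContentDe ntext NULL,   -- German text
    ContentFr ntext NULL    -- French text
);
GO

-- Enable full-text in the database. Run this once only; check BOL before
-- re-running it on a database that already has full-text catalogs.
EXEC sp_fulltext_database 'enable';

-- One catalog holds the full-text index for the whole table.
EXEC sp_fulltext_catalog 'DocumentsCatalog', 'create';
EXEC sp_fulltext_table 'dbo.Documents', 'create', 'DocumentsCatalog', 'PK_Documents';

-- The fourth argument is the "Language for Word Breaker" as an LCID:
-- 1033 = English (US), 1031 = German, 1036 = French, 0 = Neutral.
EXEC sp_fulltext_column 'dbo.Documents', 'ContentEn', 'add', 1033;
EXEC sp_fulltext_column 'dbo.Documents', 'ContentDe', 'add', 1031;
EXEC sp_fulltext_column 'dbo.Documents', 'ContentFr', 'add', 1036;

-- Activate the index and start a full population.
EXEC sp_fulltext_table 'dbo.Documents', 'activate';
EXEC sp_fulltext_catalog 'DocumentsCatalog', 'start_full';
GO

-- Inflectional search against the English column only.
SELECT DocId, Title
FROM dbo.Documents
WHERE CONTAINS(ContentEn, 'FORMSOF(INFLECTIONAL, upload)');
```

The application would write each document's text into the column that matches its language and query only that column, so the matching word breaker is used at both index and query time.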

You can also create a table for each language, but I'd only recommend that
if you had other language-specific data columns that you also wanted to
store and track outside of the language text.

Another consideration with a single table that has a column for each
language: all of the languages in that table must be stored in a single FT
Catalog. If you instead create a table for each language, each table can have
its own FT Catalog. With a modest number of languages (say, no more than 10)
you should have no problems with 10 FT Catalogs, depending upon the size of
each FT Catalog, i.e., the number of rows in the FT-enabled tables. See the
last paragraph of the SQL Server 2000 BOL topic "Full-Text Search
Recommendations" for more info on this.
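
For comparison, a rough sketch of the table-per-language alternative with one FT Catalog per table. All object names are hypothetical, only two languages are shown, and it assumes full-text has already been enabled in the database.

```sql
-- Hypothetical per-language tables, each with its own full-text catalog.
CREATE TABLE dbo.DocumentsEn (
    DocId   int NOT NULL CONSTRAINT PK_DocumentsEn PRIMARY KEY,
    Content ntext NULL     -- English text only
);
CREATE TABLE dbo.DocumentsDe (
    DocId   int NOT NULL CONSTRAINT PK_DocumentsDe PRIMARY KEY,
    Content ntext NULL     -- German text only
);
GO

-- English: separate catalog, English (US) word breaker (LCID 1033).
EXEC sp_fulltext_catalog 'DocsEnCatalog', 'create';
EXEC sp_fulltext_table 'dbo.DocumentsEn', 'create', 'DocsEnCatalog', 'PK_DocumentsEn';
EXEC sp_fulltext_column 'dbo.DocumentsEn', 'Content', 'add', 1033;
EXEC sp_fulltext_table 'dbo.DocumentsEn', 'activate';

-- German: separate catalog, German word breaker (LCID 1031).
EXEC sp_fulltext_catalog 'DocsDeCatalog', 'create';
EXEC sp_fulltext_table 'dbo.DocumentsDe', 'create', 'DocsDeCatalog', 'PK_DocumentsDe';
EXEC sp_fulltext_column 'dbo.DocumentsDe', 'Content', 'add', 1031;
EXEC sp_fulltext_table 'dbo.DocumentsDe', 'activate';

-- Each catalog can then be populated on its own schedule.
EXEC sp_fulltext_catalog 'DocsEnCatalog', 'start_full';
EXEC sp_fulltext_catalog 'DocsDeCatalog', 'start_full';
```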

Regards,
John



