
sql server full text search : Searching for composite words in Swedish


John
2/23/2004 5:20:21 PM
Hi,

In Swedish, so-called "composite words" are _very_ common. Example:
"Cat" is "katt" in Swedish. "Food" is "mat" in Swedish. Thus, "Cat food" is
"kattmat" in Swedish.

Now to the question: Is there a way to have full-text indexing recognise
composite words, so that when I search for fields containing "mat" I _will_
also see the fields containing "kattmat"?

WHERE CONTAINS ( article, '"*mat*"' )

doesn't work, it seems, since FTS disregards the leading *.

Best Regards
John

Hilary Cotter
2/24/2004 8:12:08 AM
No. What you have to do is trap the word in your query and then expand
it to all possible word forms.

I.e., kattmat will be expanded to (katt mat) OR (kattmat).
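
One way to read the suggestion above is to do the expansion in application code before the query is sent. The following is only a minimal sketch under stated assumptions: Python on the client side, a hand-maintained list of known compound words, and a hypothetical helper name (build_contains_condition). None of these come from this thread. It goes in the direction John asked about: a search for "mat" is widened to the compounds that end in "mat".

```python
def build_contains_condition(term, known_compounds):
    """Build a CONTAINS search condition matching `term` or any known
    compound word that ends with `term`.

    `known_compounds` is a hypothetical, hand-maintained word list;
    nothing in full-text search supplies it.
    """
    term = term.strip().lower()
    if not term:
        raise ValueError("search term must not be empty")

    # Normalise the list and keep only compounds ending with the term.
    candidates = {c.strip().lower() for c in known_compounds}
    matches = sorted(w for w in candidates if w != term and w.endswith(term))

    words = [term] + matches
    for w in words:
        # Refuse double quotes rather than guess at an escaping rule.
        if '"' in w:
            raise ValueError("double quotes are not allowed in terms: " + w)

    # Each word becomes a quoted simple term, joined with OR.
    return " OR ".join('"' + w + '"' for w in words)


if __name__ == "__main__":
    print(build_contains_condition("mat", ["kattmat", "hundmat", "katt"]))
```

The resulting string is meant to be passed as the search condition, ideally as a query parameter, e.g. WHERE CONTAINS(article, @condition), rather than concatenated into the SQL text. The obvious cost of this approach is that the compound list has to be maintained by hand.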


jt-kane NO[at]SPAM comcast.net
2/25/2004 2:56:07 PM
John,
You should also confirm that the "Language for Word Breaker" for your
FT-enabled column "article" is set to Swedish_Default (LCID 1053). You
can check this with sp_help_fulltext_columns: look at its
FULLTEXT_LANGUAGE column for the column "article". If it is not set to
the correct language (Swedish_Default), changing it should get you
better results.

Regards,
John


John Kane
3/1/2004 11:17:08 PM
You're welcome, John,
Could you provide the version of SQL Server and the OS platform you are
using?
Specifically, could you post the full output of: SELECT @@version

This is less an issue of altering the "dictionary" and more an issue with the
Swedish OS-supplied word breaker DLL. Depending on whether you are using Win2K
or Win2003, you may get different, i.e., closer to expected, results for your
composite Swedish words.

Thanks,
John




John
3/2/2004 7:44:09 AM
Hi and thanks for the tips,

I indexed it with Swedish, and now when I search for "stift" I correctly
get "ritstift", "rundstift" etc. However, I don't get any match on
"tändstift" and other composite words.

Is there any way to alter the dictionary for the fulltext indexer so that it
correctly indexes "tändstift" as a composition of "tänd" and "stift"?

Regards,
John
