Groups | Blog | Home
all groups > sql server full text search > november 2003 >

sql server full text search : Re: Some code to test stemmer output if you are interested


John Kane
11/20/2003 6:26:01 AM
re-posting as I'm not sure if this got posted the first time around...
John


Tony,
If it still using the US_English version, perhaps it's because your server's
'default full-text language' is set to 1033 (US_English) as well? What does
the following SQL code return?

use master
go
sp_configure 'default full-text language'
go

If it returns 1033 as the run_value, could you change it to 2057
(UK_English)? This might be the reason why it's still using the US English
version. You can make the change via the following SQL code:

use master
go
sp_configure 'default full-text language', 2057
go
reconfigure with override
go

I don't believe you need to stop & restart SQL Server for this change to be
come effective, and if you have not rebooted the server after adding the
wbcache files, you might want to schedule a reboot. Once you've made the
above change and assuming the original run_value was 1033, then run a Full
Population and re-test your query and Andrew's C# applet.

Regards,
John



[quoted text, click to view]

tonyz.wrightz NO[at]SPAM consultant.com
11/20/2003 5:06:58 PM
OK, it is now (finally) working.

Basically, I need to set the default full-text language to UK english,
then reconfigure, then reboot (the reboot is necessary).

After rebooting, the uk english stemming comes to life, without even
rebuilding the catalog. The queries work as required - tyre stems to
tyres etc.

After this, I changed the default fulltext language back to us
english, reconfigured, and rebooted. The queries still worked for uk
english.

This leads me to believe that setting the default full-text language
is important so that the sql server can install whatever it needs to
support the desired language. After this, it doesn't appear to be
important to leave this default on uk english.

Incidently, the wbdbase and wbcache files are not installed on two of
the machines that have working queries.

Thank you John and Andrew for your help with this. I have learnt a lot
Andrew Cencini [MSFT]
11/20/2003 6:52:59 PM
Well whatever it was Tony, I am glad to hear it is working for you. Sorry
for the inconvenience.

I am on my way out of the country for a week-long holiday so I am glad the
issue is resolved before I left.

All the best,
--andrew

Andrew Cencini
Program Manager
Microsoft Corp. - SQL Server Engine

--
This posting is provided "AS IS" with no warranties, and confers no rights.

[quoted text, click to view]

John Kane
11/20/2003 8:46:55 PM
You're welcome, Tony!
Just to summarize as one or two of my and Andrew's posting/replies never
were posted to the fulltext newsgroup... If I may, I'd like to identify the
symptoms and summarize what happened, for future references:

1) The "Language for Word Breaker" for your FT-enabled table column was set
to one language (UK_English) and this LCID also happens to be a sub-language
for English and the stemmer was incorrectly stemming some English words
(Tyre, tyres, etc.), and seeming to use the US_English wordbreaker, while it
should of been using the UK_English wordbreaker. Also, the SQL FTS
formsof(Inflectional) queries were returning un-expected results for some
English words.

2) Additionally, the problem server's @@language was set to US_English and
more importantly it's sp_configure's 'default full-text language' was also
set to US_English or 1033 and I'm assuming that you modified this to 2057
(UK_English) and then rebooted the server, but not before copying over the
missing wbcache.eng and wbdbase.eng files. Correct?

3) While we also identified that your problem server did not have all of the
wbcache.* and wbdbase.* files as there should be eight (8) of each of these
files. Specifically, the problem server was missing wbcache.eng and
wbdbase.eng files and these files are the UK_English wordbreker files. You
then copied these two files from your SQL Server 2000 service pack
installation, and initially did not reboot the problem server. Correct?
(See KB article 296136 as a reference for this issue.)

4) Later you rebooted the problem server (with the above changes completed)
and this server's SQL FTS queries are "now (finally) working". This
indicates that a reboot of the problem server was also required.
Additionally, on two other "non-problem" server's the wbdbase.eng and
wbcache.eng files were missing, however, the stemming of English words are
working correctly and the FTS queries were returning the expected results.
Correct? What is the current sp_configure 'default full-text language'
run_value for these non-problem servers?

Finally, you never did get the syntax error "Language database/cache file
could not be found" for any of the SQL FTS queries, and with the
sp_configure's 'default full-text language' set to US_English on the problem
server changed to UK_English along with these files now on the problem
server, AND then rebooting the problem server - ALL of these factors
*might* been a contributing factor in correcting the stemming of English
words on the problem server.

Tony, would you agreed with this summarization?
Regards,
John



[quoted text, click to view]

tonyz.wrightz NO[at]SPAM consultant.com
11/23/2003 4:18:40 PM
Oh my, this is long... Comments in-line

[quoted text, click to view]

Agree with all but the last statement. The stemming appears to work
correctly within the context of the current language. That is, tyre/s
stems correctly in UK English, but not is US English because of the
different spelling. Unreconised words appear to be returned by
themselves with no attempts at stemming.

[quoted text, click to view]

This was done much, much later. In fact this was really the last thing
I did that made everything work. I had two other test servers
configured identically that exhibited the same problem, that I was
able to test changes on, so I was then able to isolate the change that
had the most impact.

As it turned out, on the first server I copied over the wbcache.eng
and wbdbase.eng files, but it really didn't make much difference.

Once I changed the default full-text language to UK english and then
rebooted, the full-text queries worked. To determine whether it was
the wbdbase/wbcache files, or the setting of the default full-text
language to UK Eng, I went to one of my other test machines, which had
not been altered.

On the two other test machines, all I did was change the default
full-text language, not copy across the wbdbase and wbcache files,
which, incidently, do not appear to be there now - yet the full-text
queries return the correct result. Er, that is to say, I was only
really monitoring the wbdbase and wbcache files in WINNT/System32, so
if it installed them elsewhere then I haven't kept my eye on that.

[quoted text, click to view]

Correct, I didn't reboot the server, so I can't confirm whether this
would have solved the problem on its own.

[quoted text, click to view]

Rebooting was definitely required. I was able to check this on my test
machines. The uk full-text queries were not working on the two
"non-problem" servers either. That is why I can be pretty definite
that the solution was to set the default full-text language to UK
english. That's not to say you can't change it back to US afterwords
then reboot. Which suggests that something is configured when the
default language is originally set to UK english, that is not
configured just be setting the word breaker language on the full-text
index.

[quoted text, click to view]

That's correct - no error messages were given. I did not have to copy
the wbdbase and wbcache files manually onto my other two problem
servers. Setting the default full-text language, then rebooting, did
everything I needed to get this operational for UK english on those
servers.

Finally, I should say that life would have been much easier if it were
possible to make a english language neutral stemmer that can handle
both US and UK english...but I don't know about the language
structure, so maybe that's not possible.

John Kane
11/23/2003 6:15:49 PM
Thank you, Tony for taking the time to reply to this long post! <G>
To summarize (if I can), the primary change that allowed FTS
formsof(inflectional) queries on "tyre" to correctly return "tyre", "tyres",
etc. was my recommendation to change the default full-text language to
UK_English (2057) and then rebooting the server, the latter is required as
well. The missing wbdbase.eng & wbcache.eng files or had no affect.

I also find this statement important as well - "Which suggests that
something is configured when the default language is originally set to UK
english, that is not configured just be setting the word breaker language on
the full-text index". Whether or not this can be repo'ed and filed as a bug
remains to be tested...

I don't know if a "english language neutral stemmer that can handle both US
and UK english" is possible, but perhaps in a future version of SQL Server
or in a future OS platform... Time will tell on this...

Thanks again!
John



[quoted text, click to view]

AddThis Social Bookmark Button