Stemming in Portuguese



Stemming in Portuguese Daniel Marreco
7/28/2004 3:51:38 PM
Hi all,

I know Portuguese is not supported by FTS today.

So I'm trying to 'develop' it on my own. The first thing I
did was to rewrite the stop words listed in noise.dat
with noise words in Portuguese.

So far, so good. Now, where the hell can I find the rules
used by the stemming engine? Can I change them and define
my own set of rules for stemming in Portuguese? I already
have a nice set of rules that I use with my Lucene
projects.

thanks a lot

Re: Stemming in Portuguese John Kane
7/28/2004 6:55:04 PM
Daniel,
You should know that this is a 'non-trivial' effort. However, if you want
to do this, the best place to start is the Indexing Service samples provided
in the Windows Platform SDK. You can download the SDK if you have an MSDN
Subscription, or review the MSDN documentation online at:

Extending Language Resources for Indexing Service
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/html/wbrscenario_4ckl.asp?frame=true

Word Breaker and Stemmer Sample
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/html/wbrscenario_3e91.asp

Using Custom Filters with Indexing Service
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/html/ixufilt_912d.asp

Once you have a working Portuguese wordbreaker and stemmer, it's just a
matter of adding new Registry keys and values for the Portuguese language,
creating a new Portuguese noise word file, and linking these to SQL Server
Full-text Search. Note that this has been done successfully (as a research
project) for Greek, which is also not supported by SQL Server 2000 FT
Indexing.
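
Before committing to the COM work, it may help to prototype the noise-word
and stemming logic on its own. The Python sketch below is only an
illustration of rule-based suffix stripping combined with a noise-word
filter. The rule table, the noise list, and the names RULES, NOISE_WORDS,
stem and index_terms are all hypothetical; they are not the rules used by
the Indexing Service stemmers, and they are far from a complete Portuguese
stemmer.

```python
import re

# Hypothetical noise words; a real noise file would be much longer.
NOISE_WORDS = {"a", "o", "as", "os", "de", "da", "do", "das", "dos", "e", "em", "que"}

# Hypothetical suffix rules: (suffix, replacement, minimum remaining length).
# Rules are tried in order and the first match wins.
RULES = [
    ("ções", "ção", 2),   # "nações" -> "nação"
    ("ães", "ão", 1),     # "pães" -> "pão"
    ("ns", "m", 2),       # "homens" -> "homem"
    ("mente", "", 4),     # "rapidamente" -> "rapida"
    ("s", "", 3),         # "casas" -> "casa"
]


def stem(word: str) -> str:
    """Apply the first matching suffix rule, or return the word unchanged."""
    word = word.lower()
    for suffix, replacement, min_len in RULES:
        remaining = len(word) - len(suffix)
        if word.endswith(suffix) and remaining >= min_len:
            return word[:remaining] + replacement
    return word


def index_terms(text: str) -> list[str]:
    """Split text into words, drop noise words, and stem what is left."""
    tokens = re.findall(r"\w+", text.lower())
    return [stem(t) for t in tokens if t not in NOISE_WORDS]


if __name__ == "__main__":
    print(index_terms("As casas dos homens"))
```

A table-driven design like this keeps the rules as data, so an existing
rule set (for example one already used with Lucene) could be loaded into
the table and checked against sample text before it is ported into a
stemmer component.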

Regards,
John


Re: Stemming in Portuguese Hilary Cotter
7/28/2004 9:33:33 PM
The rules are not proprietary. However, you can roll your own.

Check out this link for more information.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/html/wbrscenario_9i0j.asp

--
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html

