Daniel,
You should know that this is a 'non-trivial' effort. However, if you want
to do this, the best place to start is the Indexing Service samples provided
in the Windows Platform SDK. You can download the SDK if you have an MSDN
Subscription, or review the MSDN documentation online at:
Extending Language Resources for Indexing Service
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/html/wbrscenario_4ckl.asp?frame=true
Word Breaker and Stemmer Sample
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/html/wbrscenario_3e91.asp
Using Custom Filters with Indexing Service
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/html/ixufilt_912d.asp
Once you have a working Portuguese wordbreaker and stemmer, it's
just a matter of adding new Registry keys and values for the Portuguese
language, along with a new Portuguese noise word file, and linking these to SQL
Server Full-text Search. Note that this has been done successfully (as a
research project) for Greek, which is also not supported by SQL
Server 2000 FT Indexing.
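To give a rough idea of the Registry step, here is a minimal Python sketch that only generates the text of a .reg file for one language entry; it does not touch the Registry itself. Treat everything specific in it as an assumption to check against the MSDN pages above: the base key and the value names (Locale, NoiseFile, WBreakerClass, StemmerClass) are modelled on Indexing Service language entries, the entry name and noise file name are made up, and the two CLSIDs are placeholders for whatever your own wordbreaker and stemmer components register. The key used by SQL Server's search service may live elsewhere, which is why the base key is a parameter.

```python
# Sketch only: the key path and value names are assumptions modelled on
# Indexing Service language entries; verify them before importing anything.
ASSUMED_BASE_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex\Language"
)


def build_language_reg(language, locale_id, wbreaker_clsid, stemmer_clsid,
                       noise_file, base_key=ASSUMED_BASE_KEY):
    """Return the text of a .reg file describing one language entry."""

    def quoted(value):
        # String values in .reg files escape backslashes and double quotes.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[{}\\{}]".format(base_key, language),
        '"Locale"=dword:{:08x}'.format(locale_id),
        '"NoiseFile"=' + quoted(noise_file),
        '"WBreakerClass"=' + quoted(wbreaker_clsid),
        '"StemmerClass"=' + quoted(stemmer_clsid),
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_language_reg(
        language="Portuguese_Brazilian",  # hypothetical entry name
        locale_id=0x0416,                 # LCID for Portuguese (Brazil)
        wbreaker_clsid="{00000000-0000-0000-0000-000000000001}",  # placeholder
        stemmer_clsid="{00000000-0000-0000-0000-000000000002}",   # placeholder
        noise_file="noise.ptb",           # hypothetical file name
    ))
```

For Portuguese as used in Portugal the LCID would be 0x0816 instead. Review the generated text by hand before importing it.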
Regards,
John
"Daniel Marreco" <dmarreco@hotmail.com> wrote in message
news:609601c474f5$713a9250$a601280a@phx.gbl...
Hi all,
I know Portuguese is not supported by FTS today.
So I'm trying to 'develop' it on my own. The first thing I
did was to re-write the stop words listed in noise.dat
with noise words in Portuguese.
So far, so good. Now, where the hell can I find the rules
used by the stemming engine? Can I change them and define
my own set of rules for stemming in Portuguese? I already
have a nice set of rules that I use with my Lucene
projects.
Thanks a lot