
sql server full text search : Indexing Service problem: Query returned only ignored words


dotNetCoder
3/12/2005 12:17:42 AM
Hello.
I'm trying to create a search engine for my web site using Indexing
Service. The content is stored in text files in UTF-8 encoding (Arabic
text).
The search utility works well on my local server, but online it throws
an exception: "Query returned only ignored words".
Every file contains the following statement in its header:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
I would be thankful for your help.



*** Sent via Developersdex http://www.developersdex.com ***
John Kane
3/14/2005 8:33:38 AM
dotNetCoder,
As you're using UTF-8 encoding with Arabic text, you might want to check out
the new "Microsoft Arabic Word-Breaker (Arabic Search Engine) - Beta" at
http://www.microsoft.com/middleeast/arabicdev/beta/search/ and download the
Installation Guide and the Microsoft Arabic Word-Breaker. Under the
"Installation Requirements for SQL Server 2000" section, you will find the
requirement that "...the Full-Text Search service is enabled for
running queries using the new Arabic Word-Breaker."

Please download it and let this newsgroup know if you find it effective in
resolving your FTS issues with Arabic HTML files, although you will most
likely have to include the <meta name="MS.Locale" content="AR"> (Arabic)
HTML tag.

You might also find the following material useful: in addition to the BOL
documentation, there is now on MSDN "Arabic
Language Support in Microsoft SQL Server 2000" at
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql2k/html/sql_arabicsupport.asp

Additionally, and assuming that the HTML documents that you are storing in
your IMAGE column PageText are in Arabic, could you confirm that all of the
HTML files have the correct language identifiers? Specifically, confirm
whether there are any <meta> tags within the <head> tags and whether they are
"<head><meta name="MS.Locale" content="EN-US"></head>" vs.
"<head><meta name="MS.Locale" content="AR"></head>" (the latter is Arabic).
For more info see the "HTML Filter" at
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/html/ixufilt_2uuq.asp
as well as http://www.otal.umd.edu/uupractice/non_english/
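
One way to run that check across many files is a small script. The following is only a rough sketch in Python, not part of Indexing Service or SQL Server; the class and function names (MetaCollector, find_locale, find_charset) are made up for illustration, and it assumes the files parse as ordinary HTML.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects the attributes of every <meta> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag == "meta":
            self.metas.append({name: (value or "") for name, value in attrs})


def _collect(html_text):
    parser = MetaCollector()
    parser.feed(html_text)
    return parser.metas


def find_locale(html_text):
    """Return the content of the MS.Locale meta tag, or None if it is absent."""
    for meta in _collect(html_text):
        if meta.get("name", "").lower() == "ms.locale":
            return meta.get("content")
    return None


def find_charset(html_text):
    """Return the charset declared via http-equiv Content-Type, or None."""
    for meta in _collect(html_text):
        if meta.get("http-equiv", "").lower() == "content-type":
            content = meta.get("content", "").lower()
            if "charset=" in content:
                return content.split("charset=")[-1].strip()
    return None


sample = (
    '<html><head>'
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    '<meta name="MS.Locale" content="AR">'
    '</head><body></body></html>'
)
print(find_locale(sample), find_charset(sample))
```

To audit a whole site, the same two functions could be called on the text of each file, flagging any file where find_locale returns None or something other than the expected Arabic value.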

Thanks,
John
--
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/




Hilary Cotter
3/14/2005 10:07:43 AM
At the top of your ASP/ASPX code page you have to use Session.CodePage= and
set it equal to the LCID for the particular form of Arabic you are using.

--
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com


dotNetCoder
3/17/2005 3:05:16 AM
Hilary and John, I want to thank you for your help.
Unfortunately, I still haven't been able to resolve the problem.
I think the problem is in the ASP.NET Framework, because in the past I
created a search engine for a classic ASP website and didn't face
this problem.
The problem is that if I search for English text (ASCII characters),
the search returns the correct results, but if I try any Arabic string, I
get the exception "Query returned only ignored words".
The difference between the classic ASP pages and the ASP.NET pages is that
my .NET web pages are UTF-8 encoded (I have to do this because my server's
default language for non-Unicode text isn't Arabic, and obviously I can't
ask my host to set it to Arabic). So I think that the problem is in the
UTF-8 encoding of my pages.
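
To illustrate why the encoding is a plausible suspect (this is only a sketch of the hypothesis, not a diagnosis of what Indexing Service actually does), here is what happens in Python when the UTF-8 bytes of an Arabic query are read as a single-byte code page. Latin-1 is used here purely as a stand-in for "some non-Arabic single-byte code page", and the function names are made up for the example.

```python
def misread_as_single_byte(text, wrong_encoding="latin-1"):
    """Encode text as UTF-8, then decode the bytes with a single-byte code page."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair(garbled, wrong_encoding="latin-1"):
    """Undo the misreading by reversing the two steps."""
    return garbled.encode(wrong_encoding).decode("utf-8")


query = "بحث"  # a three-letter Arabic word
garbled = misread_as_single_byte(query)

# Each Arabic letter takes two bytes in UTF-8, so the misread string is
# twice as long and contains no Arabic letters at all.
print(len(query), len(garbled))
print(repr(garbled))

# Pure ASCII text is unaffected, which matches English searches still working.
print(misread_as_single_byte("search") == "search")
```

If something like this happens between the page and the query, the engine would see a string of unrelated characters rather than Arabic words, which would at least be consistent with English queries working and Arabic ones failing.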

John: as for the Arabic Word-Breaker, I have tried it before and found it
effective for my Arabic search, but it worked only when the default option
in my server's "Regional Options" was set to Arabic (otherwise, I noticed
that the neutral language is used, not the Arabic language).

