lucene-java-user mailing list archives

From esra <esraer...@gmail.com>
Subject lucene farsi problem
Date Wed, 30 Apr 2008 07:21:52 GMT

Hi,

I am using Lucene's "IndexSearcher" to search XML content by keyword. The
content contains Farsi text.
While searching I use ranges like:

آ-ث  |  ج-خ  |  د-ژ  |  س-ظ  |  ع-ق  |  ک-ل  |  م-ی

When I search the "د-ژ" range, the results are wrong: they are results
from the "س-ظ" range.

For example, when I search for "د-ژ", one of the results is "ساب ووفر".
This result is also shown in the result list of the "س-ظ" range, which is the
correct range.

As IndexSearcher uses the "compareTo" method, and this method compares
Unicode values, I looked up the Unicode values of the characters.

د = U+062F
ژ = U+0698
and the first letter of "ساب ووفر" is س = U+0633
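
To make the comparison concrete, here is a minimal sketch of the check I believe is happening. It assumes the range test is an inclusive String.compareTo against both endpoints; inRange and FarsiRangeCheck are hypothetical names of my own, not part of Lucene. Because U+0633 lies between U+062F and U+0698, the term passes the "د-ژ" test as well as the "س-ظ" one.

```java
public class FarsiRangeCheck {
    // Hypothetical helper (not a Lucene API): inclusive range test that
    // relies only on String.compareTo, i.e. on raw UTF-16 values.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String dal = "\u062F"; // د
        String zhe = "\u0698"; // ژ
        String sin = "\u0633"; // س
        String za  = "\u0638"; // ظ

        // ساب ووفر, written with escapes to avoid bidi display issues
        String term = "\u0633\u0627\u0628 \u0648\u0648\u0641\u0631";

        // U+062F <= U+0633 <= U+0698, so the term falls inside "د-ژ"
        System.out.println(inRange(term, dal, zhe)); // true

        // It also falls inside "س-ظ", the range where it belongs
        System.out.println(inRange(term, sin, za));  // true
    }
}
```

So the order of the Unicode values does not match the Farsi alphabet order here: ژ comes before س in the alphabet but has a larger Unicode value.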

Do you have any idea how to solve this problem? There are analyzers for
different languages. Would one of them be useful here? If so, do you know
where to find a Farsi analyzer?

I would be glad if you could help.

Thanks,

Esra
