lucene-java-user mailing list archives

From "Steven A Rowe" <sar...@syr.edu>
Subject RE: lucene farsi problem
Date Thu, 01 May 2008 12:42:14 GMT
Hi Esra,

Going back to the original problem statement, I see something that looks illogical to me.
Please correct me if I'm wrong:

On Apr 30, 2008, at 3:21 AM, esra wrote:
> i am using lucene's "IndexSearcher" to search the given xml by
> keyword which contains farsi information.
> while searching i use ranges like
> 
> آ-ث  |  ج-خ  |  د-ژ  |  س-ظ  |  ع-ق  |  ک-ل  |  م-ی
> 
> when i do search for  "د-ژ"  range the results are wrong , they
> are the results of  " س-ظ "range.
> 
> for example when i do search for "د-ژ" one of the results is
> "ساب ووفر"; this result is also shown in the "س-ظ" range's result
> list, which is the correct range.
> 
> As IndexSearcher use "compareTo" method and this method uses
> unicodes for comparing, i found the unicodes of the characters.
> 
> د=U+62F
> ژ = U+698
> and the first letter of "ساب ووفر " is  س = U+633

It appears to me that *both* the "د-ژ" range [ U+062F - U+0698 ] and the "س-ظ" range
[ U+0633 - U+0638 ] contain the first letter of "ساب ووفر", which is "س" = U+0633.

You stated that U+0633 should be contained in the [ U+0633 - U+0638 ] range - I agree - but
why do you think U+0633 should not be contained in the [ U+062F - U+0698 ] range?

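In other words, it looks to me like your problem is not a problem at all: "ژ" (U+0698)
sorts after "س" (U+0633) in Unicode code point order, even though it comes before it in
the Farsi alphabet, so the two ranges overlap and the search is behaving as expected.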
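
To check the containment by hand, here is a minimal sketch in plain Java. It does not use
Lucene at all; the class name RangeCheck and the helper inRange are made up for
illustration, and it assumes the range bounds are inclusive and that terms are compared
with String.compareTo, as described in the original message. The Farsi strings are written
as Unicode escapes so the code points are visible.

```java
// Hypothetical helper, not part of Lucene: tests whether a term falls
// inside an inclusive [lower, upper] range using String.compareTo,
// which compares UTF-16 code units.
final class RangeCheck {

    static boolean inRange(String term, String lower, String upper) {
        return lower.compareTo(term) <= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // "ساب ووفر": the first letter is س = U+0633
        String term = "\u0633\u0627\u0628 \u0648\u0648\u0641\u0631";

        // "د-ژ" range: [ U+062F - U+0698 ]
        System.out.println(inRange(term, "\u062F", "\u0698")); // prints true

        // "س-ظ" range: [ U+0633 - U+0638 ]
        System.out.println(inRange(term, "\u0633", "\u0638")); // prints true
    }
}
```

Both checks come out true because U+062F < U+0633 < U+0698 and U+0633 also lies within
[ U+0633 - U+0638 ], so the term belongs to both ranges under this comparison.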

Steve