lucene-java-user mailing list archives

From esra <esraer...@gmail.com>
Subject RE: lucene farsi problem
Date Fri, 02 May 2008 13:31:12 GMT

Hi Steven,

Sorry, I made a mistake. The Unicode code points are:

> د = U+062F
> ژ = U+0632
> and the first letter of "ساب ووفر" is س = U+0633

You can also check them here:
http://www.unics.uni-hannover.de/nhtcapri/persian-alphabet.html

Esra


Steven A Rowe wrote:
> 
> Hi Esra,
> 
> Going back to the original problem statement, I see something that looks
> illogical to me - please correct me if I'm wrong:
> 
> On Apr 30, 2008, at 3:21 AM, esra wrote:
>> I am using Lucene's "IndexSearcher" to search the given XML, which
>> contains Farsi text, by keyword.
>> While searching I use ranges like
>> 
>> آ-ث  |  ج-خ  |  د-ژ  |  س-ظ  |  ع-ق  |  ک-ل  |  م-ی
>> 
>> When I search the "د-ژ" range, the results are wrong: they
>> are the results of the "س-ظ" range.
>> 
>> For example, when I search for "د-ژ", one of the results is
>> "ساب ووفر". This result is also shown in the "س-ظ" range's result
>> list, which is the correct range.
>> 
>> As IndexSearcher uses the "compareTo" method, and this method compares
>> Unicode values, I looked up the code points of the characters.
>> 
>> د=U+62F
>> ژ = U+698
>> and the first letter of "ساب ووفر " is  س = U+633
> 
> It appears to me that *both* the "د-ژ" range [ U+062F - U+0698 ] and the
> "س-ظ" range [ U+0633 - U+0638 ] contain the first letter of "ساب ووفر",
> which is "س" = U+0633.  
> 
> You stated that U+0633 should be contained in the [ U+0633 - U+0638 ]
> range - I agree - but why do you think U+0633 should not be contained in
> the [ U+062F - U+0698 ] range?
> 
> In other words, it looks to me like your problem is not a problem at all.
> 
> Steve
> 
> 
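
The range question above can be checked with a minimal sketch. It assumes, as stated in the thread, that terms are compared with a plain String.compareTo on UTF-16 values; the class FarsiRangeCheck and its helpers inRange and codePoint are hypothetical names for illustration, not Lucene API. It tries both upper bounds quoted in this thread (U+0698 and U+0632) and prints the code point Java reports for each character.

```java
class FarsiRangeCheck {
    // Inclusive range test using String.compareTo, which orders strings
    // by their UTF-16 code unit values (assumed comparison, see lead-in).
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    // Formats the first code point of s in the usual U+XXXX notation.
    static String codePoint(String s) {
        return String.format("U+%04X", s.codePointAt(0));
    }

    public static void main(String[] args) {
        // "ساب ووفر", written with escapes so the source stays ASCII-safe.
        String term = "\u0633\u0627\u0628 \u0648\u0648\u0641\u0631";

        String dal = "\u062F";   // د
        String zain = "\u0632";  // ز
        String seen = "\u0633";  // س
        String zah = "\u0638";   // ظ
        String jeh = "\u0698";   // ژ

        System.out.println("dal  = " + codePoint(dal));
        System.out.println("zain = " + codePoint(zain));
        System.out.println("seen = " + codePoint(seen));
        System.out.println("jeh  = " + codePoint(jeh));

        // Upper bound U+0698: the term starts with U+0633, so it is inside.
        System.out.println(inRange(term, dal, jeh));   // true
        // Upper bound U+0632: U+0633 sorts after it, so the term is outside.
        System.out.println(inRange(term, dal, zain));  // false
        // The seen-zah range contains the term in either case.
        System.out.println(inRange(term, seen, zah));  // true
    }
}
```

Under this assumption the term falls in the first range only when the upper bound really is U+0698, so it is worth printing the code points of the characters actually used as range bounds rather than reading them from an alphabet chart.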


