lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From esra <esraer...@gmail.com>
Subject RE: lucene farsi problem
Date Fri, 02 May 2008 16:18:45 GMT

Hi Steven ,

yes the correct one is "ژ "/"ze"/U+632.

my problem is when i do search for  "  د-ژ" range. The result is  ""ساب ووفر 
" and this word's first letter is "س " and it's unicode is "U+633"  and  it
is not in the in the [ U+062F - U+0632 ] range.

am i wrong?

Esra


Steven A Rowe wrote:
> 
> Hi Esra,
> 
> I still think you're wrong :).
> 
> On 05/02/2008 at 9:31 AM, esra wrote:
>> > ژ = U+632
> 
> According to the website you linked to, the above character, which has
> three dots over it, is named "zhe", and its Unicode code point is U+698. 
> (I had to increase the font size to see the three dots.)
> 
> I think you are confusing "ژ"/"zhe"/U+698 with the letter "ز"/"ze"/U+632,
> which has just one dot over it.
> 
> Unless you were mistaken in all of your emails when you included the
> character "ژ"/"zhe" instead of "ز"/"ze", then what I said in my previous
> email still stands: there is no problem here.
> 
> Steve
> 
> On 05/02/2008 at 9:31 AM, esra wrote:
>> 
>> Hi Steven,
>> 
>> sorry i made a mistake. unicodes are like this:
>> 
>> > د=U+62F
>> > ژ = U+632
>> > and the first letter of "ساب ووفر " is  س = U+633
>> 
>> you can also check them here
>> > http://www.unics.uni-hannover.de/nhtcapri/persian-alphabet.html
>> 
>> Esra
>> 
>> 
>> Steven A Rowe wrote:
>> > 
>> > Hi Esra,
>> > 
>> > Going back to the original problem statement, I see something that
>> > looks illogical to me - please correct me if I'm wrong:
>> > 
>> > On Apr 30, 2008, at 3:21 AM, esra wrote:
>> > > i am using lucene's "IndexSearcher" to search the given xml by
>> > > keyword which contains farsi information.
>> > > while searching i use ranges like
>> > > 
>> > > آ-ث  |  ج-خ  |  د-ژ  |  س-ظ  |  ع-ق  |  ک-ل  |  م-ی
>> > > 
>> > > when i do search for  "د-ژ"  range the results are wrong , they
>> > > are the results of  " س-ظ "range.
>> > > 
>> > > for example when i do search for "د-ژ"  one of the the results is
>> > > "ساب ووفر", this result also shown on the " س-ظ " range's result
>> > > list which is the corret range.
>> > > 
>> > > As IndexSearcher use "compareTo" method and this method uses
>> > > unicodes for comparing, i found the unicodes of the characters.
>> > > 
>> > > د=U+62F
>> > > ژ = U+698
>> > > and the first letter of "ساب ووفر " is  س = U+633
>> > 
>> > It appears to me that *both* the "د-ژ" range [ U+062F - U+0698 ] and
>> > the "س-ظ" range [ U+0633 - U+0638 ] contain the first letter of "ساب
>> > ووفر", which is "س" = U+0633.
>> > 
>> > You stated that U+0633 should be contained in the [ U+0633 - U+0638 ]
>> > range - I agree - but why do you think U+0633 should not be contained
>> > in the [ U+062F - U+0698 ] range?
>> > 
>> > In other words, it looks to me like your problem is not a problem at
>> > all.
>> > 
>> > Steve
>> > 
>> > 
>> 
>> -- View this message in context:
>> http://www.nabble.com/lucene-farsi-problem-tp16977096p17019498.html Sent
>> from the Lucene - Java Users mailing list archive at Nabble.com.
>> 
>> 
>> --------------------------------------------------------------------- To
>> unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org For
>> additional commands, e-mail: java-user-help@lucene.apache.org
>> 
>>
> 
>  
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/lucene-farsi-problem-tp16977096p17022861.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message