lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Steven A Rowe" <sar...@syr.edu>
Subject RE: lucene farsi problem
Date Wed, 30 Apr 2008 16:50:34 GMT
Hi Esra,

Caveat: I don't speak, read, write, or dream in Farsi - I just know that it mostly shares
its orthography with Arabic, and that they are both written and read right-to-left.

How are you constructing the queries?  Using QueryParser?  If so, then I suspect the problem
is that you intend the range you supply to be read entirely right-to-left, but Lucene instead
reads it left-to-right.  Have you tried using e.g. "د-ژ" instead of "د-ژ"?  (That is,
placing the lower valued term on the left instead of the right.)

AFAICT, RangeFilter (called from ConstantScoreRangeQuery, which is called from QueryParser)
does not test whether lowerTerm is in fact lower than upperTerm.  If it turns out that the
problem is simply one of order, it might make sense to modify RangeFilter so that it flip
them when lowerTerm > upperTerm.

Steve

On 04/30/2008 at 3:21 AM, esra wrote:
> 
> hi,
> 
> i am using lucene's "IndexSearcher" to search the given xml
> by keyword which
> contains farsi information.
> while searching i use ranges like
> 
> آ-ث  |  ج-خ  |  د-ژ  |  س-ظ  |  ع-ق  |  ک-ل  |  م-ی
> 
> when i do search for  "د-ژ"  range the results are wrong ,
> they are the
> results of  " س-ظ "range.
> 
> for example when i do search for "د-ژ"  one of the the
> results is "ساب ووفر"
> , this result also shown on the " س-ظ " range's result list
> which is the
> corret range.
> 
> As IndexSearcher use "compareTo" method and this method uses
> unicodes for
> comparing, i found the unicodes of the characters.
> 
> د=U+62F
> ژ = U+698
> and the first letter of "ساب ووفر " is  س = U+633
> 
> Do you have any idea how to solve this problem, there are
> analyzers for
> different languages ,
> will this be usefull if so do you know where to find a farsi analyzer?
> 
> I would bu glad if you help.
> 
> thanks ,
> 
> Esra
> 
> -- View this message in context:
> http://www.nabble.com/lucene-farsi-problem-tp16977096p16977096.html Sent
> from the Lucene - Java Users mailing list archive at Nabble.com.
> 
> 
> --------------------------------------------------------------------- To
> unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org For
> additional commands, e-mail: java-user-help@lucene.apache.org
> 
>

 

Mime
View raw message