lucene-java-user mailing list archives

From "OBender Hotmail" <osya_ben...@hotmail.com>
Subject RE: Hindi, diacritics and search results
Date Sat, 11 Jul 2009 01:13:26 GMT
I'm using the default analyzer. Actually, it is the one that the Compass framework sets by
default, but I assume it is the same one that Lucene would use by default.
Which one should I use?

-----Original Message-----
From: Robert Muir [mailto:rcmuir@gmail.com] 
Sent: Friday, July 10, 2009 6:13 PM
To: java-user@lucene.apache.org
Subject: Re: Hindi, diacritics and search results

Which analyzer in particular are you using?

It's probably not doing what you want for Hindi. These "diacritics" are
important (vowels, etc.).


On Fri, Jul 10, 2009 at 3:10 PM, OBender <osya_bender@hotmail.com> wrote:
> Hi All,
>
> I'm using the default setup of Lucene (no custom analyzers configured) and
> came across the following issue:
>
> In Hindi, if a phrase contains a letter with a diacritic, Lucene will find
> the phrase even if the search string is for the letter without the
> diacritic.
>
> Is this expected behavior? Maybe this is standard for all languages with
> letters that have diacritics?
>
> From a pure byte standpoint I can see the logic: the letter with the diacritic
> takes 6 bytes (E0 A4 95 E0 A5 87) and the single letter takes 3 (E0 A4 95),
> so if I search for *some_letter*, where some_letter has the code (E0 A4 95),
> Lucene finds the "phrase" (E0 A4 95 E0 A5 87) that includes that letter.
>
> Any comments much appreciated.
>
> Thanks.
>
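
The byte counts quoted above can be checked directly. The following is a minimal Java sketch, not code from this thread: the class name DevanagariBytes and the helper utf8Hex are made up for illustration. It prints the UTF-8 bytes of U+0915 (DEVANAGARI LETTER KA) alone and followed by U+0947 (DEVANAGARI VOWEL SIGN E), and shows that Java classifies U+0947 as a non-spacing mark rather than a letter. Whether that classification is what makes the default analyzer match here is an assumption; nothing in this thread confirms it.

```java
import java.nio.charset.StandardCharsets;

public class DevanagariBytes {
    // Hypothetical helper: render the UTF-8 encoding of a string
    // as space-separated uppercase hex bytes.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ka = "\u0915";        // DEVANAGARI LETTER KA
        String ke = "\u0915\u0947";  // KA followed by DEVANAGARI VOWEL SIGN E

        System.out.println(utf8Hex(ka)); // E0 A4 95
        System.out.println(utf8Hex(ke)); // E0 A4 95 E0 A5 87

        // The vowel sign is a combining mark (general category Mn), so
        // Character.isLetter reports false for it.
        System.out.println(Character.isLetter('\u0947'));
        System.out.println(
            Character.getType('\u0947') == Character.NON_SPACING_MARK);
    }
}
```

If a tokenizer breaks tokens at characters it does not treat as letters, the bare consonant could end up indexed as its own term, which would be consistent with the behavior described above. That is only a guess about the cause.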



-- 
Robert Muir
rcmuir@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

