lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Hindi, diacritics and search results
Date Sat, 11 Jul 2009 02:35:22 GMT
There is really no default analyzer in Lucene.

A good start for Hindi would be to try WhitespaceAnalyzer.
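
A minimal sketch of why this can matter, using only the Java standard library and no Lucene classes. It assumes, without confirmation anywhere in this thread, that the analyzer in use breaks tokens at characters that are not letters. The class and method names (HindiTokenSketch, splitOnWhitespace, splitOnNonLetters, utf8Hex) are hypothetical and exist only for illustration. The Devanagari vowel sign U+0947 is a combining mark rather than a letter, so a letter-only split drops it and indexes the bare consonant, while a whitespace split keeps the syllable whole.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class HindiTokenSketch {

    // Split on whitespace only, roughly the behaviour one would expect
    // from a whitespace-based analyzer (illustration, not Lucene code).
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Hypothetical letter-only tokenizer: any code point for which
    // Character.isLetter is false ends the current token and is discarded.
    static List<String> splitOnNonLetters(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Render the UTF-8 encoding of a string as space-separated hex bytes.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ka = "\u0915";        // consonant KA, U+0915
        String ke = "\u0915\u0947";  // KA followed by vowel sign E, U+0947

        System.out.println(utf8Hex(ka)); // E0 A4 95
        System.out.println(utf8Hex(ke)); // E0 A4 95 E0 A5 87

        // Whitespace split keeps the vowel sign attached to the consonant.
        System.out.println(splitOnWhitespace(ke).get(0).equals(ke)); // true
        // Letter-only split drops the vowel sign, leaving the bare consonant.
        System.out.println(splitOnNonLetters(ke).get(0).equals(ka)); // true
    }
}
```

If the analyzer really does behave like the letter-only split, both the indexed text and the query collapse to the bare consonant, which would explain the match described in the quoted message below.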

On Fri, Jul 10, 2009 at 9:13 PM, OBender Hotmail<osya_bender@hotmail.com> wrote:
> I'm using the default analyzer. Actually, it is the one that the Compass framework sets by default,
> but I assume it is the same one that Lucene would use by default.
> Which one should I use?
>
> -----Original Message-----
> From: Robert Muir [mailto:rcmuir@gmail.com]
> Sent: Friday, July 10, 2009 6:13 PM
> To: java-user@lucene.apache.org
> Subject: Re: Hindi, diacritics and search results
>
> Which analyzer in particular are you using?
>
> It's probably not doing what you want for Hindi. These "diacritics" are
> important (vowels, etc.).
>
>
> On Fri, Jul 10, 2009 at 3:10 PM, OBender<osya_bender@hotmail.com> wrote:
>> Hi All,
>>
>>
>>
>> I'm using the default setup of Lucene (no custom analyzers configured) and
>> came across the following issue:
>>
>> In Hindi, if a phrase contains a letter with a diacritic, Lucene will find
>> the phrase even if the search string is for the letter
>> without the diacritic.
>>
>> Is this an expected behavior? Maybe this is standard for all languages with
>> letters that have diacritics?
>>
>>
>>
>> From a pure byte standpoint I can see the logic: the letter with the diacritic
>> takes 6 bytes (E0 A4 95 E0 A5 87) and the single letter takes 3 (E0 A4 95),
>> so if I search for *some_letter* where some_letter has the code (E0 A4 95),
>> Lucene finds the "phrase" (E0 A4 95 E0 A5 87) that includes that letter.
>>
>>
>>
>> Any comments much appreciated.
>>
>>
>>
>> Thanks.
>>
>>
>>
>>
>
>
>
> --
> Robert Muir
> rcmuir@gmail.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>



-- 
Robert Muir
rcmuir@gmail.com
