lucene-java-user mailing list archives

From: Nader Henein <...@bayt.net>
Subject: Re: Arabic analyzer
Date: Thu, 07 Oct 2004 12:54:49 GMT
I'd be happy to help anyone test this out; my Arabic is pretty good.

Nader

Andrzej Bialecki wrote:

> Dawid Weiss wrote:
>
>>
>>> nothing to do with each other; furthermore, Arabic uses phonetic 
>>> indicators on each letter, called diacritics, that change the way you 
>>> pronounce the word, which in turn changes the word's meaning, so two 
>>> words spelled exactly the same way with different diacritics will 
>>> mean two separate things, 
>>
>> Just to point out the fact: most Slavic languages also use diacritic 
>> marks (above, like 'acute' or 'dot' marks, or below, like the Polish 
>> 'ogonek' mark). Some people argue that they can be stripped off the 
>> text upon indexing and that the queries usually disambiguate the 
>> context of the word.
>
> Hmm. This brings up a question: the algorithmic stemmer package from 
> Egothor works quite well for Polish (http://www.getopt.org/stempel); 
> wouldn't it work well for Arabic, too?
>
> I lack the necessary expertise to evaluate results (knowing only two 
> or three Arabic words ;-) ), but I can certainly help someone to get 
> started with testing...
>

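To make the strip-on-index idea from the quoted discussion concrete, here is a minimal Java sketch. It is only an illustration under stated assumptions, not Lucene's API and not anything proposed in this thread: it uses the JDK's java.text.Normalizer (Java 6 and later) instead of a Lucene filter, and the class name DiacriticStripper is hypothetical. The text is decomposed to Unicode NFD, every combining mark is dropped, and the result is recomposed to NFC.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper illustrating "strip diacritics at index time".
public class DiacriticStripper {
    // Matches any run of Unicode combining marks (categories Mn, Mc, Me).
    // Arabic harakat (fatha, damma, kasra, tanween, shadda, sukun) are Mn,
    // as is the Polish ogonek once NFD has split it off its base letter.
    private static final Pattern MARKS = Pattern.compile("\\p{M}+");

    /** Decomposes to NFD, removes combining marks, recomposes to NFC. */
    public static String strip(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        String stripped = MARKS.matcher(decomposed).replaceAll("");
        return Normalizer.normalize(stripped, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // Arabic "kataba" (he wrote) and "kutub" (books): same consonants
        // k-t-b, different diacritics, so they collapse to one token.
        String kataba = "\u0643\u064E\u062A\u064E\u0628\u064E";
        String kutub = "\u0643\u064F\u062A\u064F\u0628\u064C";
        System.out.println(strip(kataba).equals(strip(kutub))); // true

        // Polish "maka" with an ogonek on the a: the ogonek is removed.
        System.out.println(strip("m\u0105ka")); // maka
    }
}
```

The first comparison shows exactly the trade-off raised above: stripping makes the index tolerant of missing or inconsistent diacritics in queries, but it merges words that differ only in their diacritics, leaving query context to disambiguate them. Two caveats of this naive approach: Polish letters without a canonical decomposition, such as U+0142 (l with stroke), pass through unchanged and would need an explicit mapping, and Arabic alef with hamza above (U+0623) decomposes under NFD into a bare alef plus a combining hamza, so it is folded to a plain alef as well.
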

