lucene-java-user mailing list archives

From Nader Henein <>
Subject Re: Arabic analyzer
Date Thu, 07 Oct 2004 12:54:49 GMT
I'd be happy to help anyone test this out; my Arabic is pretty good.


Andrzej Bialecki wrote:

> Dawid Weiss wrote:
>>> nothing to do with each other. Furthermore, Arabic uses phonetic 
>>> indicators on each letter, called diacritics, that change the way you 
>>> pronounce the word, which in turn changes the word's meaning, so two 
>>> words spelled exactly the same way with different diacritics will 
>>> mean two separate things, 
>> Just to point out the fact: most Slavic languages also use diacritic 
>> marks (above, like 'acute', or 'dot' marks, or below, like the Polish 
>> 'ogonek' mark). Some people argue that they can be stripped off the 
>> text upon indexing and that the queries usually disambiguate the 
>> context of the word.
> Hmm. This brings up a question: the algorithmic stemmer package from 
> Egothor works quite well for Polish (, 
> wouldn't it work well for Arabic, too?
> I lack the necessary expertise to evaluate results (knowing only two 
> or three Arabic words ;-) ), but I can certainly help someone to get 
> started with testing...
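
For anyone who wants to experiment with the diacritic-stripping idea quoted above (removing the marks at indexing time and letting the query disambiguate), here is a minimal sketch in plain Java. It is not the Egothor stemmer and not a Lucene API: the class name DiacriticStripper and the method strip are hypothetical, and it assumes a JDK that ships java.text.Normalizer. It decomposes the text (NFD) and drops every non-spacing mark, which covers Polish marks such as the ogonek as well as the Arabic short-vowel marks. Note that this is probably too aggressive for real Arabic text, since some letters (for example alef with madda) also decompose and would lose their mark.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Lucene or Egothor.
public class DiacriticStripper {

    // Unicode general category Mn (non-spacing marks): combining accents,
    // the ogonek, Arabic short-vowel marks, and similar.
    private static final Pattern MARKS = Pattern.compile("\\p{Mn}+");

    // Decompose to NFD so that precomposed letters split into
    // base letter + combining mark, then delete the marks.
    public static String strip(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return MARKS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        // Polish word with ogonek and acute, written with escapes
        // so the source file encoding does not matter.
        String polish = "g\u0119\u015B";
        // Arabic kaf, ta, ba, each followed by a fatha (U+064E).
        String arabic = "\u0643\u064E\u062A\u064E\u0628\u064E";

        System.out.println(strip(polish));
        System.out.println(strip(arabic));
    }
}
```

To try it in an indexing pipeline, one would apply strip to each token before it is indexed and to each query term before searching, so both sides are normalized the same way. Whether that loses too much meaning for Arabic is exactly the open question in this thread.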
