lucene-java-user mailing list archives

From Nader Henein <>
Subject Re: Arabic analyzer
Date Thu, 07 Oct 2004 06:16:01 GMT
I tried to develop one and it became a colossal pain. To give you some
background: a comprehensive Arabic dictionary runs to about 20 volumes,
roughly the size of an encyclopedia. To look up a word in it, you first
have to reduce the word to its two- or three-letter root, and then look
for the word you want underneath that root.

Reducing words to that root as part of stemming is useless, because
words that share a root more often than not have nothing to do with
each other. Furthermore, Arabic marks letters with phonetic indicators
called diacritics, which change the way you pronounce the word and, in
turn, its meaning. So two words spelled exactly the same way but with
different diacritics will mean two separate things.

I've seen Arabic stemmers that kind of work, but none of them are open
source. This is a good paper from Berkeley that outlines the work and
the challenges. Hope it helps.
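
To make the diacritics point concrete, here is a minimal sketch in plain
Java (it does not use any Lucene API). The class name ArabicDiacritics
and the method stripDiacritics are hypothetical, made up for this
illustration, and the character range U+064B to U+0652 covers only the
common short-vowel marks, tanween, shadda and sukun, which is an
assumption about what a simple normalizer would remove. It shows how two
different words collapse to the same letters once the marks are gone.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Lucene.
final class ArabicDiacritics {

    // Assumption: treat U+064B..U+0652 (fathatan through sukun) as the
    // diacritics to strip. Other marks, e.g. U+0670, are left untouched.
    private static final Pattern MARKS = Pattern.compile("[\\u064B-\\u0652]");

    private ArabicDiacritics() {}

    // Returns the input with the diacritics in the range above removed.
    static String stripDiacritics(String text) {
        return MARKS.matcher(text).replaceAll("");
    }
}
```

For example, kataba ("he wrote", "\u0643\u064E\u062A\u064E\u0628\u064E")
and kutub ("books", "\u0643\u064F\u062A\u064F\u0628") both reduce to the
bare letters "\u0643\u062A\u0628". That is the ambiguity described above:
text indexed without diacritics cannot tell such words apart.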

Nader Henein

Scott Smith wrote:

>Is anyone aware of an open source (non-GPL; i.e., free for commercial
>use) Arabic analyzer for Lucene?  Does Arabic really require a stemmer
>as well (some of the reading I've seen on the web would suggest that a
>stemmer is almost a necessity with Arabic to get anything useful where
>it is not with other languages).
