lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nader Henein <...@bayt.net>
Subject Re: Arabic analyzer
Date Thu, 07 Oct 2004 07:54:17 GMT
There is a way of writing an Arabic stemmer, it's just not a weekend 
project, I've seen the translate/stem option as well, and even tried it 
with Lucene, we've implemented Lucene on our database and we have about 
a million records in our DB with 19 indexed fields (some of which are 
clobs) in each record, the free text fields in each record are in many 
cases Arabic, we do not provide stemming on those just because I 
couldn't find a valid stemming or translation option, which held up to 
proper testing, some were ok, but after collecting data from user 
searches (averaging out at 5 searches per second) the Arabic stemming 
options would not be able to manage user expectations, which is what it 
comes down to, sometimes theory does not translate well to practice.

Nader Henein

Dawid Weiss wrote:

>
>> nothing to do with each other furthermore, Arabic uses phonetic 
>> indicators on each letter called diacritics that change the way you 
>> pronounce the word which in turn changes the words meaning so two 
>> word spelled exactly the same way with different diacritics will mean 
>> two separate things, 
>
>
> Just to point out the fact: most slavic languages also use diacritic 
> marks (above, like 'acute', or 'dot' marks, or below, like the Polish 
> 'ogonek' mark). Some people argue that they can be stripped off the 
> text upon indexing and that the queries usually disambiguate the 
> context of the word.
>
> It is just a digression. Now back to the arabic stemmer -- there has 
> to be a way of doing it. I know Vivisimo has clustering options for 
> arabic. They must be using a stemmer (and an English translation 
> dictionary), although it might be a commercial one. Take a look:
>
> http://vivisimo.com/search?v:file=cnnarabic
>
> D.
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message