lucene-java-user mailing list archives

From Dawid Weiss <dawid.we...@cs.put.poznan.pl>
Subject Re: Arabic analyzer
Date Thu, 07 Oct 2004 06:25:41 GMT

> nothing to do with each other. Furthermore, Arabic uses phonetic
> indicators on each letter, called diacritics, that change the way you
> pronounce the word, which in turn changes the word's meaning, so two words
> spelled exactly the same way with different diacritics will mean two
> separate things,

Just to point out: most Slavic languages also use diacritic marks
(above the letter, like the acute accent or the dot, or below it, like
the Polish 'ogonek'). Some people argue that these marks can be stripped
from the text at indexing time, because the query itself usually
disambiguates the word from its context.
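As a rough illustration of the "strip them at indexing time" approach, here is a minimal plain-Java sketch. It is not a Lucene API: the class name DiacriticStripper is hypothetical, and it assumes Java 6 or later for java.text.Normalizer. It decomposes text to Unicode NFD, so an accented letter becomes a base letter plus separate combining marks, and then deletes every combining mark.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: strips diacritics before indexing.
public class DiacriticStripper {
    // One or more Unicode combining marks (general category M): the acute
    // accent, dot above and ogonek once decomposed, and Arabic harakat
    // such as fatha (U+064E) and damma (U+064F).
    private static final Pattern COMBINING_MARKS = Pattern.compile("\\p{M}+");

    /** Decomposes the text (NFD) so accents become separate marks, then drops them. */
    public static String strip(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return COMBINING_MARKS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        // Polish word: z with dot above, o with acute, l with stroke, c with acute.
        String polish = "\u017C\u00F3\u0142\u0107";
        // Arabic kataba ("he wrote") and kutub ("books"): same letters, different harakat.
        String kataba = "\u0643\u064E\u062A\u064E\u0628\u064E";
        String kutub = "\u0643\u064F\u062A\u064F\u0628";

        // l with stroke is a base letter, not letter + combining mark, so it survives.
        System.out.println(strip(polish).equals("zo\u0142c")); // true
        // Both Arabic words collapse to the bare letters kaf-teh-beh.
        System.out.println(strip(kataba).equals(strip(kutub))); // true
    }
}
```

The sketch exposes both sides of the argument. The Polish żółć becomes "zołc", because ł has no Unicode decomposition and would need an explicit mapping. The Arabic kataba and kutub become indistinguishable, which is exactly the ambiguity raised in the quoted message.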

That is just a digression. Now back to the Arabic stemmer: there has to
be a way of doing it. I know Vivisimo has clustering options for Arabic.
They must be using a stemmer (and an English translation dictionary),
although it might be a commercial one. Take a look:

http://vivisimo.com/search?v:file=cnnarabic

D.
