lucene-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: ArabicAnalyzer
Date Sun, 03 May 2009 06:56:13 GMT
Have you looked at the existing Arabic analyzer in contrib?
I like your analyzer, but glancing at your code I think you can get the same
behavior with the existing one (it also has stopwords and stemming, but you can
disable those). Let me know if I am missing something!

wrt Farsi, I wouldn't recommend using an Arabic analyzer.
For example, on the Hamshahri TREC data:

simpleanalyzer:  Average Precision: 0.374
arabicanalyzer:  Average Precision: 0.316 <-- inappropriate stemming/stopwords
persiananalyzer: Average Precision: 0.481 <-- I can contribute this if someone needs it.
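
For anyone unfamiliar with the metric, here is a minimal sketch of how average precision is commonly computed for a ranked result list, and how it is averaged over queries. This is not the code that produced the figures above. The function names and document IDs are made up for illustration, and the benchmark tool used for the run may differ in details such as cutoffs or tie handling.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for a single query.

    Sums precision@k at every rank k where a relevant document
    appears, then divides by the total number of relevant documents.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of per-query average precision.

    `runs` is a list of (ranked_ids, relevant_ids) pairs, one per query.
    """
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical query: relevant docs d1 and d2 are returned at ranks 2 and 4.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(average_precision(ranked, relevant))
```

Under this definition, an analyzer that conflates unrelated terms (for example through stemming rules written for another language) pushes relevant documents further down the ranking, which lowers the score.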

thanks,
robert

On Sun, May 3, 2009 at 2:09 AM, Ahmed Al-Obaidy <ahmad_alobaidy@yahoo.com> wrote:

> Well I don't know really... but it shouldn't be hard to support it.
>
> --- On Sun, 5/3/09, DM Smith <dmsmith555@gmail.com> wrote:
>
>
> From: DM Smith <dmsmith555@gmail.com>
> Subject: Re: ArabicAnalyzer
> To: java-dev@lucene.apache.org
> Date: Sunday, May 3, 2009, 4:05 AM
>
>
>
> On May 2, 2009, at 6:43 PM, Ahmed Al-Obaidy wrote:
>
> I've written a simple (yet useful) ArabicAnalyzer, ArabicTokenizer and
> ArabicFilter. It can handle Arabic text very well.
>
> I've tested it with a large set of Arabic documents and it worked OK both in
> terms of accuracy and performance.
>
> The code is released under the Apache 2.0 license, and I would be very happy
> if you included it in the code tree.
>
>
> Sounds super. Do you know if it will handle Farsi as well?
>
> -- DM Smith
>
>
>


-- 
Robert Muir
rcmuir@gmail.com
