lucene-dev mailing list archives

From DM Smith <dmsmith...@gmail.com>
Subject Re: ArabicAnalyzer
Date Sun, 03 May 2009 11:37:05 GMT

On May 3, 2009, at 2:56 AM, Robert Muir wrote:

> have you looked at the existing ar analyzer in contrib?
> I like your analyzer but glancing at your code I think you can get  
> the same behavior with the existing one (it also has stopwords &  
> stemming but you can disable that). lemme know if i am missing  
> something!
>
> wrt farsi i wouldnt recommend using an arabic analyzer
> for example on hamshari trec data:
>
> simpleanalyzer: Average Precision: 0.374
> arabicanalyzer: Average Precision: 0.316 <-- inappropriate stemming/stopwords
> persianalyzer:  Average Precision: 0.481 <-- i can contrib this if someone needs it.

Please do contribute it. While I don't know Persian at all, the  
program I am working on is translated into Farsi and we have several  
indexed texts.


>
>
> thanks,
> robert
>
> On Sun, May 3, 2009 at 2:09 AM, Ahmed Al-Obaidy <ahmad_alobaidy@yahoo.com> wrote:
> Well I don't know really... but it shouldn't be hard to support it.
>
> --- On Sun, 5/3/09, DM Smith <dmsmith555@gmail.com> wrote:
>
> From: DM Smith <dmsmith555@gmail.com>
> Subject: Re: ArabicAnalyzer
> To: java-dev@lucene.apache.org
> Date: Sunday, May 3, 2009, 4:05 AM
>
>
>
> On May 2, 2009, at 6:43 PM, Ahmed Al-Obaidy wrote:
>
>> I've written a simple (yet useful) ArabicAnalyzer,
>> ArabicTokenizer and ArabicFilter. It can handle Arabic text very
>> well.
>>
>> I've tested it with a large set of Arabic documents and it worked OK
>> in terms of both accuracy and performance.
>>
>> The code is released under the Apache 2.0 license, and I would be very
>> happy if you included it in the code tree.
>
> Sounds super. Do you know if it will handle Farsi as well?
>
> -- DM Smith
>
>
>
>
>
> -- 
> Robert Muir
> rcmuir@gmail.com

