lucene-dev mailing list archives

From Ahmed Al-Obaidy <ahmad_aloba...@yahoo.com>
Subject Re: ArabicAnalyzer
Date Sun, 03 May 2009 07:21:03 GMT
hmmmm, I didn't know about it... I only knew about some GPLed one... which didn't perform well
for me.

I will test the existing one, but I think it is much better than mine. 
So, I think I've reinvented the wheel, and it seems it is not even rounder :D

cheers, 

--- On Sun, 5/3/09, Robert Muir <rcmuir@gmail.com> wrote:

From: Robert Muir <rcmuir@gmail.com>
Subject: Re: ArabicAnalyzer
To: java-dev@lucene.apache.org
Cc: dmsmith555@gmail.com
Date: Sunday, May 3, 2009, 9:56 AM

have you looked at the existing ar analyzer in contrib? 
I like your analyzer but glancing at your code I think you can get the same behavior with
the existing one (it also has stopwords & stemming but you can disable that). lemme know
if i am missing something!


wrt farsi i wouldnt recommend using an arabic analyzer
for example on hamshari trec data:

simpleanalyzer: Average Precision: 0.374
arabicanalyzer: Average Precision: 0.316 <-- inappropriate stemming/stopwords
persianalyzer:  Average Precision: 0.481 <-- i can contrib this if someone needs it.
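For anyone unfamiliar with the metric behind these figures, here is a minimal sketch of how average precision is commonly computed for a single query; a TREC-style score would then typically be the mean over all queries. This is not the benchmark code that produced the numbers above. The class name, method name, and document IDs are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AveragePrecision {

    /**
     * Average precision for one query: at each rank where a relevant
     * document appears, take precision-at-that-rank, then divide the sum
     * by the total number of relevant documents (retrieved or not).
     */
    public static double averagePrecision(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0; // assumption: define AP as 0 when nothing is relevant
        }
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1); // precision at rank i + 1
            }
        }
        return sum / relevant.size();
    }

    public static void main(String[] args) {
        // Hypothetical ranked result list and relevance judgments.
        List<String> ranked = Arrays.asList("d3", "d7", "d1", "d9");
        Set<String> relevant = new HashSet<>(Arrays.asList("d3", "d1", "d5"));
        System.out.println(averagePrecision(ranked, relevant));
    }
}
```

Because the sum is divided by the number of relevant documents rather than the number retrieved, relevant documents that are never returned pull the score down. That fits the explanation given above: an analyzer whose stemming or stopwords suit the wrong language can end up scoring lower.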

thanks,
robert

On Sun, May 3, 2009 at 2:09 AM, Ahmed Al-Obaidy <ahmad_alobaidy@yahoo.com> wrote:


Well I don't know really... but it shouldn't be hard to support it.

--- On Sun, 5/3/09, DM Smith <dmsmith555@gmail.com> wrote:


From: DM Smith <dmsmith555@gmail.com>
Subject: Re: ArabicAnalyzer

To: java-dev@lucene.apache.org
Date: Sunday, May 3, 2009, 4:05 AM


On May 2, 2009, at 6:43 PM, Ahmed Al-Obaidy wrote:


I've written a simple (yet useful) ArabicAnalyzer, ArabicTokenizer and ArabicFilter. It can
handle Arabic text very well.

I've tested it with a large set of Arabic documents and it worked OK in terms of both accuracy
and performance.


The code is released under the Apache 2.0 license, and I would be very happy if you included it
in the code tree.

Sounds super. Do you know if it will handle Farsi as well?

-- DM Smith



      


-- 
Robert Muir
rcmuir@gmail.com




      