lucene-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 23784] New: - [PATCH] Arabic Analyzer, Stemmer
Date Mon, 13 Oct 2003 15:54:47 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23784>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23784

[PATCH] Arabic Analyzer, Stemmer

           Summary: [PATCH] Arabic Analyzer, Stemmer
           Product: Lucene
           Version: unspecified
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Enhancement
          Priority: Other
         Component: Analysis
        AssignedTo: lucene-dev@jakarta.apache.org
        ReportedBy: otis@apache.org
                CC: pierrick.brihaye@wanadoo.fr


September 28th 2003 contribution from "Pierrick Brihaye"
<pierrick.brihaye@wanadoo.fr>.

Original email:

Hi all,

I have written a Lucene Analyzer for Arabic. You will find it here:
http://perso.wanadoo.fr/pierrick.brihaye/ArabicAnalyzer.jar
(provisional address; is anybody interested in hosting it?)

This work is still at the beta stage, but it gives quite good results :-)

In order to make it work, you need:

1) a 1.4+ JVM (because of the native support for regular expressions,
which are heavily used in the program; I've been too lazy to use an
external package)

2) Apache Jakarta Commons-Collections:
http://jakarta.apache.org/commons/collections.html

3) a recent Lucene distribution ;-)

All this work is based on Tim Buckwalter's amazing Arabic Morphological
Analyzer Version 1.0
(http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002L49),
originally written in Perl and released under the GPL.

The jar contains:

a) the compiled classes
b) the required data files (dictionaries and compatibility tables)
c) 2 command-line test programs
d) 3 test documents with different encodings
e) the source code
f) a README file that will give you a little more information :-)

To Lucene developers: I plan to offer this work to Lucene (see the jar
hierarchy... and the source file headers ;-). Any objections?

Feedback is very welcome: there are quite a lot of unresolved issues,
with the analyzer itself as well as with Lucene.

mE AlslAmap, cheers,

p.b.


