Message-ID: <20031013155619.28325.qmail@web12701.mail.yahoo.com>
Date: Mon, 13 Oct 2003 08:56:19 -0700 (PDT)
From: Otis Gospodnetic
Subject: Re: Announce : arabic Stemmer/Analyzer for Lucene
To: Lucene Users List
Cc: pierrick.brihaye@wanadoo.fr
In-Reply-To: <003d01c38595$031f2560$d4d1fea9@becane>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

Hello and thank you.
I added this to our 'patch queue' at:
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23784

Otis

--- Pierrick Brihaye wrote:
> Hi all,
>
> I have written a Lucene Analyzer for Arabic. You will find it here:
> http://perso.wanadoo.fr/pierrick.brihaye/ArabicAnalyzer.jar
> (provisional address; is anybody interested in hosting it?)
>
> This work is still in beta stage but it gives quite good results :-)
>
> In order to make it work, you need:
>
> 1) a 1.4+ JVM (because of the native support for regular expressions,
> which are heavily used in the program; I've been too lazy to use an
> external package)
>
> 2) Apache Jakarta Commons-Collections:
> http://jakarta.apache.org/commons/collections.html
>
> 3) a recent Lucene distribution ;-)
>
> All this work is based on Tim Buckwalter's amazing Arabic Morphological
> Analyzer Version 1.0
> (http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002L49),
> originally written in Perl and released under the GPL.
>
> The jar contains:
>
> a) the compiled classes
> b) the required data files (dictionaries and compatibility tables)
> c) 2 command-line test programs
> d) 3 test documents with different encodings
> e) the source code
> f) a README file that will give you a little more information :-)
>
> To Lucene developers: I plan to offer this work to Lucene (see the jar
> hierarchy... and the source file headers ;-). Any objections?
>
> Feedback is very welcome: there are quite a lot of unresolved issues,
> with the analyzer itself as well as with Lucene.
>
> mE AlslAmap, cheers,
>
> p.b.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org