Date: Mon, 3 Aug 2009 12:39:54 -0400
From: Robert Muir
To: java-user@lucene.apache.org
Subject: Re: arabic analyzer

Walid, thanks for your feedback.

FYI, I created an issue with some minor improvements (such as the lam-lam prefix) to the Arabic analyzer: http://issues.apache.org/jira/browse/LUCENE-1758

I also tried to improve the stopwords list, but your Arabic is surely much better than mine. If you are interested, have a look; perhaps you could double-check :)

On Mon, Aug 3, 2009 at 12:05 PM, walid wrote:
> Hello Robert,
>
> you are so right, plurals based on prefixes and suffixes are working.
> Plurals based on inserted "و" do not (باب and ابوب).
>
> The few words I had tested were all of the "insert" type and not the
> prefix/suffix.
>
> thank you :)
>
> -walid
>
> On Sun, 2009-08-02 at 15:08 -0400, Robert Muir wrote:
>> > the fact is, plural (as an example) is not supported, and that is one of
>> > the most common things that a person doing some search will expect to
>>
>> Walid, I'm not sure this is true.
>> Many plurals are supported
>> (certainly not exceptional cases or broken plurals).
>> This is no different than the other language analyzers in Lucene, even
>> English stemmers: the most common forms are grouped together and that's
>> about all you can say :)
>>
>> Maybe in the future we can improve it, though: for your particular
>> concern, add simple dictionary mappings for at least the most common
>> broken plurals, something like that.
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

-- 
Robert Muir
rcmuir@gmail.com
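
The "simple dictionary mappings" idea mentioned in the quoted reply could look roughly like the sketch below. This is only an illustration in plain Java, not Lucene API and not what LUCENE-1758 implements: the class name `BrokenPluralMapper`, the method `normalize`, and the three sample entries are all assumptions made up for this example. A real list would need review by a native speaker, and the lookup would have to run on terms in whatever normalized form the analyzer produces.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of Lucene): maps a few broken plurals to a singular form. */
class BrokenPluralMapper {
    // Illustrative entries only; assumed for this sketch, not taken from any Lucene resource.
    private static final Map<String, String> SINGULAR_OF = new HashMap<>();
    static {
        SINGULAR_OF.put("ابواب", "باب");  // doors -> door
        SINGULAR_OF.put("كتب", "كتاب");   // books -> book
        SINGULAR_OF.put("رجال", "رجل");   // men -> man
    }

    /** Returns the mapped singular for a known broken plural, otherwise the term unchanged. */
    static String normalize(String term) {
        return SINGULAR_OF.getOrDefault(term, term);
    }
}
```

Applying such a lookup before the prefix/suffix stemmer would let a query for the singular and a document containing the broken plural meet on the same term, at the cost of maintaining the list by hand.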