From: Antony Bowesman
Organization: Teamware Group
Date: Thu, 30 Nov 2006 17:24:27 +1100
To: java-dev@lucene.apache.org
Subject: Re: Analyzer thread safety; Stop words
Message-ID: <456E791B.6010104@teamware.com>
References: <4566B054.9040700@teamware.com> <456DF9BA.7000500@teamware.com> <456E4B61.30200@teamware.com>

Yonik Seeley wrote:
> On 11/29/06, Antony Bowesman wrote:
>> Yonik Seeley wrote:
>
> The GreekAnalyzer is just an example of how you can use existing
> Analyzers (as long as they have a default constructor), but it's not
> the recommended approach.
>
> TokenFilters are preferred over Analyzers.... you can plug them
> together in any way you see fit to solve your analysis problem. For
> Solr, an added bonus of using chains of filters is that Solr can
> "know" about the results after each filter and show you the results on
> an analysis web page (very useful for debugging).
>
> If I were to analyze Greek text, I might do something like this:
>
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> language="Greek" />
> xt"/>
>
> If you try to put everything in Analyzer constructors, you get
> combinatorial explosion.

I guess you would use methods rather than, as you say, getting into
constructor hell.

Anyway, I'll have a deeper look at the Solr stuff when I get to phase 2.
Right now, I've gone as far with analysis as I need to, but I would like
to get better configuration than I've currently got. I know it will come
back to bite...

Thanks for your comments, Yonik.

Antony
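
For illustration, here is a minimal Java sketch of the point quoted above about chaining filters instead of multiplying Analyzer constructors. It deliberately does not use Lucene's or Solr's classes: TokenStage, FilterChainSketch, lowerCase, stopWords and analyze are hypothetical names made up for this sketch, and plain whitespace splitting stands in for a real tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

// Hypothetical stand-in for a token filter: maps one token list to another.
// This is NOT Lucene's TokenFilter API; it only illustrates composition.
interface TokenStage extends Function<List<String>, List<String>> {}

final class FilterChainSketch {

    // Whitespace splitting, used here as a placeholder for a real tokenizer.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Stage that lower-cases every token.
    static TokenStage lowerCase() {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) {
                out.add(t.toLowerCase(Locale.ROOT));
            }
            return out;
        };
    }

    // Stage that drops tokens found in the given stop-word set (exact match).
    static TokenStage stopWords(Set<String> stop) {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) {
                if (!stop.contains(t)) {
                    out.add(t);
                }
            }
            return out;
        };
    }

    // Tokenize, then apply the stages in the order given.
    static List<String> analyze(String text, List<TokenStage> chain) {
        List<String> tokens = tokenize(text);
        for (TokenStage stage : chain) {
            tokens = stage.apply(tokens);
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the"));
        List<TokenStage> chain = Arrays.asList(lowerCase(), stopWords(stop));
        System.out.println(analyze("The Quick fox", chain));
    }
}
```

With n independent stages, a chain is just a list of the ones you want in the order you want them, whereas one constructor per combination would need a separate signature for each subset. Order matters in this sketch: placing stopWords before lowerCase would leave "The" in the output, because the stop set is matched exactly.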