Date: Tue, 30 Nov 2010 17:10:45 -0800
From: Marvin Humphrey
To: dev@lucene.apache.org
Subject: Re: deprecating Versions
Message-ID: <20101201011045.GA12886@rectangular.com>

On Mon, Nov 29, 2010 at 05:34:27AM -0500, Robert Muir wrote:
> Is it somehow possible i could convince everyone that all the analyzers we
> provide are simply examples? This way we could really make this a bit more
> reasonable and clean up a lot of stuff.

I understand what you're getting at.
We don't really expect people to fork an analyzer code base, though -- so we
need to draw a line between e.g. the code that implements StopFilter and
stoplist content. We want the low-level code to be part of the library, but
maybe we want stoplist content to be considered example code.

> Seems like we really want to move towards a more declarative model where
> these are just config files... so only then it will ok for us to change them
> because they suddenly aren't suffixed with .java?!

Consider how this might work with e.g. RussianAnalyzer. The
declaratively-expressed sample analyzer config could contain a hard-coded list
of Russian stop words, and as this hard-coded stoplist would travel with the
index in a config file, there would be no index compatibility problems upon
upgrading Lucene. The stoplist in the sample config could change, even on
bugfix releases. Config file syntax would potentially be affected by a Lucene
upgrade, but that doesn't affect index content and maintaining back compat is
straightforward.

Things are more difficult with versioning e.g. stemmers, but I think the
stoplist example illustrates the potential of declarative analyzer
specification. Maybe specifying Version in a sample file and dispatching to
different revs of a Snowball stemmer is less painful than forcing a user to
figure out Version from API documentation?

Having to extract an Analyzer from an index directory does present the
potential for Analyzer mismatches in a multi-node setup where e.g. the machine
that parses the query string and the machine which executes matching are not
the same.

Marvin Humphrey
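
To make the stoplist scenario concrete, here is a rough sketch of a config
file that travels with the index. It is plain Python rather than Lucene code,
and everything in it is an assumption made up for illustration: the file name
analyzer.json, the "syntax" field, the STEMMERS registry and its toy
"strip-s-r1" entry, and the helpers write_config and load_analyzer. None of
these are Lucene APIs.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical registry: the config names a stemmer revision and the library
# dispatches to it. "strip-s-r1" is a toy stand-in, not a Snowball stemmer.
STEMMERS = {
    "none": lambda token: token,
    "strip-s-r1": lambda token: token[:-1] if token.endswith("s") else token,
}


def write_config(index_dir, stopwords, stemmer="none"):
    """Save the analyzer spec, stoplist included, next to the index files."""
    config = {"syntax": 1, "stopwords": sorted(stopwords), "stemmer": stemmer}
    path = Path(index_dir) / "analyzer.json"
    path.write_text(json.dumps(config, ensure_ascii=False), encoding="utf-8")
    return path


def load_analyzer(index_dir):
    """Rebuild the analyzer from the saved spec, not from library defaults."""
    path = Path(index_dir) / "analyzer.json"
    config = json.loads(path.read_text(encoding="utf-8"))
    stopwords = frozenset(config["stopwords"])
    stem = STEMMERS[config["stemmer"]]

    def analyze(text):
        # Lowercase, split on whitespace, drop stop words, then stem.
        return [stem(tok) for tok in text.lower().split() if tok not in stopwords]

    return analyze


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as index_dir:
        # The index is created with whatever sample stoplist shipped that day.
        write_config(index_dir, ["и", "в"])
        # A later release could ship a different sample list; this index
        # still analyzes text with the list saved in its own config.
        analyze = load_analyzer(index_dir)
        print(analyze("Кот и собака в доме"))
```

Because the stop words are read back from the index directory, changing the
sample list in the library would only affect indexes created afterwards. The
"stemmer" key hints at how a revision named in the file could select between
stemmer versions, though real stemmer versioning is harder than this suggests.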