Return-Path: Delivered-To: apmail-lucene-solr-dev-archive@minotaur.apache.org Received: (qmail 86088 invoked from network); 19 Jan 2010 01:29:18 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 19 Jan 2010 01:29:18 -0000 Received: (qmail 76341 invoked by uid 500); 19 Jan 2010 01:29:18 -0000 Delivered-To: apmail-lucene-solr-dev-archive@lucene.apache.org Received: (qmail 76304 invoked by uid 500); 19 Jan 2010 01:29:18 -0000 Mailing-List: contact solr-dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: solr-dev@lucene.apache.org Delivered-To: mailing list solr-dev@lucene.apache.org Received: (qmail 76188 invoked by uid 99); 19 Jan 2010 01:29:18 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 19 Jan 2010 01:29:18 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 19 Jan 2010 01:29:15 +0000 Received: from brutus.apache.org (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id 639DB234C045 for ; Mon, 18 Jan 2010 17:28:54 -0800 (PST) Message-ID: <1775231881.324801263864534403.JavaMail.jira@brutus.apache.org> Date: Tue, 19 Jan 2010 01:28:54 +0000 (UTC) From: "Mark Miller (JIRA)" To: solr-dev@lucene.apache.org Subject: [jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory In-Reply-To: <349452053.1261337658468.JavaMail.jira@brutus> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802020#action_12802020 ] Mark Miller commented on SOLR-1677: ----------------------------------- In my opinion this should be real simple. Having to specify a Lucene version for each component is not simple - its beyond most users. I think its beyond me (laugh as you see fit). Having to accept Lucene 2.4 behavior by default because of Solr back compat issues is also "weak". A new user should get all the bug fixes of the latest Lucene with minimal effort. Hopefully no effort. Older users should be able to get the newest with minimal effort as well - not having to go one by one through each component and upgrading it. I can't imagine juggling all these versions for each component - thats ugly enough in Lucene - it shouldn't infect Solr for the average case. Personally, I do think there should be a global default. And I think right next to it, it should say, if you change this, you must reindex. No worries about action at a distance. The action is to get the latest and greatest Lucene has to offer rather than older buggy or back compat behavior. Reindex, get latest greatest. Don't reindex and your on your own. Solr might rip your head off. We should also offer per component for real experts, but I wouldn't be meddling that way myself unless in a bind. Solr should be real simple about this - and the latest Solr should use the latest bug fixes from Lucene, with previous configs out there defaulting to 2.4 compatibility. I abbreviated the heck out of my arguments and thinking, but damn it thats what I think :) > Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory > ------------------------------------------------------------------------------------------- > > Key: SOLR-1677 > URL: https://issues.apache.org/jira/browse/SOLR-1677 > Project: Solr > Issue Type: Sub-task > Components: Schema and Analysis > Reporter: Uwe Schindler > Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch > > > Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards compatibility with old indexes created using older versions of Lucene. The most important example is StandardTokenizer, which changed its behaviour with posIncr and incorrect host token types in 2.4 and also in 2.9. > In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with much more Unicode support, almost every Tokenizer/TokenFilter needs this Version parameter. In 2.9, the deprecated old ctors without Version take LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer. > This patch adds basic support for the Lucene Version property to the base factories. Subclasses then can use the luceneMatchVersion decoded enum (in 3.0) / Parameter (in 2.9) for constructing Tokenstreams. The code currently contains a helper map to decode the version strings, but in 3.0 is can be replaced by Version.valueOf(String), as the Version is a subclass of Java5 enums. The default value is Version.LUCENE_24 (as this is the default for the no-version ctors in Lucene). > This patch also removes unneeded conversions to CharArraySet from StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed to match Lucene 3.0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.