Message-ID: <1834508427.252461263516294893.JavaMail.jira@brutus.apache.org>
Date: Fri, 15 Jan 2010 00:44:54 +0000 (UTC)
From: "Hoss Man (JIRA)"
To: solr-dev@lucene.apache.org
Reply-To: solr-dev@lucene.apache.org
Subject: [jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
In-Reply-To: <349452053.1261337658468.JavaMail.jira@brutus>

[ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800471#action_12800471 ]

Hoss Man commented on SOLR-1677:
--------------------------------

bq. And I also can't see anyone really spending time to aggressively ensure that the example schema etc is all up to date

I think you are vastly underestimating how much work is spent reviewing the example schema.xml prior to releases. It would be trivial to search/replace luceneMatchVersion="X" with luceneMatchVersion="Y" anytime the "current" version of Version was updated in Lucene-Java.

bq. the hardcoded 2.4 behavior is the action at a distance, because if i do not specify Version in my configuration file, then i get this very old behavior.

I don't follow you at all -- you have identified no action, or distance, in your example.

When i say i'm worried about scary action at a distance, i'm talking about editing some thing A in a config file, and having it result in changed behavior (action) in things B, C and D that do not directly refer to A in any way (distance). Furthermore, these changes in behavior are silent (thus scary).

If I have {{}} and much later in the config {{}} then editing A results in an action on B at a distance -- but this should not surprise me at all because B explicitly references A. Having a global {{}} tag that affects the behavior of a variety of different things when it's modified leads to situations where people might change that value, triggering changes in many components w/o a clear idea of what might have changed -- so they don't even know what things they should focus on testing for correctness after making that change.
The existing {{}} property also leads to action at a distance type situations -- but that is a lot less scary to me because at least with it there is a uniform set of changes to *all* schema objects between any two versions, so it's easy to document what changes when you go from 1.1 to 1.2, or 1.2 to 1.3 ... but with luceneMatchVersion the potential changes are unique to every individual Class that cares about it.

{quote}
If this is really your concern, then i have an alternative i propose.
* No default anywhere, not even in the code
* Version is mandatory if the thing requires it
{quote}

This is something Uwe and i both discussed in previous comments...

https://issues.apache.org/jira/browse/SOLR-1677?focusedCommentId=12796872#action_12796872
https://issues.apache.org/jira/browse/SOLR-1677?focusedCommentId=12796937#action_12796937

...as i said: i'm fine with this idea in theory -- as a long term plan -- but there has to be a gradual migration process for people. ie: it can be required on certain objects in a future release, but for at least the next release it needs to be possible to not specify the luceneMatchVersion on all of these objects, and when people use them w/o specifying, they can log big fat warnings on init that it is defaulting to 2.4, and that they should set the property explicitly if that's what they want.

----

bq. I still do not want it in schema.xml, as Version is a global Lucene thing!

Uwe: I think you are misunderstanding the reason for a distinction between solrconfig.xml and schema.xml in Solr. If (for the sake of argument) luceneMatchVersion really should be a "global Lucene thing" then that is precisely why it should be in schema.xml.

schema.xml is for configuration that is inherently part of the index, and must be consistent regardless of who/how/why that index is being used. solrconfig.xml is where settings are put that are specific to how a particular instance of an index is being used.
If a setting is in solrconfig.xml, then it should be possible for that setting to be completely different on different solr instances that use the exact same schema.xml -- even if they use cloned copies of the same index directory. (ie: master/slave distinctions in replication; peer slaves with distinct handler/cache settings to serve distinct use cases; etc...)

That's the reason why nothing that hangs off of IndexSchema is currently allowed to be SolrCoreAware, or get access to the SolrConfig object (and why the SolrResourceLoader abstraction was created) ... nothing about the SolrCore "instance" should be allowed to influence the resulting index, because that index may later be used on a different instance with a different config.

As i mentioned before: solrconfig.xml can depend on schema.xml, but schema.xml can not depend on solrconfig.xml.

So if a global luceneMatchVersion can affect the behavior of an analyzer or FieldType in a way that is "persisted" as part of the index -- and other classes (like QueryParser in Robert's example) need to make sure to use the same luceneMatchVersion to behave correctly with that index, then that setting needs to be in the schema.xml so it is consistent no matter how/where that index and schema.xml file are used.

Does that make sense?

----

I'd still like to clarify this whole issue of whether "Lucene-Java", as a project, has an expectation that client applications will always use a consistent value for Version when constructing objects that interact with an index, as Robert alluded to in a previous comment...

bq. I don't think Version is intended so you can use X.Y on this part and Y.Z on this part

This was not my impression when Version was added -- but i freely admit I wasn't paying that much attention. In Uwe's comment he implied (but didn't actually state) that he concurred with Robert...

bq. ...Version is a global Lucene thing...
*Iff* that expectation really is true in Lucene-Java, and *iff* there really is an expectation that using multiple Version values within Solr is likely to cause people problems as objects interact, then it seems to me that it would be a very bad idea to offer any sort of out of the box support for per object overriding of luceneMatchVersion in our solrconfig.xml/schema.xml.

i know, i know ... this is a complete 180 from my previous claim that we should _only_ have per object configuration -- a claim that i still stand behind if Lucene-Java "supports" applications using multiple values of Version, but if that is not considered "supported" and if changes are actively being made in Lucene-Java that explicitly assume consistent Version usage, then I'm not convinced it would be a good idea to enable people to tweak things in that way. Anyone who understands the underlying Java code enough to appreciate the nuances of using A.B in one place and X.Y in another place can write their own Factory that looks at a luceneMatchVersion init param -- the out of the box ones should stick with the global setting.

BUT!!!!! ... those are Big "IFFs" ...

* Uwe: do you concur with Robert?
* Are there any threads/docs about the expectations of Version homo/hetero-geneousness in Lucene-Java?


> Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
> -------------------------------------------------------------------------------------------
>
> Key: SOLR-1677
> URL: https://issues.apache.org/jira/browse/SOLR-1677
> Project: Solr
> Issue Type: Sub-task
> Components: Schema and Analysis
> Reporter: Uwe Schindler
> Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch
>
>
> Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards compatibility with old indexes created using older versions of Lucene.
> The most important example is StandardTokenizer, which changed its behaviour with posIncr and incorrect host token types in 2.4 and also in 2.9.
> In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with much more Unicode support, almost every Tokenizer/TokenFilter needs this Version parameter. In 2.9, the deprecated old ctors without Version take LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
> This patch adds basic support for the Lucene Version property to the base factories. Subclasses then can use the luceneMatchVersion decoded enum (in 3.0) / Parameter (in 2.9) for constructing Tokenstreams. The code currently contains a helper map to decode the version strings, but in 3.0 it can be replaced by Version.valueOf(String), as the Version is a subclass of Java5 enums. The default value is Version.LUCENE_24 (as this is the default for the no-version ctors in Lucene).
> This patch also removes unneeded conversions to CharArraySet from StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed to match Lucene 3.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
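
For illustration only: the issue description mentions a helper map for decoding version strings, and the comment above suggests logging a loud warning when a factory falls back to 2.4 because no luceneMatchVersion was given. A rough sketch of how those two ideas could combine follows. It is not the code from SOLR-1677.patch. The `MatchVersion` enum is a hypothetical stand-in for `org.apache.lucene.util.Version` so the example runs without Lucene, and the class name, the `decode` method, and the accepted string forms ("2.4", "2.9", "3.0") are all assumptions.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.logging.Logger;

public class MatchVersionSketch {

    /** Hypothetical stand-in for org.apache.lucene.util.Version (assumption, not the Lucene class). */
    public enum MatchVersion { LUCENE_24, LUCENE_29, LUCENE_30 }

    private static final Logger LOG = Logger.getLogger(MatchVersionSketch.class.getName());

    /** Helper map from config strings to constants; the "X.Y" string forms are assumed. */
    private static final Map<String, MatchVersion> BY_STRING = new HashMap<>();
    static {
        BY_STRING.put("2.4", MatchVersion.LUCENE_24);
        BY_STRING.put("2.9", MatchVersion.LUCENE_29);
        BY_STRING.put("3.0", MatchVersion.LUCENE_30);
    }

    /**
     * Decodes a luceneMatchVersion value for the named factory.
     * A missing value falls back to LUCENE_24 and logs a warning,
     * so the fallback is never silent. An unknown value is an error.
     */
    public static MatchVersion decode(String configured, String ownerName) {
        if (configured == null || configured.trim().isEmpty()) {
            LOG.warning(ownerName + ": no luceneMatchVersion specified, defaulting to 2.4;"
                    + " set it explicitly if that is what you want");
            return MatchVersion.LUCENE_24;
        }
        String key = configured.trim();
        MatchVersion mapped = BY_STRING.get(key);
        if (mapped != null) {
            return mapped;
        }
        try {
            // Also accept the enum constant name itself, e.g. "LUCENE_29".
            return MatchVersion.valueOf(key.toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                    "Unknown luceneMatchVersion '" + configured + "' for " + ownerName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("2.9", "ExampleTokenizerFactory"));
        System.out.println(decode(null, "ExampleTokenizerFactory"));
    }
}
```

Throwing on an unrecognized value, rather than defaulting, is a design choice of this sketch: a typo in the config then fails loudly instead of quietly selecting the 2.4 behaviour.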