Return-Path:
X-Original-To: apmail-lucene-dev-archive@www.apache.org
Delivered-To: apmail-lucene-dev-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id B4D8A10AAA for ; Tue, 9 Apr 2013 20:14:17 +0000 (UTC)
Received: (qmail 3120 invoked by uid 500); 9 Apr 2013 20:14:16 -0000
Delivered-To: apmail-lucene-dev-archive@lucene.apache.org
Received: (qmail 3029 invoked by uid 500); 9 Apr 2013 20:14:16 -0000
Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@lucene.apache.org
Delivered-To: mailing list dev@lucene.apache.org
Received: (qmail 3021 invoked by uid 99); 9 Apr 2013 20:14:16 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 09 Apr 2013 20:14:16 +0000
Date: Tue, 9 Apr 2013 20:14:16 +0000 (UTC)
From: "Robert Muir (JIRA)"
To: dev@lucene.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Commented] (SOLR-4658) In preparation for dynamic schema modification via REST API, add a "managed" schema facility
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/SOLR-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13627026#comment-13627026 ]

Robert Muir commented on SOLR-4658:
-----------------------------------

I mentioned this same bug as it applies to similarities on the dev list a week or so ago!

> In preparation for dynamic schema modification via REST API, add a "managed" schema facility
> --------------------------------------------------------------------------------------------
>
>                 Key: SOLR-4658
>                 URL: https://issues.apache.org/jira/browse/SOLR-4658
>             Project: Solr
>          Issue Type: Sub-task
>          Components: Schema and Analysis
>            Reporter: Steve Rowe
>            Assignee: Steve Rowe
>            Priority: Minor
>             Fix For: 4.3, 5.0
>
>         Attachments: SOLR-4658.patch, SOLR-4658.patch
>
>
> The idea is to have a set of configuration items in {{solrconfig.xml}}:
> {code:xml}
>
> {code}
> It will be a precondition for future dynamic schema modification APIs that {{mutable="true"}}. {{solrconfig.xml}} parsing will fail if {{mutable="true"}} but {{managed="false"}}.
> When {{managed="true"}}, and the resource named in {{managedSchemaResourceName}} doesn't exist, Solr will automatically upgrade the schema to "managed": the non-managed schema resource (typically {{schema.xml}}) is parsed and then persisted at {{managedSchemaResourceName}} under {{$solrHome/$collectionOrCore/conf/}}, or on ZooKeeper at {{/configs/$configName/}}, and the non-managed schema resource is renamed by appending {{.bak}}, e.g. {{schema.xml.bak}}.
> Once the upgrade has taken place, users can get the full schema from the {{/schema?wt=schema.xml}} REST API, and can use this as the basis for modifications which can then be used to manually downgrade back to non-managed schema: put the {{schema.xml}} in place, then add {{}} to {{solrconfig.xml}} (or remove the whole {{}} element, since {{managed="false"}} is the default).
> If users take no action, then Solr behaves the same as always: the example {{solrconfig.xml}} will include {{}}.
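
The upgrade rules in the description above can be sketched roughly as follows. This is an illustration in Python, not Solr's Java code and not taken from the SOLR-4658 patch. The function name `resolve_schema`, its parameters, and the default resource name `managed-schema` are hypothetical; a plain file copy stands in for the parse-and-persist step, and the ZooKeeper case is not modelled.

```python
import os
import shutil
import tempfile


def resolve_schema(conf_dir, managed=False, mutable=False,
                   managed_name="managed-schema",   # hypothetical default
                   unmanaged_name="schema.xml"):
    """Return the path of the schema resource to load, upgrading if needed."""
    # Per the description: config parsing fails if mutable but not managed.
    if mutable and not managed:
        raise ValueError('mutable="true" requires managed="true"')

    unmanaged_path = os.path.join(conf_dir, unmanaged_name)
    if not managed:
        # managed="false" is the default: behave the same as always.
        return unmanaged_path

    managed_path = os.path.join(conf_dir, managed_name)
    if not os.path.exists(managed_path):
        # Upgrade: persist the schema under the managed name (a copy here,
        # standing in for parse-then-persist), then rename the original
        # by appending ".bak".
        shutil.copyfile(unmanaged_path, managed_path)
        os.rename(unmanaged_path, unmanaged_path + ".bak")
    return managed_path


if __name__ == "__main__":
    conf = tempfile.mkdtemp()
    with open(os.path.join(conf, "schema.xml"), "w") as f:
        f.write("<schema/>")
    print(resolve_schema(conf, managed=True))
    print(sorted(os.listdir(conf)))
```

A second call with `managed=True` finds the managed resource already in place and leaves the `.bak` file untouched, which matches the "upgrade only when the resource doesn't exist" wording.
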
> For a discussion of rationale for this feature, see [~hossman_lucene@fucit.org]'s post to the solr-user mailing list in the thread "Dynamic schema design: feedback requested" [http://markmail.org/message/76zj24dru2gkop7b]:
>
> {quote}
> Ignoring for a moment what format is used to persist schema information, I
> think it's important to have a conceptual distinction between "data" that
> is managed by applications and manipulated by a REST API, and "config"
> that is managed by the user and loaded by solr on init -- or via an
> explicit "reload config" REST API.
> Past experience with how users perceive(d) solr.xml has heavily reinforced
> this opinion: on one hand, it's a place users must specify some config
> information -- so people want to be able to keep it in version control
> with other config files. On the other hand it's a "live" data file that
> is rewritten by solr when cores are added. (God help you if you want to do a
> rolling deploy of a new version of solr.xml where you've edited some of the
> config values while simultaneously clients are creating new SolrCores)
> As we move forward towards having REST APIs that treat schema information
> as "data" that can be manipulated, I anticipate the same types of
> confusion, misunderstanding, and grumblings if we try to use the same
> pattern of treating the existing schema.xml (or some new schema.json) as a
> hybrid configs & data file. "Edit it by hand if you want, the /schema/*
> REST API will too!" ... Even assuming we don't make any of the same
> technical mistakes that have caused problems with solr.xml round tripping
> in the past (ie: losing comments, reading new config options that we
> forget to write back out, etc...) i'm fairly certain there is still going
> to be a lot of things that will look weird and confusing to people.
> (XML may have been designed to be both "human readable & writable" and
> "machine readable & writable", but practically speaking it's hard to have a
> single XML file be "machine and human readable & writable")
> I think it would make a lot of sense -- not just in terms of
> implementation but also for end user clarity -- to have some simple,
> straightforward to understand caveats about maintaining schema
> information...
> 1) If you want to keep schema information in an authoritative config file
> that you can manually edit, then the /schema REST API will be read only.
> 2) If you wish to use the /schema REST API for read and write operations,
> then schema information will be persisted under the covers in a data store
> whose format is an implementation detail just like the index file format.
> 3) If you are using a schema config file and you wish to switch to using
> the /schema REST API for managing schema information, there is a
> tool/command/API you can run to do so.
> 4) if you are using the /schema REST API for managing schema information,
> and you wish to switch to using a schema config file, there is a
> tool/command/API you can run to export the schema info in a config file
> format.
> ...whether or not the "under the covers in a data store" used by the REST
> API is JSON, or some binary data, or an XML file just schema.xml w/o
> whitespace/comments should be an implementation detail. Likewise is the
> question of whether some new config file formats are added -- it shouldn't
> matter.
> If it's config it's config and the user owns it.
> If it's data it's data and the system owns it.
> : is the risk they take if they want to manually edit it - it's no
> : different than today when you edit the file and do a Core reload or
> : something. I think we can improve some validation stuff around that, but
> : it doesn't seem like a show stopper to me.
> The new risk is multiple "actors" (both the user, and Solr) editing the
> file concurrently, and info that might be lost due to Solr reading the
> file, manipulating internal state, and then writing the file back out.
> Eg: User hand edits may be lost if they happen on disk during Solr's
> internal manipulation of data. API edits may be reflected in the internal
> state, but lost if the User writes the file directly and then does a core
> reload, etc....
> : At a minimum, I think the user should be able to start with a hand
> : modified file. Many people *heavily* modify the example schema to fit
> : their use case. If you have to start doing that by making 50 rest API
> : calls, that's pretty rough. Once you get your schema nice and happy, you
> : might script out those rest calls, but initially, it's much
> : faster/easier to whack the schema into place in a text editor IMO.
> I don't think there is any disagreement about that. The ability to say
> "my schema is a config file and i own it" should always exist (remove
> it over my dead body)
> The question is what trade offs to expect/require for people who would
> rather use an API to manipulate these things -- i don't think it's
> unreasonable to say "if you would like to manipulate the schema using an
> API, then you give up the ability to manipulate it as a config file on
> disk"
> ("if you want the /schema API to drive your car, you have to take your
> foot off the pedals and let go of the steering wheel")
> {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
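
The "multiple actors" risk raised in the quoted post is the classic lost-update problem. A toy Python sketch of it follows; this is not Solr code. A dict stands in for the schema file on disk, and the name `write_back` and the field names are made up for illustration.

```python
def write_back(disk, snapshot, new_field, field_type):
    """Overwrite `disk` with a stale snapshot plus one API-added field."""
    updated = dict(snapshot)
    updated[new_field] = field_type
    disk.clear()
    disk.update(updated)


# Stand-in for the schema file on disk: field name -> field type.
disk = {"id": "string"}

solr_view = dict(disk)                          # 1. Solr reads the file into internal state
disk["title"] = "text"                          # 2. the user hand-edits the file meanwhile
write_back(disk, solr_view, "price", "float")   # 3. Solr writes its state back out

# The API-added "price" field survives; the hand-edited "title" field is gone.
print(disk)
```

Giving the file a single owner, as the post argues, removes step 2 or step 3 from this sequence, so the stale write-back cannot happen.
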