lucene-dev mailing list archives

From "Steve Rowe (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SOLR-4658) In preparation for dynamic schema modification via REST API, add a "managed" schema facility
Date Mon, 01 Apr 2013 03:21:16 GMT
Steve Rowe created SOLR-4658:
--------------------------------

             Summary: In preparation for dynamic schema modification via REST API, add a "managed" schema facility
                 Key: SOLR-4658
                 URL: https://issues.apache.org/jira/browse/SOLR-4658
             Project: Solr
          Issue Type: Sub-task
          Components: Schema and Analysis
            Reporter: Steve Rowe
            Assignee: Steve Rowe
            Priority: Minor
             Fix For: 4.3


The idea is to have a set of configuration items in {{solrconfig.xml}}:

{code:xml}
<schema managed="true" mutable="true" managedSchemaResourceName="managed-schema"/>
{code} 

Future dynamic schema modification APIs will require {{mutable="true"}} as a precondition.
{{solrconfig.xml}} parsing will fail if {{mutable="true"}} but {{managed="false"}}.
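
As a rough illustration of the validation rule above (not Solr's actual implementation), the check could look like the following Python sketch. The helper name {{parse_schema_settings}} is hypothetical, the {{mutable}} default of {{"false"}} is an assumption, and no default is assumed for {{managedSchemaResourceName}} because this issue does not state one.

```python
import xml.etree.ElementTree as ET


def parse_schema_settings(solrconfig_xml):
    """Read the <schema/> element from a solrconfig.xml string (hypothetical helper)."""
    root = ET.fromstring(solrconfig_xml)
    elem = root.find("schema")  # direct child of the root element, if present
    attrs = elem.attrib if elem is not None else {}

    # managed="false" is the default per this issue; the mutable default is an assumption.
    managed = attrs.get("managed", "false") == "true"
    mutable = attrs.get("mutable", "false") == "true"

    # The rule from the description: mutable requires managed.
    if mutable and not managed:
        raise ValueError('mutable="true" requires managed="true"')

    return {
        "managed": managed,
        "mutable": mutable,
        # No default resource name is stated in this issue, so none is assumed here.
        "managedSchemaResourceName": attrs.get("managedSchemaResourceName"),
    }
```

Raising at parse time mirrors the "parsing will fail" behavior described above: the inconsistent combination is rejected up front rather than silently corrected.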

When {{managed="true"}}, and the resource named in {{managedSchemaResourceName}} doesn't exist,
Solr will automatically upgrade the schema to "managed": the non-managed schema resource (typically
{{schema.xml}}) is parsed and then persisted at {{managedSchemaResourceName}} under {{$solrHome/$collectionOrCore/conf/}},
or on ZooKeeper at {{/configs/$configName/}}, and the non-managed schema resource is renamed
by appending {{.bak}}, e.g. {{schema.xml.bak}}.
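
The upgrade steps above can be sketched for the local-filesystem case as follows. This is a simplified illustration under stated assumptions, not Solr's code: the function name {{upgrade_to_managed}} is hypothetical, the ZooKeeper case is not covered, and the sketch copies the schema text verbatim where Solr would parse the schema and then persist it.

```python
from pathlib import Path


def upgrade_to_managed(conf_dir, managed_name="managed-schema", unmanaged_name="schema.xml"):
    """Upgrade a conf directory to a managed schema; return True if an upgrade happened.

    Hypothetical sketch: names and defaults are illustrative only.
    """
    conf = Path(conf_dir)
    managed = conf / managed_name
    unmanaged = conf / unmanaged_name

    # The upgrade only happens when the managed resource does not exist yet.
    if managed.exists():
        return False

    # Solr would parse the schema and persist the parsed form; this sketch
    # simply copies the text to keep the example short.
    managed.write_text(unmanaged.read_text(encoding="utf-8"), encoding="utf-8")

    # Rename the non-managed resource by appending ".bak", e.g. schema.xml.bak.
    # Note: an existing .bak file is handled differently across platforms.
    unmanaged.rename(unmanaged.with_name(unmanaged.name + ".bak"))
    return True
```

Calling the function a second time on the same directory is a no-op, which matches the condition that the upgrade is triggered only when the managed resource is missing.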

Once the upgrade has taken place, users can get the full schema from the {{/schema?wt=schema.xml}}
REST API and use it as the basis for modifications. The modified file can then be used to manually
downgrade back to a non-managed schema: put the {{schema.xml}} in place, then add
{{<schema managed="false"/>}} to {{solrconfig.xml}} (or remove the whole {{<schema/>}} element,
since {{managed="false"}} is the default).

If users take no action, then Solr behaves the same as always: the example {{solrconfig.xml}}
will include {{<schema managed="false" ...>}}.

For a discussion of the rationale for this feature, see [~hossman_lucene@fucit.org]'s post to
the solr-user mailing list in the thread "Dynamic schema design: feedback requested" [http://markmail.org/message/76zj24dru2gkop7b]:
 
{quote}
Ignoring for a moment what format is used to persist schema information, I 
think it's important to have a conceptual distinction between "data" that 
is managed by applications and manipulated by a REST API, and "config" 
that is managed by the user and loaded by solr on init -- or via an 
explicit "reload config" REST API.

Past experience with how users perceive(d) solr.xml has heavily reinforced 
this opinion: on one hand, it's a place users must specify some config 
information -- so people want to be able to keep it in version control 
with other config files.  On the other hand it's a "live" data file that 
is rewritten by solr when cores are added.  (God help you if you want to do a 
rolling deploy of a new version of solr.xml where you've edited some of the 
config values while simultaneously clients are creating new SolrCores)

As we move forward towards having REST APIs that treat schema information 
as "data" that can be manipulated, I anticipate the same types of 
confusion, misunderstanding, and grumblings if we try to use the same 
pattern of treating the existing schema.xml (or some new schema.json) as a 
hybrid configs & data file.  "Edit it by hand if you want, the /schema/* 
REST API will too!"  ... Even assuming we don't make any of the same 
technical mistakes that have caused problems with solr.xml round tripping 
in the past (ie: losing comments, reading new config options that we 
forget to write back out, etc...) i'm fairly certain there are still going 
to be a lot of things that will look weird and confusing to people.

(XML may have been designed to be both "human readable & writable" and 
"machine readable & writable", but practically speaking it's hard to have a 
single XML file be "machine and human readable & writable")

I think it would make a lot of sense -- not just in terms of 
implementation but also for end user clarity -- to have some simple, 
straightforward to understand caveats about maintaining schema 
information...

1) If you want to keep schema information in an authoritative config file 
that you can manually edit, then the /schema REST API will be read only. 

2) If you wish to use the /schema REST API for read and write operations, 
then schema information will be persisted under the covers in a data store 
whose format is an implementation detail just like the index file format.

3) If you are using a schema config file and you wish to switch to using 
the /schema REST API for managing schema information, there is a 
tool/command/API you can run to do so.

4) If you are using the /schema REST API for managing schema information, 
and you wish to switch to using a schema config file, there is a 
tool/command/API you can run to export the schema info in a config file 
format.


...whether or not the "under the covers in a data store" used by the REST 
API is JSON, or some binary data, or an XML file that's just schema.xml w/o 
whitespace/comments should be an implementation detail.  Likewise is the 
question of whether some new config file formats are added -- it shouldn't 
matter.

If it's config it's config and the user owns it.
If it's data it's data and the system owns it.

: is the risk they take if they want to manually edit it - it's no 
: different than today when you edit the file and do a Core reload or 
: something. I think we can improve some validation stuff around that, but 
: it doesn't seem like a show stopper to me.

The new risk is multiple "actors" (both the user, and Solr) editing the 
file concurrently, and info that might be lost due to Solr reading the 
file, manipulating internal state, and then writing the file back out.  

Eg: User hand edits may be lost if they happen on disk during Solr's 
internal manipulation of data.  API edits may be reflected in the internal 
state, but lost if the User writes the file directly and then does a core 
reload, etc....

: At a minimum, I think the user should be able to start with a hand 
: modified file. Many people *heavily* modify the example schema to fit 
: their use case. If you have to start doing that by making 50 rest API 
: calls, that's pretty rough. Once you get your schema nice and happy, you 
: might script out those rest calls, but initially, it's much 
: faster/easier to whack the schema into place in a text editor IMO.

I don't think there is any disagreement about that.  The ability to say 
"my schema is a config file and i own it" should always exist (remove 
it over my dead body) 

The question is what trade offs to expect/require for people who would 
rather use an API to manipulate these things -- i don't think it's 
unreasonable to say "if you would like to manipulate the schema using an 
API, then you give up the ability to manipulate it as a config file on 
disk"

("if you want the /schema API to drive your car, you have to take your 
foot off the pedals and let go of the steering wheel")
{quote}


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

