lucene-dev mailing list archives

From "Erick Erickson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-4478) Allow cores to specify a named config set
Date Thu, 29 Aug 2013 21:37:52 GMT

    [ https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754089#comment-13754089 ]

Erick Erickson commented on SOLR-4478:
--------------------------------------

I got to thinking about this while trying to take it out of mothballs, and I'm starting to think
it's a terrible idea for 4.x. It should be postponed or abandoned unless and until we do something
like what has been discussed elsewhere: having "one source of truth" (ZooKeeper has been
discussed, for instance). So I'll list the issues I've thought of, and if there are
straightforward answers to them I'll be happy to reconsider.

Each issue is probably technically doable, but the sum of them (plus the ones I haven't seen yet)
totally scares me.

1> Traditional master/slave architectures. Let's say we change the schema (it'd have to
be on the master?). How do we get that to the slaves? Currently the confFiles directive has an
explicit test and will not copy a directory. I'm not convinced it'd even work with relative
paths, and listing _every_ file in the configset dir would be kludgy at best. And I think the
confFiles directive doesn't work outside the "conf" directory of the core it's replicating
anyway. I suppose the user could copy the configset directory to all the nodes in the farm,
but...

2> The new REST API for modifying the schema. In non-SolrCloud mode, how would that work?
Would it only be allowed on the master (assuming we can solve <1>)? How would we enforce that?

3> Sharing the solrConfig object is also fraught with issues, as discussed above. There's
already the shareSchema option, so at least it's possible to have one shared schema.

4> How do we get changes reloaded in a master/slave environment for all the affected cores
on all the machines? You'd need either a manual process of going to each machine and issuing
a new "ReloadAllCores" command, or some kind of built-in notification system. Or we'd have
to require the user to keep a list of all the nodes and all the cores and script reloading
them all. Nobody should be re-inventing ZooKeeper.

5> How do we get changes reloaded for all the affected cores even in a non-master/slave
environment? A new command? Periodic polling? A check on every query/update request?

6> Sticky wickets I haven't thought of yet. I'm afraid, very afraid... Each of these is
solvable, but considering the effort involved, it doesn't seem worth pursuing right now;
at least, my interest is disappearing.
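To make the bookkeeping in point 4 concrete, here is a minimal Python sketch of what such a user-maintained reload script might look like. It relies only on the existing CoreAdmin RELOAD action (`<base>/admin/cores?action=RELOAD&core=<name>`); the `reload_urls` helper, host names, and core names are hypothetical, and the sketch only builds the request URLs rather than sending them.

```python
from urllib.parse import urlencode


def reload_urls(inventory):
    """Build one CoreAdmin RELOAD URL per core per node.

    `inventory` maps a node's base URL (e.g. "http://host1:8983/solr")
    to the core names hosted there. Keeping this mapping current is the
    manual bookkeeping the user would be stuck with (hypothetical helper).
    """
    urls = []
    for base_url, cores in inventory.items():
        base = base_url.rstrip("/")
        for core in cores:
            query = urlencode({"action": "RELOAD", "core": core})
            urls.append(f"{base}/admin/cores?{query}")
    return urls


if __name__ == "__main__":
    # Hypothetical two-node farm whose cores all use the same configset.
    farm = {
        "http://host1:8983/solr": ["core0", "core1"],
        "http://host2:8983/solr": ["core0", "core1"],
    }
    for url in reload_urls(farm):
        print(url)  # a real script would issue an HTTP GET for each URL
```

Every node or core added to or removed from the farm means another edit to that inventory, which is the kind of state ZooKeeper already tracks for SolrCloud.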

Wrapped around all of this is the fact that SolrCloud already handles most of the things I'm
worried about, especially propagating changes to all the right places in the cluster. SolrCloud
already has a way to reload all the nodes that take part in a collection, and it already has
notifications of config set changes built in (at least I think so; if not, it will).

My feeling at this point is that supporting this well would turn into a huge amount of work
_that would then be thrown away_ if we go to a "one source of truth" model in Solr 5 (or even
6), and that actually _using_ the capability would be fragile and complex. So unless I can
be convinced otherwise, I'm going to assign this back to nobody and forget about it.

                
> Allow cores to specify a named config set
> -----------------------------------------
>
>                 Key: SOLR-4478
>                 URL: https://issues.apache.org/jira/browse/SOLR-4478
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.2, 5.0
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>         Attachments: SOLR-4478.patch, SOLR-4478.patch
>
>
> As part of moving forward to "the new way" after SOLR-4196 etc., I propose an additional
parameter, call it configSet, specified on the <core> node in solr.xml or in the "discovery"
mode core.properties file. Its value is a path to a directory, either absolute or relative.
Really, this is as though you copied the conf directory somewhere to be used by more than one core.
> Straw-man: there will be a directory <solr_home>/configsets, which will be the default.
If the configSet parameter is, say, "myconf", then I'd expect a directory named "myconf" to
exist in <solr_home>/configsets, which would look something like:
> <solr_home>/configsets/myconf/schema.xml
>                               solrconfig.xml
>                               stopwords.txt
>                               velocity
>                               velocity/query.vm
> etc.
> If multiple cores use the same configSet, the schema, solrconfig, etc. would all be shared
(i.e. shareSchema="true" would be assumed). I don't see a good use case for _not_ sharing
schemas, so I don't propose to allow this to be turned off. Hmm, what if shareSchema is explicitly
set to false in the solr.xml or properties file? I'd guess it should be honored, but maybe
with a logged warning?
> Mostly I'm putting this up for comments. I know there are already thoughts floating around
about how this should all work, so before I start any work on this I'd like to get at least an
idea of whether this is the direction people are thinking of going.
> The configSet value can be either a relative or an absolute path; if relative, it's assumed
to be relative to <solr_home>.
> Thoughts?
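The resolution rule proposed above can be sketched in a few lines of Python. This is a hypothetical helper, not Solr code: `resolve_config_set` and `group_cores_by_config_set` are illustrative names. Because the straw-man puts named sets under <solr_home>/configsets while the last paragraph says relative paths are relative to <solr_home>, the base directory is left as a parameter rather than guessed.

```python
from pathlib import Path


def resolve_config_set(solr_home, config_set, base="configsets"):
    """Resolve a core's configSet value to a directory (hypothetical helper).

    Absolute values are used as-is. Relative values are resolved against
    <solr_home>/<base>: base="configsets" follows the straw-man default,
    base="" follows the "relative to <solr_home>" wording.
    """
    path = Path(config_set)
    if path.is_absolute():
        return path
    return Path(solr_home) / base / path


def group_cores_by_config_set(solr_home, core_config_sets, base="configsets"):
    """Map each resolved config set directory to the cores that would share it."""
    groups = {}
    for core, config_set in core_config_sets.items():
        directory = resolve_config_set(solr_home, config_set, base)
        groups.setdefault(directory, []).append(core)
    return groups


if __name__ == "__main__":
    # Hypothetical cores and paths, for illustration only.
    cores = {"core1": "myconf", "core2": "myconf", "core3": "/opt/shared/other"}
    for directory, names in group_cores_by_config_set("/var/solr", cores).items():
        print(directory, names)
```

Cores that resolve to the same directory are the ones that would share schema and solrconfig under the implied shareSchema="true".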

--
This message is automatically generated by JIRA.
