lucene-dev mailing list archives

From "Erick Erickson (JIRA)" <>
Subject [jira] [Commented] (SOLR-1306) Support pluggable persistence/loading of solr.xml details
Date Tue, 13 Nov 2012 18:02:12 GMT


Erick Erickson commented on SOLR-1306:

Well, the use case here is explicitly that the core information is kept in a repository that is
completely external to Solr (and to ZK too, for that matter). Managing 100K cores by moving directories around
is non-trivial, especially since there will probably be some system-of-record for where all
the information lives anyway.

As it stands, this patch doesn't really affect the way Solr works OOB. It only comes into
play if the people implementing the provider _require_ it (and want to implement the complexity).

But let me think about this a bit. Are you suggesting that the whole notion of solr.xml be
replaced by some kind of crawl/discovery process? Off the top of my head, I can imagine a
degenerate solr.xml that just lists one or more directories. Then the load process consists
of crawling those directories looking for cores and loading them, possibly with some kind
of configuration files at the core level. For the 10s of K cores/machine case we don't want
to put the data in solrconfig.xml or anything like that, I'm thinking of something very much
simpler, on the order of a file. I've skipped thinking about how to "find
a core" or how that plays with using common schemas to see if this is along the lines you're
thinking of "getting meta-data closer to the index".
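
To make the crawl/discovery idea concrete, here is a rough sketch, not anything from the patch. It assumes a core is any directory that directly contains some per-core marker file. The class name CoreDiscovery, the method discoverCores, and the marker file name are all hypothetical; the name is passed in by the caller because nothing above settles what that file would be called.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper; not part of Solr or of the SOLR-1306 patch. */
final class CoreDiscovery {

    /**
     * Walks each root directory and returns every directory that directly
     * contains a regular file named markerFileName (an assumed convention).
     */
    static List<Path> discoverCores(List<Path> roots, String markerFileName) throws IOException {
        List<Path> cores = new ArrayList<>();
        for (Path root : roots) {
            collect(root, markerFileName, cores);
        }
        // Sort so the result does not depend on filesystem listing order.
        Collections.sort(cores);
        return cores;
    }

    private static void collect(Path dir, String markerFileName, List<Path> out) throws IOException {
        if (!Files.isDirectory(dir)) {
            return;
        }
        if (Files.isRegularFile(dir.resolve(markerFileName))) {
            // This directory is a core; do not descend into its index/data dirs.
            out.add(dir);
            return;
        }
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                collect(child, markerFileName, out);
            }
        }
    }
}
```

Stopping the descent at the first marker keeps the walk out of the index directories, which matters when there are 10s of K cores on one machine.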

It does make the whole coordination issue a lot easier, though. You no longer have the loose
coupling between having core information in solr.xml and then having to be sure the files/dirs
corresponding to what's in solr.xml "just happen" to map to what's actually on disk.... Moving
something from one place to another would consist of
1> shutting down the servers
2> moving the core directory from one server to another
3> starting up the servers again.

I can imagine doing this a bit differently...
1> copy the core from one server to another
2> issue an unload for the core on the source server
3> issue a create for the core on the dest server
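
For steps 2> and 3> of the copy/unload/create variant, the requests could be built along these lines. This is only a sketch: it assumes the CoreAdmin handler is reachable at the default /admin/cores path under the Solr base URL, and the class CoreMoveUrls, the host names, and the directory used in the usage note are made up for illustration. The copy in step 1> is left to whatever file-transfer tooling is already in place.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper that only builds CoreAdmin request URLs; it sends nothing. */
final class CoreMoveUrls {

    /** Step 2>: unload the core on the source server. */
    static String unloadUrl(String solrBaseUrl, String coreName) {
        return solrBaseUrl + "/admin/cores?action=UNLOAD&core=" + enc(coreName);
    }

    /** Step 3>: create the core on the destination server from the copied directory. */
    static String createUrl(String solrBaseUrl, String coreName, String instanceDir) {
        return solrBaseUrl + "/admin/cores?action=CREATE&name=" + enc(coreName)
                + "&instanceDir=" + enc(instanceDir);
    }

    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

With made-up hosts, CoreMoveUrls.unloadUrl("http://source:8983/solr", "core42") would be issued against the source and CoreMoveUrls.createUrl("http://dest:8983/solr", "core42", "/var/solr/core42") against the destination, in that order, once the copy has finished.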

There'd probably have to be some kind of background loading, but we're already talking about
parallelizing multicore loads...

From an admin perspective, the poor soul trying to maintain this all could pretty easily
enumerate where all the cores were just by asking each server for a list of where things are.

Anyway, is this in the vicinity of "moving the metadata closer to the index"?

> Support pluggable persistence/loading of solr.xml details
> ---------------------------------------------------------
>                 Key: SOLR-1306
>                 URL:
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>            Reporter: Noble Paul
>            Assignee: Erick Erickson
>             Fix For: 4.1
>         Attachments: SOLR-1306.patch, SOLR-1306.patch, SOLR-1306.patch, SOLR-1306.patch
> Persisting and loading details from one xml is fine if the no:of cores are small and
> the no:of cores are few/fixed. If there are 10's of thousands of cores in a single box adding
> a new core (with persistent=true) becomes very expensive because every core creation has to
> write this huge xml.
> Moreover, there is a good chance that the file gets corrupted and all the cores become
> unusable. In that case I would prefer it to be stored in a centralized DB which is backed
> up/replicated and all the information is available in a centralized location.
> We may need to refactor CoreContainer to have a pluggable implementation which can load/persist
> the details. The default implementation should write/read from/to solr.xml. And the class
> should be pluggable as follows in solr.xml
> {code:xml}
> <solr>
>   <dataProvider class="" attr1="val1" attr2="val2"/>
> </solr>
> {code}
> There will be a new interface (or abstract class) called SolrDataProvider which this
> class must implement
