lucene-dev mailing list archives

From "Erick Erickson (JIRA)" <>
Subject [jira] [Commented] (SOLR-4083) Deprecate specifying individual <core> information in solr.xml. Possibly deprecate solr.xml entirely
Date Wed, 28 Nov 2012 01:14:58 GMT


Erick Erickson commented on SOLR-4083:

Looking for some advice here. The problem is that parsing the solr.xml file in
is so intimately tied in to the fact that it's xml through the use of the Config class. It
starts out simply in initialize, but then tendrils of XML-ese (XPath) are scattered all over
the file.

So, I've thought of a few options.
1> I could be really slimy and, in initialize, instead of using the canned solr.xml (DEF_SOLR_XML),
I could _construct_ a new string that looked just like a really big solr.xml file from all
the cores that are discovered and pass _that_ in just as things are done now. But I think
I'd have to wash my hands afterwards. That doesn't get us any forwarder in terms of obsoleting
solr.xml. It'd be fast though with minimal code disruption. But it'd build an XML file just
to parse it into the DOM. Wasteful.

2> Abstract all of the current XPath/XML-specific stuff in CoreContainer into a thunking
layer that understood both ways of looking at the world. If it was initialized from a current
solr.xml, it'd just pass all the stuff right through to the current Config. If it was populated
by discovery, resolve the request "natively". So, for instance, a call like
cfg.getInt("solr/cores/@swappableCacheSize", Integer.MAX_VALUE)
in CoreContainer would be replaced by 
newcfg.getCoresInt("swappableCacheSize", Integer.MAX_VALUE)

Under the covers, this would resolve to something like
if (initialized from discovery) return newcfg.coresprops.getInt("swappableCacheSize", Integer.MAX_VALUE);
else return oldcfg.getInt("solr/cores/@swappableCacheSize", Integer.MAX_VALUE);

There'd be a newcfg.solr.get### for things defined in the <solr ....> tag, etc.
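
A minimal sketch of what the thunking layer in <2> might look like, under stated assumptions. The names CoresConfigFacade, XPathIntSource, fromDiscovery and fromSolrXml are hypothetical and are not part of Solr; XPathIntSource only stands in for the getInt(xpath, default) call on the current Config that is quoted above.

```java
import java.util.Properties;

// Hypothetical stand-in for the XPath-style getInt lookup on the existing Config.
interface XPathIntSource {
    int getInt(String xpath, int def);
}

// Hypothetical facade: callers ask for a <cores> attribute by name and never see XPath.
final class CoresConfigFacade {
    private final Properties coresProps; // non-null only when populated by discovery
    private final XPathIntSource oldCfg; // non-null only when initialized from solr.xml

    private CoresConfigFacade(Properties coresProps, XPathIntSource oldCfg) {
        this.coresProps = coresProps;
        this.oldCfg = oldCfg;
    }

    static CoresConfigFacade fromDiscovery(Properties coresProps) {
        return new CoresConfigFacade(coresProps, null);
    }

    static CoresConfigFacade fromSolrXml(XPathIntSource oldCfg) {
        return new CoresConfigFacade(null, oldCfg);
    }

    int getCoresInt(String name, int def) {
        if (coresProps != null) {
            // Discovery path: resolve the value "natively" from plain properties.
            String raw = coresProps.getProperty(name);
            return raw == null ? def : Integer.parseInt(raw.trim());
        }
        // Legacy path: pass straight through to the XPath-based lookup.
        return oldCfg.getInt("solr/cores/@" + name, def);
    }
}
```

With something like this, only the facade knows which of the two worlds it was initialized from; the call sites in CoreContainer would use getCoresInt either way.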

3> Take my current notion of a pluggable CoreDescriptorProvider and just make it not pluggable.
Populate it up front with the discovery process and go from there. This seems more consistent
with the ZK CoreDescriptorProvider that's already there.
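
As a rough illustration of <3>, here is a sketch of a non-pluggable holder that is filled once, up front, from whatever discovery found. DiscoveredCoreDescriptors and its methods are hypothetical names, and using java.util.Properties per core is an assumption; this is not the CoreDescriptorProvider interface from SOLR-1306.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

// Hypothetical, non-pluggable provider: populated once from discovery results.
final class DiscoveredCoreDescriptors {
    private final Map<String, Properties> byName;

    DiscoveredCoreDescriptors(Map<String, Properties> discovered) {
        // Copy the map so later changes to the caller's map are not visible here.
        this.byName = Collections.unmodifiableMap(new LinkedHashMap<>(discovered));
    }

    // Returns the properties for the named core, or null if it was not discovered.
    Properties descriptorFor(String coreName) {
        return byName.get(coreName);
    }

    // Names of all discovered cores, in the order discovery reported them.
    Set<String> coreNames() {
        return byName.keySet();
    }
}
```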

4> ?????

I _think_ the whole question of whether config files live in a central directory and the rest
of this discussion is orthogonal to this issue. There are really two questions here, I guess.
a> Is the trouble/complexity of a thunking layer worth the effort? It'll go a ways towards
separating out the requirement of XML parsing for solr.xml at a cost of that complexity. Not
sure it's a good call.
b> Is there another approach that I'm overlooking?

Of the three, I'm torn between <2> and <3>; I can argue either way. <1>
seems really hacky.
> Deprecate specifying individual <core> information in solr.xml. Possibly deprecate
solr.xml entirely
> ----------------------------------------------------------------------------------------------------
>                 Key: SOLR-4083
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>    Affects Versions: 4.1, 5.0
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
> Spinoff from SOLR-1306. Having a solr.xml file is limiting and possibly unnecessary.
We'd gain flexibility by having an "auto-discovery", essentially walking the directories and
finding all the cores and just loading them.
> Here's an issue to start the discussion of what that would look like. At this point the
way I'm thinking about it depends on SOLR-1306, which depends on SOLR-1028, so the chain is
getting kind of long.
> Straw-man proposal:
> 1> system properties can be specified as root paths in the solr tree to start discovery.
> 2> the directory walking process will stop going deep (but not wide) in the directories
whenever a file is encountered. That file can contain any of the properties
currently specifiable in a <core> tag. This allows, for instance, re-use of a single
solrconfig.xml or schema.xml file across multiple cores. I really don't want to get into having
cores-within-cores. While this latter is possible, I don't see any advantage. You _can_ have
multiple roots and there's _no_ requirement that the cores be in the directory immediately
below that root; they can be arbitrarily deep.
> 3> I'm not quite sure what to do with the various properties in the <cores>
tag. Perhaps just require these to be system properties?
> 4> Notice the title. Does it still make sense to specify <3> in solr.xml but
ignore the cores stuff? It seems like so little information will be in solr.xml if we take
all the <core> tags out that we should just kill it altogether.
> 5> Not quite sure what this means for _where_ the cores live. Is it arbitrary? Anywhere
on disk? Why not?
> 6> core swapping/renaming/whatever. Really, this is about how we model persist="true"
on solr.xml. It's easy if we keep solr.xml and just remove the individual core entries. Where
to put them?
> 7> _if_ we're supposed to persist core admin operations, it seems like we just persist
this stuff to the individual files. Things like whether it's loaded, whether
its name has changed (1028 allows lazy loading).
> 8> This still provides the capability of your own custom CoreDescriptorProvider, which
you'll have to specify somehow. I'm not quite sure where yet.
> solr.xml is really the bootstrap for the whole shootin' match. Removing it entirely means
we have to specify root directories, zk parameters, whatever somehow. What do people think
is the best option here? Leave a degenerate solr.xml? Require system properties be set for
any of these options? Currently, the options we'll need are anything (actual or proposed)
in the <solr> and <cores> tags.
> So, what the first cut at this would be, building on 1306, is a default CoreDescriptorProvider
that ignored all the <core> entries in solr.xml, walked the tree and loaded all the
cores found. I claim this is a quick thing to PoC assuming SOLR-1306 and I'll try to provide
a patch demonstrating it over the weekend.
> But mostly, this is a place to start the discussion about what this would look like rather
than have it get lost in SOLR-1306.
> Finally, note that I have no intention of putting any of this into 4.x at least until
we cut the 4.1/4.0.1 whatever.
> And, of course, until we fully deprecate solr.xml (5.0?) the current behavior will be
the default.
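
The directory walk in straw-man item 2> (stop going deep, but not wide, once a core's file is found) could be sketched roughly as follows. This assumes Java 7's java.nio.file API. CoreDiscoverySketch and discoverCoreDirs are hypothetical names, and markerFileName is a placeholder parameter, since the description does not fix the name of the per-core file here.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

final class CoreDiscoverySketch {
    // markerFileName is a placeholder for whatever per-core file ends up being used.
    static List<Path> discoverCoreDirs(Path root, final String markerFileName)
            throws IOException {
        final List<Path> coreDirs = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                if (Files.isRegularFile(dir.resolve(markerFileName))) {
                    coreDirs.add(dir);
                    // Do not descend further: no cores-within-cores.
                    // Sibling directories are still visited.
                    return FileVisitResult.SKIP_SUBTREE;
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return coreDirs;
    }
}
```

Calling discoverCoreDirs once per configured root would cover the multiple-roots case; a marker file nested below an already discovered core directory is never reached.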
