Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@lucene.apache.org
Date: Thu, 15 Nov 2012 21:33:12 +0000 (UTC)
From: "Erick Erickson (JIRA)" <jira@apache.org>
To: dev@lucene.apache.org
Message-ID: <1917083146.121118.1353015192458.JavaMail.jiratomcat@arcas>
In-Reply-To: <579081846.119221.1352988733178.JavaMail.jiratomcat@arcas>
Subject: [jira] [Commented] (SOLR-4083) Deprecate specifying individual
 <core> information in solr.xml. Possibly deprecate solr.xml entirely
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/SOLR-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498356#comment-13498356 ] 

Erick Erickson commented on SOLR-4083:
--------------------------------------

bq: I'm also not sure why you would load this many cores at once - I thought the whole point was that a subset would be loaded and the rest on demand/lazily 

I'm not thinking about loading them at all, just getting the information about them (really, creating a CoreDescriptor for each, I think). I'm assuming that putting all 10K cores in the same place (i.e. 15K core directories under <solr_home>) is not acceptable, if for no other reason than keeping them organized, as well as the problem of 10K subdirs in a single dir. We'd want to allow an arbitrarily deep tree. Say a request comes in for core Z: all we have to work with at that point is the core name, so I was assuming it would be easiest to generate a map of cores->directories in order to autoload the cores (and cap the number of currently open cores). So I'm thinking of generating that map at startup. Ditto for getting the list of all available cores etc.
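
A minimal sketch of the startup map described above, for illustration only. It assumes Java NIO file walking, a marker file named solrcore.properties (as in the straw-man proposal quoted below), and an optional "name" property that falls back to the directory name. The class CoreDiscoverySketch and its method names are hypothetical and are not Solr APIs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical helper, not part of Solr: builds a core name -> directory map. */
final class CoreDiscoverySketch {
    static final String MARKER = "solrcore.properties";

    /** Walks each root; every directory containing MARKER becomes one map entry. */
    static Map<String, Path> discover(List<Path> roots) throws IOException {
        final Map<String, Path> coresByName = new TreeMap<>();
        for (Path root : roots) {
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                        throws IOException {
                    Path marker = dir.resolve(MARKER);
                    if (!Files.isRegularFile(marker)) {
                        return FileVisitResult.CONTINUE; // not a core, keep descending
                    }
                    Properties props = new Properties();
                    try (InputStream in = Files.newInputStream(marker)) {
                        props.load(in);
                    }
                    // Assumption: "name" is optional and defaults to the directory name.
                    Path leaf = dir.getFileName();
                    String fallback = (leaf == null) ? dir.toString() : leaf.toString();
                    String name = props.getProperty("name", fallback);
                    Path previous = coresByName.putIfAbsent(name, dir);
                    if (previous != null) {
                        throw new IOException("Duplicate core name '" + name + "': "
                                + previous + " and " + dir);
                    }
                    // Stop going deep here; sibling directories are still visited.
                    return FileVisitResult.SKIP_SUBTREE;
                }
            });
        }
        return coresByName;
    }

    public static void main(String[] args) throws IOException {
        List<Path> roots = new ArrayList<>();
        for (String arg : args) {
            roots.add(Paths.get(arg));
        }
        discover(roots).forEach((name, dir) -> System.out.println(name + " -> " + dir));
    }
}
```

Only the small properties files are read during the walk; no index is opened. A request for core Z would then be a single lookup in the returned map.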

FWIW, I played with writing 15K config files out, and the program to do that is pretty fast w/o any optimizations. I'll try reading them next. Of course that's on my SSD; I'll give it a whirl on my old spinning-disk laptop tonight or tomorrow. And I can imagine throwing a bunch of threads at the issue if necessary (which I'm not yet convinced it is).
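
The test program mentioned above is not shown in this message. The following is only a sketch of what such an experiment could look like, under stated assumptions: one solrcore.properties file per core directory, cores grouped into buckets of 100 so no single directory gets huge, and a hypothetical class name CoreFileTimingSketch. The bucket layout and the "name" property are illustrative choices, not anything taken from the issue.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical timing sketch: writes many small per-core properties files. */
final class CoreFileTimingSketch {

    /** Creates count core directories under root and returns the elapsed milliseconds. */
    static long writeCoreFiles(Path root, int count) throws IOException {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            // Assumed layout: root/bucketN/coreM, with 100 cores per bucket.
            Path coreDir = root.resolve("bucket" + (i / 100)).resolve("core" + i);
            Files.createDirectories(coreDir);
            Properties props = new Properties();
            props.setProperty("name", "core" + i);
            try (OutputStream out = Files.newOutputStream(coreDir.resolve("solrcore.properties"))) {
                props.store(out, null);
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("core-timing");
        long millis = writeCoreFiles(root, 15_000);
        System.out.println("wrote 15000 files in " + millis + " ms under " + root);
    }
}
```

Timings from a run like this depend heavily on the disk and the OS file cache, so a single number from one machine says little about another.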
                
> Deprecate specifying individual <core> information in solr.xml. Possibly deprecate solr.xml entirely
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-4083
>                 URL: https://issues.apache.org/jira/browse/SOLR-4083
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>    Affects Versions: 4.1, 5.0
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>
> Spinoff from SOLR-1306. Having a solr.xml file is limiting and possibly unnecessary. We'd gain flexibility by having an "auto-discovery", essentially walking the directories and finding all the cores and just loading them.
> Here's an issue to start the discussion of what that would look like. At this point the way I'm thinking about it depends on SOLR-1306, which depends on SOLR-1028, so the chain is getting kind of long.
> Straw-man proposal:
> 1> system properties can be specified as root paths in the solr tree to start discovery.
> 2> the directory walking process will stop going deep (but not wide) in the directories whenever a solrcore.properties file is encountered. That file can contain any of the properties currently specifiable in a <core> tag. This allows, for instance, re-use of a single solrconfig.xml or schema.xml file across multiple cores. I really don't want to get into having cores-within-cores. While this latter is possible, I don't see any advantage. You _can_ have multiple roots and there's _no_ requirement that the cores be in the directory immediately below that root; they can be arbitrarily deep.
> 3> I'm not quite sure what to do with the various properties in the <cores> tag. Perhaps just require these to be system properties?
> 4> Notice the title. Does it still make sense to specify <3> in solr.xml but ignore the cores stuff? It seems like so little information will be in solr.xml if we take all the <core> tags out that we should just kill it altogether.
> 5> Not quite sure what this means for _where_ the cores live. Is it arbitrary? Anywhere on disk? Why not?
> 6> core swapping/renaming/whatever. Really, this is about how we model persist="true" on solr.xml. It's easy if we keep solr.xml and just remove the individual core entries. Where to put them?
> 7> _if_ we're supposed to persist core admin operations, it seems like we just persist this stuff to the individual solrcore.properties files. Things like whether it's loaded, whether its name has changed (1028 allows lazy loading).
> 8> This still provides the capability of your own custom CoreDescriptorProvider, which you'll have to specify somehow. I'm not quite sure where yet.
> solr.xml is really the bootstrap for the whole shootin' match. Removing it entirely means we have to specify root directories, zk parameters, whatever somehow. What do people think is the best option here? Leave a degenerate solr.xml? Require system properties be set for any of these options? Currently, the options we'll need are anything (actual or proposed) in the <solr> and <cores> tags.
> So, what the first cut at this would be, building on 1306, is a default CoreDescriptorProvider that ignored all the <core> entries in solr.xml, walked the tree and loaded all the cores found. I claim this is a quick thing to PoC assuming SOLR-1306 and I'll try to provide a patch demonstrating it over the weekend.
> But mostly, this is a place to start the discussion about what this would look like rather than have it get lost in SOLR-1306.
> Finally, note that I have no intention of putting any of this into 4.x at least until we cut the 4.1/4.0.1 whatever.
> And, of course, until we fully deprecate solr.xml (5.0?) the current behavior will be the default.
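
Point 7 of the proposal (persisting core admin operations to the individual solrcore.properties files) could be sketched roughly as follows. This is an illustration under assumptions, not a proposed implementation: the class CorePropertiesPersistSketch is hypothetical, and the property keys used in the usage note ("name", "loadOnStartup") are assumed names for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Properties;

/** Hypothetical helper, not part of Solr: records one changed setting per call. */
final class CorePropertiesPersistSketch {

    /** Sets key=value in coreDir/solrcore.properties, keeping all other entries. */
    static void persist(Path coreDir, String key, String value) throws IOException {
        Path file = coreDir.resolve("solrcore.properties");
        Properties props = new Properties();
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            }
        }
        props.setProperty(key, value);
        // Write to a temp file in the same directory, then move it into place.
        // Note: this move is not guaranteed to be atomic on every filesystem.
        Path tmp = Files.createTempFile(coreDir, "solrcore", ".tmp");
        try (OutputStream out = Files.newOutputStream(tmp)) {
            props.store(out, "updated by core admin operation");
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

A rename would then be a call such as persist(coreDir, "name", "newName"), and a change to lazy loading something like persist(coreDir, "loadOnStartup", "false"). Note that Properties.store does not preserve comments or key order from the original file.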
