lucene-dev mailing list archives

From "Erick Erickson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-1293) Support for large no:of cores and faster loading/unloading of cores
Date Mon, 22 Oct 2012 10:58:15 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481296#comment-13481296 ]

Erick Erickson commented on SOLR-1293:
--------------------------------------

Well, I think this JIRA will finally get some action...

Jose: 
The actual availability of any particular feature is best tracked by the actual JIRA ticket.
The "fix version" is usually the earliest _possible_ fix. Not until the resolution is something
like "fixed" is the code really in the code line.

All:
OK, I'm thinking along these lines. I've started implementation, but wanted to open up the
discussion in case I'm going down the wrong path.

Assumption:
1> For installations with multiple thousands of cores, provision has to be made for some
kind of administrative process, probably an RDBMS that really maintains this information.


So here's a brief outline of the approach I'm thinking about.
1> Add an additional optional parameter to the <cores> entry in solr.xml, LRUCacheSize=#.
(what default?)
2> Implement SOLR-1306, allow a data provider to be specified in solr.xml that gives back
core descriptions, something like: <coreDescriptorProvider class="com.foo.FooDataProvider"
[attr="val"]/> (don't quite know what attrs we want, if any).
3> Add two optional attributes to individual <core> entries
   a> sticky="true|false". Default to true. Any core marked sticky would never be aged
out; it would essentially be treated just as cores are today.
   b> loadOnStartup="true|false", default to true.
4> so the process of getting a core would be something like
   a> check the normal list, just like now. If a core was found, return it.
   b> Check the LRU list, if a core was found, return it.
   c> ask the dataprovider (if defined) for the core descriptor. create the core and put
it in the LRU list.
   d> remove any core entries over the LRU limit. Any hints on the right cache to use?
(There's the Lucene LRUCache, ConcurrentLRUCache, and the LRUHashMap in lucene that I can't find
in any of the compiled jars....) I've got to close the core as it's removed.... It _looks_
like I can use ConcurrentLRUCache and add a listener to close the core when it's removed from
the list.
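
A minimal sketch of the lookup order in step 4, under stated assumptions. Every name here (SketchCore, SketchCoreRegistry, getCore, addPermanent) is hypothetical and is not Solr's CoreContainer API. It also swaps in a plain java.util.LinkedHashMap in access order instead of ConcurrentLRUCache, purely to show the "close the core when it falls off the list" idea.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a core; only what the sketch needs.
class SketchCore {
    final String name;
    boolean closed = false;
    SketchCore(String name) { this.name = name; }
    void close() { closed = true; }
}

// Hypothetical registry illustrating steps 4a-4d.
class SketchCoreRegistry {
    private final Map<String, SketchCore> permanent = new HashMap<>();
    private final LinkedHashMap<String, SketchCore> transientCores;
    private final Function<String, SketchCore> provider; // null if no provider is configured
    final List<String> evicted = new ArrayList<>();      // kept only so the sketch is observable

    SketchCoreRegistry(final int lruCacheSize, Function<String, SketchCore> provider) {
        this.provider = provider;
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.transientCores = new LinkedHashMap<String, SketchCore>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, SketchCore> eldest) {
                if (size() > lruCacheSize) {
                    eldest.getValue().close();   // close the core as it is aged out
                    evicted.add(eldest.getKey());
                    return true;
                }
                return false;
            }
        };
    }

    void addPermanent(SketchCore core) { permanent.put(core.name, core); }

    synchronized SketchCore getCore(String name) {
        SketchCore core = permanent.get(name);      // 4a: the normal list
        if (core != null) return core;
        core = transientCores.get(name);            // 4b: the LRU list (get refreshes recency)
        if (core != null) return core;
        if (provider == null) return null;          // no provider defined: same result as today
        core = provider.apply(name);                // 4c: ask the provider and create the core
        if (core != null) transientCores.put(name, core); // 4d: put may evict the eldest entry
        return core;
    }
}
```

The sketch closes a core immediately on eviction. A real implementation would presumably have to account for cores that are still serving requests when they are aged out.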

Processing-wise, in the usual case this would cost an extra check each time a core was fetched.
If step 4a above failed, we would have to see if the dataprovider was defined before returning
null. I don't think that's onerous; the rest of the costs would only be incurred when a dataprovider
_did_ exist.

One open design decision is what to do about persistence and stickiness.
Specifically, if the coreDescriptorProvider gives us a core from, say, an RDBMS, should we
allow that core to be persisted into the solr.xml file if they've set persist="true" in solr.xml?
I'm thinking that we can make this all work with maximum flexibility if we allow the coreDescriptorProvider
to tell us whether we should persist any core currently loaded....
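
One way the "let the provider decide" idea could look, as a sketch only. The interface and method names below (SketchDescriptorProvider, getDescriptor, shouldPersist, coresToPersist) are assumptions made for illustration and are not the API from SOLR-1306. The sketch further assumes that cores already defined in solr.xml are always written back when persist="true".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical provider contract: hands back core descriptions and a persistence hint.
interface SketchDescriptorProvider {
    Map<String, String> getDescriptor(String coreName); // null if the core is unknown
    boolean shouldPersist(String coreName);             // true if solr.xml may record this core
}

class SketchPersistence {
    // Returns the names of cores that would be written to solr.xml.
    static List<String> coresToPersist(boolean persistEnabled,
                                       List<String> definedInSolrXml,
                                       List<String> loadedFromProvider,
                                       SketchDescriptorProvider provider) {
        List<String> out = new ArrayList<>();
        if (!persistEnabled) return out;        // persist="false": write nothing
        out.addAll(definedInSolrXml);           // assumption: existing entries are kept
        for (String name : loadedFromProvider) {
            // Provider-supplied cores are written only if the provider says so.
            if (provider != null && provider.shouldPersist(name)) out.add(name);
        }
        return out;
    }
}
```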

Anyway, I'll be fleshing this out over the next little while, anybody want to weigh in?

Erick


                
> Support for large no:of cores and faster loading/unloading of cores
> -------------------------------------------------------------------
>
>                 Key: SOLR-1293
>                 URL: https://issues.apache.org/jira/browse/SOLR-1293
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore
>            Reporter: Noble Paul
>             Fix For: 4.1
>
>         Attachments: SOLR-1293.patch
>
>
> Solr currently is not very suitable for a large no:of homogeneous cores where you
> require fast/frequent loading/unloading of cores. Usually a core is required to be loaded
> just to fire a search query or to just index one document.
> The requirements of such a system are:
> * Very efficient loading of cores. Solr cannot afford to read and parse and create Schema,
>   SolrConfig Objects for each core each time the core has to be loaded (SOLR-919, SOLR-920)
> * START STOP core. Currently it is only possible to unload a core (SOLR-880)
> * Automatic loading of cores. If a core is present and it is not loaded and a request
>   comes for it, load it automatically before serving the request
> * As there are a large no:of cores, all the cores cannot be kept loaded always. There
>   has to be an upper limit beyond which we need to unload a few cores (probably the least recently
>   used ones)
> * Automatic allotment of dataDir for cores. If the no:of cores is too high, all the cores'
>   dataDirs cannot live in the same dir. There is an upper limit on the no:of dirs you can create
>   in a unix dir w/o affecting performance

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

