lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "LotsOfCores" by ErickErickson
Date Sun, 28 Oct 2012 17:04:49 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "LotsOfCores" page has been changed by ErickErickson:
http://wiki.apache.org/solr/LotsOfCores?action=diff&rev1=15&rev2=16

Comment:
Started re-working for the patches I'm working on.

- <!> [[Solr4.1]]
+ <!> [[Solr4.1]] (if there's a [[Solr4.0.1]]) or [[Solr4.2]] if not.
  
  <<TableOfContents>>
  
@@ -23, +23 @@

     * [[https://issues.apache.org/jira/browse/SOLR-1306|SOLR-1306]] - Support pluggable persistence/loading
of solr.xml details
     * [[https://issues.apache.org/jira/browse/SOLR-1416|SOLR-1416]] - Reduce contention in
CoreContainer#getCore()
     * [[https://issues.apache.org/jira/browse/SOLR-1530|SOLR-1530]] - Open IndexSearcher
lazily
-    * [[https://issues.apache.org/jira/browse/SOLR-1533|SOLR-1533]] - Partition data directories
into multiple "bucket" directories
     * [[https://issues.apache.org/jira/browse/SOLR-3980|SOLR-3980]] - list not loaded (lazily
loaded) cores for clients.
    * Fixed/Closed issues
     * [[https://issues.apache.org/jira/browse/SOLR-880|SOLR-880]] - SolrCore should have
a STOP option and a lazy startup option (part of SOLR-1028)
@@ -33, +32 @@

     * [[https://issues.apache.org/jira/browse/SOLR-1106|SOLR-1106]] - Pluggable CoreAdminHandler
(Action ) architecture that allows for custom handler access to CoreContainer / request-response
     * [[https://issues.apache.org/jira/browse/SOLR-1108|SOLR-1108]] - Remove synchronization
in SolrCore constructor   
     * [[https://issues.apache.org/jira/browse/SOLR-1531|SOLR-1531]] - Provide an option to
remove the data directory on core unload. Already done in [[https://issues.apache.org/jira/browse/SOLR-2610|SOLR-2610]].
+    * [[https://issues.apache.org/jira/browse/SOLR-1533|SOLR-1533]] - Partition data directories
into multiple "bucket" directories. Will be handled by SOLR-1306.
   
  
  Other features which may be needed for such a system include:
   * Changes to SolrJ for new start/stop commands and better error codes/messages.
  
  = Configuration =
+ As I dig into this, things are changing. What follows is fluid and may change as the work
progresses. Much of this work will show the most benefit if there is a custom CoreDescriptorProvider
in the chain.
+ 
+ There are two new attributes of a <core> tag (defaults in bold) and two new attributes
for <cores>:
+  * <cores> has two new attributes:
+   * swappableCacheSize=[NNN]. If this limit is crossed, old cores marked 'swappable="true"'
are removed to make room on an LRU basis. 
+    * If this is absent, the default is Integer.MAX_VALUE, an unbounded cache. ''Only'' cores
with "swappable=true" are put in this cache, so specifying this attribute without marking any
cores as "swappable" has no effect; it just allocates a LinkedHashMap<String, SolrCore>
of the specified size that will never be used.
+    * Having this size be less than the number of cores marked 'swappable="true"' AND 'loadOnStartup="true"'
''should'' work, but it's wasteful since a bunch of cores will be loaded on startup then immediately
unloaded after the cache fills up.
+    * NOTE: when solr.xml is read, the information for all swappable cores (i.e. the CoreDescriptor)
is put in a separate list. So having more swappable cores than the size of the cache will
be handled correctly. The "list of cores" is unbounded.
+   * coreDescriptorProvider=<class derived from CoreDescriptorProvider>. (more later,
this isn't done yet). This will be a pluggable component that provides a CoreDescriptor on
demand. Solr core handling will ask this component (if present) for a CoreDescriptor for any
core name it doesn't recognize. If the component returns a CoreDescriptor, it will be added
to the appropriate internal list based on the values of the loadOnStartup and swappable member
variables. The code will ''probably'' load the core no matter what the loadOnStartup parameter
specifies on the theory that there is an immediate request to be satisfied. TBD. 
+  * <core> has two new attributes:
+   * loadOnStartup=["'''true'''"|"false"]. Whether the core should be completely loaded upon
startup.
+   * swappable=["true"|"'''false'''"]. Whether the core is allowed to be swapped out or not.
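The LRU behaviour described for swappableCacheSize can be sketched with an access-ordered LinkedHashMap. This is only an illustration under stated assumptions, not the code in the patch: FakeCore stands in for SolrCore, and LruCoreCache, evicted and the class name are hypothetical names invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of an LRU cache for swappable cores; every name here is hypothetical. */
final class SwappableCoreCacheSketch {

    /** Stand-in for SolrCore: records only its name and whether it was closed. */
    static final class FakeCore {
        final String name;
        boolean closed;

        FakeCore(String name) { this.name = name; }

        void close() { closed = true; }
    }

    /** Access-ordered map that closes and drops the least recently used core when over capacity. */
    static final class LruCoreCache extends LinkedHashMap<String, FakeCore> {
        private final int swappableCacheSize;
        final List<String> evicted = new ArrayList<>();

        LruCoreCache(int swappableCacheSize) {
            super(16, 0.75f, true); // true = access order, so the eldest entry is the LRU one
            this.swappableCacheSize = swappableCacheSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<String, FakeCore> eldest) {
            if (size() > swappableCacheSize) {
                eldest.getValue().close();    // release the core before it is dropped
                evicted.add(eldest.getKey()); // remember the name for the demo below
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        LruCoreCache cache = new LruCoreCache(2);
        cache.put("core0", new FakeCore("core0"));
        cache.put("core1", new FakeCore("core1"));
        cache.get("core0");                        // touch core0 so core1 becomes the eldest
        cache.put("core2", new FakeCore("core2")); // over capacity: core1 is swapped out
        System.out.println("evicted=" + cache.evicted + " loaded=" + cache.keySet());
    }
}
```

In this sketch only the loaded cores are bounded; the descriptors of swapped-out cores would live in a separate, unbounded list, as the NOTE above describes.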
+ 
+ The idea is that there's no reason to tie "lazy loading" to whether a core can be
swapped out, so splitting the two options gives the user control over how each is handled.
Use cases below:
+  * loadOnStartup=true swappable=false: Current case. Spend all the time necessary to fully
load the cores on startup.
+  * loadOnStartup=true swappable=true:  There are some cores you want loaded when the server
first starts up, but that you'll allow to be swapped out. It's wasteful to specify more cores
like this than your swappableCacheSize value.
+  * loadOnStartup=false swappable=false: Probably the least useful combination, but it naturally
falls out of the code. You'd specify this combination if, for some reason, starting Solr up
quickly was more important than the inconvenience of having to wait randomly for cores to
be loaded when a request was made.
+  * loadOnStartup=false swappable=true: This is really the use-case. There are a large number
of cores in your system that are short-duration use. You want Solr to load them as necessary,
but unload them when the cache gets full on an LRU basis.
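The four combinations above, together with the stated defaults (loadOnStartup="true", swappable="false"), can be sketched as a small flag holder. This is a hypothetical illustration: CoreFlagsSketch, fromAttributes and describe are names invented here and are not part of Solr.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Sketch of the two per-core flags and their defaults; all names are hypothetical. */
final class CoreFlagsSketch {
    final boolean loadOnStartup;
    final boolean swappable;

    CoreFlagsSketch(boolean loadOnStartup, boolean swappable) {
        this.loadOnStartup = loadOnStartup;
        this.swappable = swappable;
    }

    /** Reads the flags from <core> attributes, applying the defaults given on this page. */
    static CoreFlagsSketch fromAttributes(Map<String, String> attrs) {
        boolean load = Boolean.parseBoolean(attrs.getOrDefault("loadOnStartup", "true"));
        boolean swap = Boolean.parseBoolean(attrs.getOrDefault("swappable", "false"));
        return new CoreFlagsSketch(load, swap);
    }

    /** Summarises the behaviour of one combination, following the use cases listed above. */
    String describe() {
        if (loadOnStartup && !swappable) return "loaded at startup and kept loaded (the current behaviour)";
        if (loadOnStartup)               return "loaded at startup, may be swapped out on an LRU basis";
        if (!swappable)                  return "loaded on first request, then kept loaded";
        return "loaded on first request, may be swapped out on an LRU basis";
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("name", "core0");
        attrs.put("loadOnStartup", "false");
        attrs.put("swappable", "true");
        System.out.println(fromAttributes(attrs).describe());
        // No attributes at all: both defaults apply.
        System.out.println(fromAttributes(Collections.<String, String>emptyMap()).describe());
    }
}
```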
+ 
  
  The following configuration applies to the patch given in [[https://issues.apache.org/jira/browse/SOLR-1293|SOLR-1293]].
  
@@ -46, +65 @@

  <?xml version='1.0' encoding='UTF-8'?>
  <solr persistent='true'>
    <cores adminPath="/admin/cores"
-           maxCores="4"
+           swappableCacheSize="4"
            adminHandler="org.apache.solr.handler.admin.LotsOfCoresAdminHandler"
            shareSchema="true"
-           shareConfig="true"
+           shareConfig="true">
-           baseDataDir="/opt/solr/data"
-           numBuckets="4"
-           commonInstanceDir="/opt/solr"
-           cleanOnUnload="true">
-     <core name="core0" instanceDir="/opt/solr" loadOnStart="false"/>
+     <core name="core0" instanceDir="/opt/solr" loadOnStartup="false" swappable="true"/>
    </cores>
  </solr>
  }}}
- == Common Properties ==
-  * '''maxCores''' - Maximum number of cores to be loaded at any given point in time. If
this limit is crossed, the least recently used core is stopped and the new one is started.
-  * '''adminHandler''' - Value should be fixed as in the above example. The adminHandler
is pluggable in Solr now.
-  * '''shareSchema''' - Ensures that only one instance of IndexSchema is created in the Solr
-  * '''shareConfig''' - Ensures that only one instance of SolrConfig is created in the Solr
-  * '''baseDataDir''' - This is the place where the indexes are created. There is no need
to pass the dataDir as an request parameter. Solr automatically assigns a data directory for
 that core in this base directory
-  * '''numBuckets''' - This shows the number of buckets created in 'baseDataDir'. A core
will be assigned into one of the buckets randomly. Keep it '0' or omit this attribute if buckets
are not required
-  * '''commonInstanceDir''' - This can be the default instanceDir for all the cores created.
The 'instanceDir' parameter can be omitted while creating a core if this attribute has been
specified in solr.xml
-  * '''cleanOnUnload''' - Clean up (delete) the index when a core is unloaded.
  
+ == Persistence ==
+ This is a sticky wicket. As currently written, the solr.xml file has a global 'persistent="true|false"'
option. The base problem is maintenance. What I'm currently thinking is that Solr should ''only''
persist cores that were originally defined in solr.xml and should ''not'' persist any core
that was provided by the CoreDescriptorProvider.
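One way to picture the proposed rule is to tag each descriptor with where it came from and filter on that tag when writing solr.xml. This is a sketch of the idea only, assuming such a tag exists: Origin, Descriptor and coresToPersist are hypothetical names, not Solr classes or methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of "persist only what solr.xml originally defined"; all names are hypothetical. */
final class PersistenceFilterSketch {

    /** Where a core descriptor came from. */
    enum Origin { SOLR_XML, PROVIDER }

    /** Minimal stand-in for a CoreDescriptor that remembers its origin. */
    static final class Descriptor {
        final String name;
        final Origin origin;

        Descriptor(String name, Origin origin) {
            this.name = name;
            this.origin = origin;
        }
    }

    /** Returns the names of cores that would be written back to solr.xml. */
    static List<String> coresToPersist(List<Descriptor> all) {
        List<String> names = new ArrayList<>();
        for (Descriptor d : all) {
            if (d.origin == Origin.SOLR_XML) { // provider-supplied cores are skipped
                names.add(d.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Descriptor> all = Arrays.asList(
                new Descriptor("core0", Origin.SOLR_XML),
                new Descriptor("tenant42", Origin.PROVIDER));
        System.out.println(coresToPersist(all));
    }
}
```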
  
- With the above configuration, the only parameter required for creating a core is the core
name.
+ == From the original page, under discussion ==
+  * START/STOP commands. Actually, it doesn't seem like there's anything that could be done
with these that isn't accomplished by CREATE/UNLOAD. Perhaps alias START->CREATE and STOP->UNLOAD
with suitable defaults? I.e. STOP would never delete the index.
+  * '''shareSchema''' - Ensures that only one instance of IndexSchema is created in Solr.
Given the recent additions that allow one to specify a schema file on a per-core basis, does
this make sense any more?
+  * '''shareConfig''' - Ensures that only one instance of SolrConfig is created in Solr.
Given the recent additions that allow one to specify a config file on a per-core basis, does
this make sense any more?
+  * '''cleanOnUnload''' - Clean up (delete) the index when a core is unloaded. Not implemented
yet, and for my particular use-case it probably won't be. I can see the utility though. It doesn't
seem very hard code-wise.
  
+ Hmmm, haven't thought about the various status commands very deeply.
- == Per-Core Properties ==
-  * '''loadOnStart''' - (boolean true/false)Specifies whether the core should be started
when Solr starts up. This parameter can be passed along while creating a core . 
- 
- = New CoreAdmin Commands =
- 
- LotsOfCoresAdminHandler supports two new core admin commands:
- 
-  * start - If a core is stopped it can be started using this command
-  * stop - if a core is running it can be stopped 
- 
- Example: http://host:80/admin/cores?action=start
- 
  There is an update to the 'status' command. Adding a parameter 'verbose=false' will return
a minimal status report of the cores. The default status command uses Luke on the core's index
to get very detailed information which is expensive if the status is queried very frequently.
  
  = Further work =
@@ -90, +91 @@

   * We highly recommend that the 'alias' feature in Solr not be used due to the high synchronization
overhead it brings.
   * Alternatively, we should work towards reducing the synchronization involved
  
+ = Status =
+ As I mentioned, this is still very fluid. Please feel free to make comments, either on the
dev list or via the JIRA issues above.
+ 
