From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "SolrCaching" by JayLuker
Date Fri, 29 Oct 2010 18:22:45 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SolrCaching" page has been changed by JayLuker.
http://wiki.apache.org/solr/SolrCaching?action=diff&rev1=20&rev2=21

--------------------------------------------------

  <<TableOfContents>>
  
  = Overview =
+ Solr caches are associated with an Index Searcher — a particular 'view' of the index that
doesn't change. So as long as that Index Searcher is being used, any items in the cache will
be valid and available for reuse. Caching in Solr is unlike ordinary caches in that Solr cached
objects will not expire after a certain period of time; rather, cached objects will be valid
as long as the Index Searcher is valid.
  
  The ''current'' Index Searcher serves requests; when a ''new'' searcher is opened, it is auto-warmed
while the current one is still serving external requests. When the new one is ready, it will be
''registered'' as the ''current'' searcher and will handle any new search requests. The old searcher
will be closed once all the requests it was servicing have finished.
  The current Searcher is used as the source of auto-warming. When a new searcher is opened,
its caches may be prepopulated or "autowarmed" using data from caches in the old searcher.
  
+ There are currently two cache implementations — solr.search.LRUCache (LRU = Least Recently
Used in memory), and solr.search.FastLRUCache.
  
  <!> [[Solr1.4]] FastLRUCache is a 1.4 feature
  
  = Common Cache Configuration Parameters =
  Caching configuration is set up in the Query section of [[SolrConfigXml|solrconfig.xml]]. For
most caches, you can set the following parameters:
  
  == class ==
  The `SolrCache` implementation you wish to use. The available implementations are:
+ 
   * `solr.search.LRUCache`
   * `solr.search.FastLRUCache`
  
  <!> [[Solr1.4]] FastLRUCache is a 1.4 feature
+ 
  == size ==
  The maximum number of entries in the cache.
  
  == initialSize ==
  The initial capacity (number of entries) of the cache.  (see `java.util.HashMap`)
  
  == autowarmCount ==
  The number of entries to prepopulate from an old cache.
  
  When a new searcher is opened, its caches may be prepopulated or "autowarmed" with cached
objects from caches in the old searcher. autowarmCount is the number of cached items that will
be regenerated in the new searcher. You will probably want to base the autowarmCount setting
on how long it takes to autowarm, weighing the time-to-autowarm against how warm (i.e., autowarmCount)
you want the cache to be. The autowarm parameter is set for the caches in solrconfig.xml.
  
+ <!> [[Solr4.0]] autowarmCount can now be specified as a percentage (e.g. "90%"), which
will be evaluated relative to the number of items in the existing cache. This can be an advantageous
setting on an instance of Solr where you don't expect any search traffic (e.g. a master), but
where you want some caches warmed so that if it does take on traffic it won't be too overloaded.
Once the traffic dies down, subsequent commits will gradually decrease the number of items being
warmed.
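
  Putting the common parameters together, a cache entry in the Query section of [[SolrConfigXml|solrconfig.xml]]
might look like the following sketch. The element and parameter names are the ones described on
this page; the values (and the percentage form of autowarmCount, which assumes Solr 4.0 as noted
above) are purely illustrative, not recommendations.
  
  {{{
      <!-- illustrative only: a FastLRUCache holding up to 16384 entries that
           re-warms 90% of whatever the old cache held when a new searcher opens -->
      <filterCache
        class="solr.search.FastLRUCache"
        size="16384"
        initialSize="4096"
        autowarmCount="90%"/>
  }}}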
  
  == minSize (optional) ==
+ Only applicable for `FastLRUCache`. After the cache reaches its maximum size, it tries to
bring the number of entries down to `minSize`. The default value is `0.9 * size`.
  
  == acceptableSize (optional) ==
+ Only applicable for `FastLRUCache`. When the cache removes old entries, it tries to get down
to `minSize`; if that is not possible, it at least tries to bring the size down to `acceptableSize`.
The default value is `0.95 * size`.
  
  == cleanupThread (optional) ==
  Only applicable for `FastLRUCache`. Default is set to false. If set to true, the cleanup
will be run in a dedicated separate thread.  Consider setting this to true for very large
cache sizes, as the cache cleanup (triggered when the cache size reaches the upper water mark)
can take some time.
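
  As a sketch of how the FastLRUCache-specific parameters above fit alongside the common ones
(assuming, as with the other parameters, that they are written as attributes on the cache element;
all values here are illustrative):
  
  {{{
      <!-- illustrative FastLRUCache tuning: evict down toward minSize, settle for
           acceptableSize if that isn't reachable, and run cleanup in its own thread -->
      <filterCache
        class="solr.search.FastLRUCache"
        size="10000"
        initialSize="4096"
        autowarmCount="1024"
        minSize="9000"
        acceptableSize="9500"
        cleanupThread="true"/>
  }}}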
  
  = Types of Caches and Example Configuration =
  Below we present the cache-specific parts of the solrconfig.xml file and their recommended
settings:
  
  == filterCache ==
  This cache stores '''unordered''' sets of document IDs. This cache can be used for three different
purposes:
  
  First, the filter cache stores the results of any filter queries ("fq" parameters) that
Solr is explicitly asked to execute. (Each filter is executed and cached separately. When
it's time to use them to limit the number of results returned by a query, this is done using
set intersections.)
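
  For example, a request that includes the parameters `fq=inStock:true` and `fq=price:[0 TO 100]`
(the same filter queries used in the listener example further down this page) produces two separate
filterCache entries, and the two sets are intersected with the main query result.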
@@ -65, +57 @@

  Finally, the filter cache may be used for sorting if the <useFilterForSortedQuery/> config
option is set to true in solrconfig.xml.
  
  If you use faceting with the fieldCache method (see SolrFacetingOverview), it is recommended
that you set the filterCache size to be greater than the number of unique values in all of
your faceted fields.
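
  For example (hypothetical numbers): if you facet on a category field with 400 unique values
and an inStock field with 2, the filterCache size should be set comfortably above 402.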
+ 
  {{{
      <!-- Internal cache used by SolrIndexSearcher for filters (DocSets),
           unordered sets of *all* documents that match a query.
@@ -79, +72 @@

        initialSize="4096"
        autowarmCount="4096"/>
  }}}
  == queryResultCache ==
+ This cache stores '''ordered''' sets of document IDs — results of a query ordered by some
criteria.
  
  The memory usage for the queryResultCache is significantly less than that of the filterCache
because it only stores document IDs that were returned to the user by the query.
+ 
  {{{
      <!-- queryResultCache caches results of searches - ordered lists of
           document ids (DocList) based on a query, a sort, and the range
@@ -96, +88 @@

        initialSize="4096"
        autowarmCount="1024"/>
  }}}
  == documentCache ==
  The documentCache stores Lucene Document objects that have been fetched from disk.
  
  The size for the documentCache should always be greater than <max_results> * <max_concurrent_queries>,
to ensure that Solr does not need to refetch a document during a request. The more fields
you store in your documents, the higher the memory usage of this cache will be.
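
  For example (hypothetical numbers): if queries never request more than 100 rows and you expect
at most 50 concurrent queries, the documentCache size should be greater than 100 * 50 = 5000.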
  
+ Each Document object in the documentCache contains a List of Field references. When enableLazyFieldLoading=true
is set, and there is a documentCache, Document objects fetched from the IndexReader will only
contain the Fields specified in the "fl" param. All other Fields will be marked as "LOAD_LAZY".
When there is a cache hit on that uniqueKey at a later date, the Fields already loaded are used
directly (if requested), while the Fields marked LOAD_LAZY are lazily loaded from the IndexReader,
and the Document object then updates its references to the newly actualized Fields (which are
no longer marked LOAD_LAZY). So with different "fl" params, the same Document object can be re-used
from the documentCache, but the Fields in that Document grow as the fields requested (using the
"fl" param) change.
+ 
  ''(Note: This cache cannot be used as a source for autowarming because document IDs will
change when anything in the index changes so they can't be used by a new searcher.)''
+ 
  {{{
      <!-- documentCache caches Lucene Document objects (the stored fields for each document).
        -->
@@ -112, +105 @@

        size="16384"
        initialSize="16384"/>
  }}}
  == User/Generic Caches ==
  Users who have written custom Solr plugins for their applications can configure generic
object caches which Solr will maintain and optionally autowarm if a custom regenerator is
specified.
  
  {{{
@@ -133, +124 @@

        regenerator="org.foo.bar.YourRegenerator"/>
      -->
  }}}
  A new cache calls a '''regenerator''' to re-populate or pre-populate the last ''n'' objects
from the old cache into the new cache. (A new cache is created by a new Index Searcher.)
  
  You can specify a regenerator for any of the cache types here, but !SolrIndexSearcher itself
specifies the regenerators that Solr uses internally.
  
  == The Lucene FieldCache ==
+ Lucene has a low level "FieldCache" which is used for sorting (and in some cases faceting).
This cache is not managed by Solr; it has no configuration options and cannot be autowarmed
-- it is initialized the first time it is used for each Searcher.
  
+ See below for ways you can "explicitly warm" the FieldCache using newSearcher and firstSearcher
event listeners.
  
  = Other Cache-relevant Settings =
  == newSearcher and firstSearcher Event Listeners ==
  A firstSearcher event is fired whenever a new searcher is being prepared but there is no
current registered searcher to handle requests or to gain autowarming data from (i.e. on Solr
startup). A newSearcher event is fired whenever a new searcher is being prepared and there
is a current searcher handling requests (aka registered).
  
+ In both cases, a SolrEventListener (like the QuerySenderListener) may be configured in the
[[SolrConfigXml|solrconfig.xml]] file. This is particularly useful for "explicitly warming"
caches with common queries on startup, and for forcibly creating the FieldCache for common
sort fields when new searchers are opened:
  
  {{{
      <listener event="newSearcher" class="solr.QuerySenderListener">
@@ -164, +151 @@

          <!-- seed common sort fields -->
          <lst> <str name="q">anything</str> <str name="sort">name desc, price desc, popularity desc</str> </lst>
          <!-- seed common facets and filter queries -->
          <lst> <str name="q">anything</str>
                <str name="facet.field">category</str>
                <str name="fq">inStock:true</str>
                <str name="fq">price:[0 TO 100]</str>
          </lst>
        </arr>
      </listener>
  }}}
  == useFilterForSortedQuery ==
  If the filterCache is not enabled, this setting is ignored. When the filterCache is enabled,
performance ''may'' be impacted whether this is set to true or false, so you may want to try
both settings.
+ 
  {{{
      <!-- An optimization that attempts to use a filter to satisfy a search.
           If the requested sort does not include score, then the filterCache
@@ -185, +171 @@

        -->
     <useFilterForSortedQuery>true</useFilterForSortedQuery>
  }}}
  == queryResultWindowSize ==
  Rounds up the number of requested documents to the nearest multiple of this setting, so that
a whole range or window of documents is cached and quickly available.
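
  For example, with a window size of 50 (an illustrative value), a request for rows 0-9 causes
the document IDs for rows 0-49 of that result to be cached, so a follow-up request for rows 10-19
of the same query and sort is answered from the queryResultCache.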
+ 
  {{{
      <!-- An optimization for use with the queryResultCache.  When a search
           is requested, a superset of the requested number of document ids
@@ -199, +184 @@

      -->
      <queryResultWindowSize>50</queryResultWindowSize>
  }}}
  == The hashDocSet Max Size ==
+ The hashDocSet is an optimization that enables an int hash representation for filters (docSets)
when the number of items in the set is less than maxSize.  For smaller sets, this representation
is more memory efficient, more efficient to iterate, and faster to take intersections.
+ 
  {{{
      <!-- This entry enables an int hash representation for filters (DocSets)
           when the number of items in the set is less than maxSize.  For smaller
@@ -211, +195 @@

      -->
      <HashDocSet maxSize="3000" loadFactor="0.75"/>
  }}}
+ The hashDocSet max size should be based primarily on the number of documents in the collection:
the larger the number of documents, the larger the hashDocSet max size. You will have to do a
bit of trial-and-error to arrive at the optimal number:
+ 
+  1. Calculate 0.005 of the total number of documents that you are going to store (see the worked example after this list).
+  1. Try values on either 'side' of that value to arrive at the best query times.
+  1. When query times seem to plateau, and performance doesn't show much difference between
the higher number and the lower, use the higher.
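
  As a worked example with hypothetical numbers (and only relevant before 1.4.0 -- see the note
below): for a collection of 1,000,000 documents, 0.005 * 1,000,000 = 5,000, so you would start
trying maxSize values around 5000:
  
  {{{
      <!-- illustrative starting point for a 1,000,000-document collection -->
      <HashDocSet maxSize="5000" loadFactor="0.75"/>
  }}}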
  
  Note: hashDocSet is no longer part of Solr as of version 1.4.0, see [[https://issues.apache.org/jira/browse/SOLR-1169|SOLR-1169]].
  
  = Tradeoffs =
  Increasing autoWarming values will cause additional latency, due to auto-warming, between
the time that you request a new searcher to be opened and the time that it becomes "registered".
  
  = Caching and Distribution/Replication =
  Distribution/Replication gives you a 'new' index on the slave. When Solr is told to use
the new index, the old caches have to be discarded along with the old Index Searcher. That's
when autowarming occurs.
  
+ If the current Index Searcher is serving requests when a new searcher is opened, the new
one is 'warmed' while the current one continues to serve external requests. When the new one
is ready, it is registered so it can serve any new requests, while the original one finishes
the requests it was handling.
+ 
  = Disabling Caching =
+ Caching helps only if you are hitting cached objects more than once. If that is not the case,
the system is wasting cycles and memory, and you might consider disabling caching by commenting
out the caching sections in your [[SolrConfigXml|solrconfig.xml]].
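
  For instance, disabling the filter cache is just a matter of commenting out its entry (the
attribute values shown here are illustrative):
  
  {{{
      <!-- filterCache disabled: the whole entry is commented out
      <filterCache
        class="solr.search.LRUCache"
        size="16384"
        initialSize="4096"
        autowarmCount="4096"/>
      -->
  }}}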
  
