lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "SolrConfigXml" by BryanTalbot
Date Fri, 06 Nov 2009 22:50:19 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SolrConfigXml" page has been changed by BryanTalbot.
http://wiki.apache.org/solr/SolrConfigXml?action=diff&rev1=30&rev2=31

--------------------------------------------------

  = solrconfig.xml =
- 
- solrconfig.xml is the file that contains most of the parameters for configuring Solr itself.

+ solrconfig.xml is the file that contains most of the parameters for configuring Solr itself.
  
  A [[http://svn.apache.org/repos/asf/lucene/solr/trunk/example/solr/conf/solrconfig.xml|sample
solrconfig.xml with comments]] can be found in the Source Repository.
  
  /!\ :TODO: /!\ Still need a section explaining indexDefaults
  
- 
  <<TableOfContents>>
  
  == dataDir parameter ==
- 
  Used to specify an alternate directory to hold all index data other than the default ./data
under the Solr home. If replication is in use, this should match the replication configuration.
 If this directory is not absolute, then it is relative to the current working directory of
the servlet container.
  
  {{{
    <dataDir>/var/data/solr</dataDir>
  }}}
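The relative-path rule can be illustrated with `java.nio.file`. The sketch below uses a hypothetical `DataDirResolver` class, which is not part of Solr. It resolves a relative `<dataDir>` against a given working directory. For a servlet container, that is normally the JVM's `user.dir` system property. The sketch assumes a Unix-style file system.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper showing where a <dataDir> value points on disk. */
final class DataDirResolver {
    private DataDirResolver() {}

    /**
     * Absolute paths are kept as-is; relative paths are resolved against
     * workingDir (for a servlet container, usually System.getProperty("user.dir")).
     */
    static Path resolve(String dataDir, Path workingDir) {
        Path p = Paths.get(dataDir);
        return (p.isAbsolute() ? p : workingDir.resolve(p)).normalize();
    }

    public static void main(String[] args) {
        Path cwd = Paths.get(System.getProperty("user.dir"));
        System.out.println(resolve("/var/data/solr", cwd)); // absolute: unchanged
        System.out.println(resolve("./solr/data", cwd));    // relative: under the CWD
    }
}
```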
- 
  == indexDefaults Section ==
- 
  /!\ :TODO: /!\ Still need a section explaining indexDefaults
  
  == mainIndex Section ==
- 
  The values in this section control the merging of multiple index segments. See the `mergeFactor`
Considerations section on the SolrPerformanceFactors doc for more details.
+ 
  {{{
     <mainIndex>
      <!-- lucene options specific to the main on-disk lucene index -->
@@ -39, +34 @@

  The `maxMergeDocs` parameter tells Lucene not to allow any segment to contain more docs
than the value stipulated, but to create a new segment instead.
  
  == Update Handler Section ==
- 
  The Update Handler section mostly relates to low-level details of how updates are
handled internally (not to be confused with the higher-level configuration of "Request Handlers" for
dealing with updates sent by clients).
  
  {{{
  <updateHandler class="solr.DirectUpdateHandler2">
  
      <!-- Limit the number of deletions Solr will buffer during doc updating.
-         
+ 
          Setting this lower can help bound memory use during indexing.
      -->
      <maxPendingDeletes>100000</maxPendingDeletes>
  
      <!-- autocommit pending docs if certain criteria are met.  Future versions may expand the available
       criteria -->
-     <autoCommit>  
+     <autoCommit>
        <maxDocs>10000</maxDocs> <!-- maximum uncommitted docs before autocommit triggered -->
        <maxTime>86000</maxTime> <!-- maximum time (in ms) after adding a doc before an autocommit is triggered -->
      </autoCommit>
  
      ...
  }}}
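The two `<autoCommit>` criteria act as an either/or trigger. The hypothetical, single-threaded `AutoCommitTracker` below models that rule under two assumptions:

 * A commit is due once the pending document count reaches `maxDocs`, or once `maxTime` milliseconds have passed since the first document added after the last commit.
 * A value of 0 or less disables that criterion.

This is an illustration only, not the scheduling code inside `DirectUpdateHandler2`.

```java
/** Hypothetical model of the <autoCommit> maxDocs / maxTime rule. */
final class AutoCommitTracker {
    private final int maxDocs;           // <maxDocs>; 0 or less disables (assumption)
    private final long maxTimeMs;        // <maxTime>; 0 or less disables (assumption)
    private int pendingDocs = 0;
    private long firstPendingAddMs = -1; // clock time of the first add since the last commit

    AutoCommitTracker(int maxDocs, long maxTimeMs) {
        this.maxDocs = maxDocs;
        this.maxTimeMs = maxTimeMs;
    }

    /** Records one added document at the given clock time (milliseconds). */
    void added(long nowMs) {
        if (pendingDocs == 0) {
            firstPendingAddMs = nowMs;
        }
        pendingDocs++;
    }

    /** True if either criterion says a commit is due at the given clock time. */
    boolean commitDue(long nowMs) {
        if (pendingDocs == 0) {
            return false;
        }
        boolean tooManyDocs = maxDocs > 0 && pendingDocs >= maxDocs;
        boolean tooOld = maxTimeMs > 0 && nowMs - firstPendingAddMs >= maxTimeMs;
        return tooManyDocs || tooOld;
    }

    /** Resets the counters, as a commit would. */
    void committed() {
        pendingDocs = 0;
        firstPendingAddMs = -1;
    }
}
```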
- 
- 
  === "Update" Related Event Listeners ===
- 
  Within the Update Handler section, you can define listeners for particular "update"-related
events: "postCommit" and "postOptimize".  Listeners can be used to fire off any special code;
they are typically used to execute snapshooter.
  
  {{{
@@ -72, +63 @@

      <!-- The RunExecutableListener executes an external command.
           exe  - the name of the executable to run
           dir  -  dir to use as the current working directory. default="."
-          wait - the calling thread waits until the executable returns. 
+          wait - the calling thread waits until the executable returns.
                  default="true"
           args - the arguments to pass to the program.  default=nothing
           env  - environment variables to set.  default=nothing
@@ -90, +81 @@

      </listener>
    </updateHandler>
  }}}
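The `exe`, `dir`, `wait`, `args` and `env` parameters correspond closely to `java.lang.ProcessBuilder` settings. The hypothetical `ExecutableRunner` below sketches that mapping. It assumes that `args` is a list of separate arguments and that each `env` entry has the form `NAME=value`. It is not the listener's actual source.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mapping of RunExecutableListener-style parameters onto ProcessBuilder. */
final class ExecutableRunner {
    private ExecutableRunner() {}

    /** Builds, but does not start, the process for exe + args in dir with extra env vars. */
    static ProcessBuilder configure(String exe, String dir, List<String> args, List<String> env) {
        List<String> command = new ArrayList<>();
        command.add(exe);
        command.addAll(args);                             // args default to nothing
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.directory(new File(dir == null ? "." : dir)); // dir defaults to "."
        for (String entry : env) {                        // assumed form: NAME=value
            int eq = entry.indexOf('=');
            if (eq > 0) {
                pb.environment().put(entry.substring(0, eq), entry.substring(eq + 1));
            }
        }
        pb.inheritIO(); // child output goes to this JVM's stdout/stderr
        return pb;
    }

    /** Starts the process; with wait=true the caller blocks and gets the exit code, else -1. */
    static int run(ProcessBuilder pb, boolean wait) throws IOException, InterruptedException {
        Process process = pb.start();
        return wait ? process.waitFor() : -1;
    }
}
```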
-  
  == The Query Section ==
- 
  Controls everything query-related.
  
  {{{
    <query>
-     <!-- Maximum number of clauses in a boolean query... can affect range 
+     <!-- Maximum number of clauses in a boolean query... can affect range
-          or wildcard queries that expand to big boolean queries.  
+          or wildcard queries that expand to big boolean queries.
           An exception is thrown if exceeded.
      -->
      <maxBooleanClauses>1024</maxBooleanClauses>
  }}}
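A toy model shows why the clause limit affects wildcard and range queries. A prefix query such as `sol*` is rewritten into one OR clause per matching indexed term. A common prefix over a large vocabulary can therefore produce thousands of clauses. Lucene reports an overflow with `BooleanQuery.TooManyClauses`. The hypothetical `PrefixExpansion` sketch below throws a plain `IllegalStateException` instead, so that it runs without Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

/** Toy model of a prefix query expanding into boolean clauses. */
final class PrefixExpansion {
    private PrefixExpansion() {}

    /** Returns one "clause" per indexed term starting with prefix; fails past maxClauses. */
    static List<String> expand(SortedSet<String> indexedTerms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        // tailSet(prefix) starts at the first term >= prefix; stop at the first non-match.
        for (String term : indexedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            if (clauses.size() == maxClauses) {
                throw new IllegalStateException("maxClauseCount is set to " + maxClauses);
            }
            clauses.add(term);
        }
        return clauses;
    }
}
```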
-  
  === Caching Section ===
- 
  You can change these caching parameters as your index grows and changes. See the SolrCaching
page for details on configuring the caches.
  
  {{{
- 
- 
      <!-- Cache used by SolrIndexSearcher for filters (DocSets),
           unordered sets of *all* documents that match a query.
           When a new searcher is opened, its caches may be prepopulated
@@ -164, +149 @@

      -->
  
      <!-- An optimization that attempts to use a filter to satisfy a search.
-          If the requested sort does not include a score, then the filterCache 
+          If the requested sort does not include a score, then the filterCache
-          will be checked for a filter matching the query.  If found, the filter 
+          will be checked for a filter matching the query.  If found, the filter
-          will be used as the source of document ids, and then the sort will be 
+          will be used as the source of document ids, and then the sort will be
           applied to that.
        -->
      <useFilterForSortedQuery>true</useFilterForSortedQuery>
@@ -174, +159 @@

      <!-- An optimization for use with the queryResultCache.  When a search
           is requested, a superset of the requested number of document ids
           are collected.  For example, if a search for a particular query
-          requests matching documents 10 through 19, and queryWindowSize is 50, 
+          requests matching documents 10 through 19, and queryWindowSize is 50,
-          then documents 0 through 50 will be collected and cached. Any further 
+          then documents 0 through 49 will be collected and cached. Any further
           requests in that range can be satisfied via the cache.
      -->
      <queryResultWindowSize>50</queryResultWindowSize>
  
-     <!-- This entry enables an int hash representation for filters (DocSets) 
+     <!-- This entry enables an int hash representation for filters (DocSets)
-          when the number of items in the set is less than maxSize. For smaller 
+          when the number of items in the set is less than maxSize. For smaller
-          sets, this representation is more memory efficient, more efficient to 
+          sets, this representation is more memory efficient, more efficient to
           iterate over, and faster to take intersections.
       -->
      <HashDocSet maxSize="3000" loadFactor="0.75"/>
@@ -194, +179 @@

      -->
      <boolTofilterOptimizer enabled="true" cacheSize="32" threshold=".05"/>
  
-     <!-- Lazy field loading will attempt to read only parts of documents on disk that are 
+     <!-- Lazy field loading will attempt to read only parts of documents on disk that are
           requested.  Enabling should be faster if you aren't retrieving all stored fields.
      -->
      <enableLazyFieldLoading>false</enableLazyFieldLoading>
  }}}
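The `queryResultWindowSize` behaviour described above is a rounding rule. The end of the requested slice (`start + rows`) is rounded up to a whole number of windows, and that many top-ranked ids are collected. The hypothetical `QueryWindow` helper below sketches this arithmetic as the comment describes it. It is not quoted from `SolrIndexSearcher`.

```java
/** Sketch of the queryResultWindowSize rounding described in solrconfig.xml. */
final class QueryWindow {
    private QueryWindow() {}

    /**
     * Number of top documents to collect (positions 0 .. result-1) so that the
     * requested slice [start, start + rows) is covered by whole windows.
     */
    static int docsToCollect(int start, int rows, int windowSize) {
        int needed = start + rows;              // exclusive end of the requested slice
        if (windowSize <= 0 || needed <= 0) {
            return Math.max(needed, 0);         // no window: collect only what was asked for
        }
        return ((needed - 1) / windowSize + 1) * windowSize; // round up to a window multiple
    }

    public static void main(String[] args) {
        // Documents 10 through 19 with a window of 50: ids 0..49 are collected.
        System.out.println(docsToCollect(10, 10, 50)); // 50
        // Documents 60 through 69 reach into the second window: ids 0..99.
        System.out.println(docsToCollect(60, 10, 50)); // 100
    }
}
```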
- 
  === "Query" Related Event Listeners ===
- 
- Withing the Query section, you can define listeners for particular "query" related events
&#8212; listeners can be used to fire-off special code &#8212; such as invoking some
common queries to warm-up caches.
+ Within the Query section, you can define listeners for particular "query"-related events.
Listeners can be used to fire off special code, such as invoking some common queries
to warm up caches.
  
  ==== newSearcher ====
- 
- A New Searcher is opened when a (current) Searcher already exists. In the example below,
the listener is of the class, !QuerySenderListener, which takes lists of queries and sends
them to the new searcher being opened, thereby warming it.  
+ A New Searcher is opened when a (current) Searcher already exists. In the example below,
the listener is of class !QuerySenderListener, which takes lists of queries and sends
them to the new searcher being opened, thereby warming it.
  
  {{{
      <!-- a newSearcher event is fired whenever a new searcher is being
-          prepared and there is a current searcher handling requests 
+          prepared and there is a current searcher handling requests
-          (aka registered). 
+          (aka registered).
       -->
-     <!-- QuerySenderListener takes an array of NamedList and 
+     <!-- QuerySenderListener takes an array of NamedList and
           executes a local query request for each NamedList in sequence.
       -->
      <!--
      <listener event="newSearcher" class="solr.QuerySenderListener">
        <arr name="queries">
-         <lst> <str name="q">solr</str> 
+         <lst> <str name="q">solr</str>
                <str name="start">0</str>
-               <str name="rows">10</str> 
+               <str name="rows">10</str>
          </lst>
-         <lst> <str name="q">rocks</str> 
+         <lst> <str name="q">rocks</str>
                <str name="start">0</str>
-               <str name="rows">10</str> 
+               <str name="rows">10</str>
          </lst>
        </arr>
      -->
  }}}
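In outline, `QuerySenderListener` replays each `<lst>` of the `queries` array, in order, as a local request against the searcher being opened. The sketch below models that loop. The `Searcher` interface is hypothetical, and plain `Map`s stand in for Solr's `NamedList`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a QuerySenderListener-style warming loop. */
final class WarmingListener {
    /** Hypothetical stand-in for the new searcher: runs one query given its parameters. */
    interface Searcher {
        void search(Map<String, String> params);
    }

    private final List<Map<String, String>> queries = new ArrayList<>();

    /** Adds one warming query, e.g. q=solr, start=0, rows=10. */
    WarmingListener add(String q, int start, int rows) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);
        params.put("start", Integer.toString(start));
        params.put("rows", Integer.toString(rows));
        queries.add(params);
        return this;
    }

    /** Sends every configured query, in order, to the searcher being warmed. */
    int warm(Searcher newSearcher) {
        for (Map<String, String> params : queries) {
            newSearcher.search(params);
        }
        return queries.size();
    }
}
```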
- 
  ==== firstSearcher ====
- 
  A First Searcher is opened when there is ''no'' existing (current) Searcher. In the example
below, the listener is of class !QuerySenderListener, which takes lists of queries and
sends them to the new searcher being opened, thereby warming it. (If there is no Searcher,
you cannot use auto-warming because auto-warming requires an existing Searcher.)
  
  {{{
      <!-- a firstSearcher event is fired whenever a new searcher is being
-          prepared but there is no current registered searcher to handle 
+          prepared but there is no current registered searcher to handle
           requests or to gain prewarming data from.
       -->
      <!--
      <listener event="firstSearcher" class="solr.QuerySenderListener">
        <arr name="queries">
-         <lst> <str name="q">fast_warm</str> 
+         <lst> <str name="q">fast_warm</str>
                <str name="start">0</str>
                <str name="rows">10</str>
          </lst>
@@ -252, +232 @@

      -->
    </query>
  }}}
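Which warming path applies depends only on whether a registered searcher already exists. The hypothetical `SearcherEvents` helper below encodes that decision as a list of step labels. The labels are descriptive text, not Solr identifiers. The model also assumes that autowarming runs before the `newSearcher` listeners.

```java
import java.util.List;

/** Hypothetical model of which warming steps apply when a searcher is opened. */
final class SearcherEvents {
    private SearcherEvents() {}

    static List<String> warmingSteps(boolean hasRegisteredSearcher) {
        if (hasRegisteredSearcher) {
            // Old caches exist, so they can seed the new ones (ordering assumed).
            return List.of("autowarm caches from current searcher", "fire newSearcher listeners");
        }
        // Nothing to autowarm from: only firstSearcher listeners can warm the searcher.
        return List.of("fire firstSearcher listeners");
    }
}
```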
- 
  == Request Dispatcher ==
- 
  The Request Dispatcher section is the only section of solrconfig.xml used directly
by Solr's HTTP RequestDispatcher to configure how it should deal with various aspects of HTTP
requests, including whether it should handle "/select" URLs (for Solr 1.1 compatibility), HTTP
request parsing, remote streaming support, maximum multipart file upload size, etc.
+ 
  {{{
-   <!-- 
+   <!--
      Let the dispatch filter handler /select?qt=XXX
      handleSelect=true will use consistent error handling for /select and /update
      handleSelect=false will use solr1.1 style error formatting
@@ -267, +246 @@

      <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="2048" />
  
    ...
+ }}}
- }}}     
- 
  <<Anchor(HTTPCaching)>>
+ 
  === HTTP Caching ===
- 
  <!> [[Solr1.3]]
  
  Within the main Request Dispatcher section, the HTTP Caching subsection contains configuration
options for controlling how the Solr Request Dispatcher responds to HTTP requests that include
cache validation headers, and what kinds of responses Solr will generate.  More information
can be found in [[SolrAndHTTPCaches]].
@@ -279, +257 @@

  {{{
     ...
      <!-- Set HTTP caching related parameters (for proxy caches and clients).
-           
+ 
           To get the behaviour of Solr 1.2 (ie: no caching related headers)
           use the never304="true" option and do not specify a value for
           <cacheControl>
@@ -293, +271 @@

              You can change it to lastModFrom="dirLastMod" if you want the
              value to exactly correspond to when the physical index was last
              modified.
-                
+ 
              etagSeed="..." is an option you can change to force the ETag
              header (and validation against If-None-Match requests) to be
              different even if the index has not changed (ie: when making
@@ -305, +283 @@

         <!-- If you include a <cacheControl> directive, it will be used to
              generate a Cache-Control header, as well as an Expires header
              if the value contains "max-age="
-                
+ 
              By default, no Cache-Control header is generated.
  
              You can use the <cacheControl> option even if you have set
@@ -315, +293 @@

      </httpCaching>
    </requestDispatcher>
  }}}
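The validation described in these comments follows ordinary HTTP/1.1 conditional-request rules. The hypothetical `ConditionalGet` sketch below decides whether a `304 Not Modified` response is appropriate:

 * With `never304` set, it always answers in full.
 * Otherwise, `If-None-Match` is compared with the current ETag and takes precedence.
 * Failing that, `If-Modified-Since` is compared with the Last-Modified time at one-second resolution.

The sketch ignores weak ETags and is not Solr's own code.

```java
import java.util.Arrays;

/** Hypothetical sketch of choosing between "304 Not Modified" and a full response. */
final class ConditionalGet {
    private ConditionalGet() {}

    /**
     * @param never304          mirrors never304="true": validators are ignored
     * @param etag              current ETag, e.g. "\"abc123\""
     * @param ifNoneMatch       If-None-Match request header, or null if absent
     * @param lastModifiedMs    Last-Modified time in epoch milliseconds
     * @param ifModifiedSinceMs If-Modified-Since in epoch milliseconds, or -1 if absent
     */
    static boolean notModified(boolean never304, String etag, String ifNoneMatch,
                               long lastModifiedMs, long ifModifiedSinceMs) {
        if (never304) {
            return false;
        }
        if (ifNoneMatch != null) {
            // If-None-Match wins: "*" or any listed tag equal to ours means "not modified".
            return Arrays.stream(ifNoneMatch.split(","))
                    .map(String::trim)
                    .anyMatch(tag -> tag.equals("*") || tag.equals(etag));
        }
        if (ifModifiedSinceMs >= 0) {
            // HTTP dates have one-second resolution, so compare whole seconds.
            return lastModifiedMs / 1000 <= ifModifiedSinceMs / 1000;
        }
        return false;
    }
}
```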
- 
- The value for ''max-age'' should be set based on how often your index changes and how long
your application can live with an outdated cached response.  To force a shared (or browser)
cache to recheck that the cached response is still valid you can add the parameter ''must-revalidate''
to the Cache-Control header. According to the W3C specification ''max-age'' should not be
higher than 31536000 (1 year). 
+ The value for ''max-age'' should be set based on how often your index changes and how long
your application can live with an outdated cached response.  To force a shared (or browser)
cache to recheck that the cached response is still valid you can add the parameter ''must-revalidate''
to the Cache-Control header. According to the W3C specification ''max-age'' should not be
higher than 31536000 (1 year).
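As a worked example, the hypothetical `CacheHeaders` helper below builds two header values for a given `max-age`:

 * a `Cache-Control` value, optionally with `must-revalidate`;
 * the corresponding `Expires` date.

It caps the age at 31536000 seconds and uses only `java.time`. In Solr itself, the headers come from the `<cacheControl>` text you configure.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical helper for Cache-Control / Expires values derived from max-age. */
final class CacheHeaders {
    static final long ONE_YEAR_SECONDS = 31_536_000L;
    // HTTP-date format, e.g. "Fri, 06 Nov 2009 22:50:49 GMT".
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);

    private CacheHeaders() {}

    private static long clamp(long maxAgeSeconds) {
        return Math.min(Math.max(maxAgeSeconds, 0), ONE_YEAR_SECONDS);
    }

    /** e.g. cacheControl(30, true) gives "max-age=30, must-revalidate". */
    static String cacheControl(long maxAgeSeconds, boolean mustRevalidate) {
        return "max-age=" + clamp(maxAgeSeconds) + (mustRevalidate ? ", must-revalidate" : "");
    }

    /** Expires = response time + max-age, rendered in GMT. */
    static String expires(Instant now, long maxAgeSeconds) {
        return HTTP_DATE.format(now.plusSeconds(clamp(maxAgeSeconds)).atZone(ZoneOffset.UTC));
    }
}
```

For a response generated at 22:50:19 GMT on 6 Nov 2009 with `max-age=30`, `expires` yields `Fri, 06 Nov 2009 22:50:49 GMT`.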
  
  == Request Handler Plug-in Section ==
- 
  /!\ :TODO: /!\ Add details on supplying init parameters.
  
  This is where multiple request handlers can be registered.
  
  {{{
- <!-- requestHandler plugins... incoming queries will be dispatched to 
+ <!-- requestHandler plugins... incoming queries will be dispatched to
-      the correct handler based on the qt (query type) param matching the 
+      the correct handler based on the qt (query type) param matching the
-      name of registered handlers. The "standard" request handler is the 
+      name of registered handlers. The "standard" request handler is the
       default and will be used if qt is not specified in the request.
    -->
    <requestHandler name="standard" class="solr.StandardRequestHandler" />
-   <requestHandler name="custom" class="your.package.CustomRequestHandler" />  
+   <requestHandler name="custom" class="your.package.CustomRequestHandler" />
  }}}
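The `qt` lookup described in the comment amounts to a name-to-handler map with a default. The hypothetical `HandlerRegistry` below sketches it, representing handlers by their class names only. Rejecting an unknown `qt` with an exception is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry mirroring <requestHandler name="..." class="..."/> entries. */
final class HandlerRegistry {
    private static final String DEFAULT_HANDLER = "standard";
    private final Map<String, String> handlers = new HashMap<>();

    HandlerRegistry register(String name, String className) {
        handlers.put(name, className);
        return this;
    }

    /** Resolves the qt request parameter; a missing or empty qt selects "standard". */
    String resolve(String qt) {
        String name = (qt == null || qt.isEmpty()) ? DEFAULT_HANDLER : qt;
        String handler = handlers.get(name);
        if (handler == null) {
            throw new IllegalArgumentException("unknown handler: " + name);
        }
        return handler;
    }
}
```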
-  
  == The Highlighter plugin configuration section ==
- 
  You can configure custom fragmenters and formatters for use in highlighting.  Each may be
configured with its own set of highlighting parameter defaults.  See also HighlightingParameters.
  
  {{{
@@ -354, +328 @@

        <!-- slightly smaller fragsizes work better because of slop -->
        <int name="hl.fragsize">70</int>
        <!-- allow 50% slop on fragment sizes -->
-       <float name="hl.regex.slop">0.5</float> 
+       <float name="hl.regex.slop">0.5</float>
        <!-- a basic sentence pattern -->
        <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
      </lst>
     </fragmenter>
-    
+ 
     <!-- Configure the standard formatter -->
     <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
      <lst name="defaults">
@@ -369, +343 @@

     </formatter>
    </highlighting>
  }}}
- 
  == The Admin/GUI Section ==
- 
  This section handles the administration web page.
  
- Defines “Gettable” files &#151; allows the defined files to be accessed through
the web interface. Also specifies the default search to be filed in on the admin form.
+ Defines “Gettable” files, which allows the defined files to be accessed through the web
interface. Also specifies the default search to be filled in on the admin form.
  
  The `<pingQuery>` defines what the "ping" query should be for monitoring the health
of the Solr server.  The URL of the "ping" query is '''/admin/ping'''.  It can be used by
a load balancer in front of a set of Solr servers to check response time of all the Solr servers
in order to do response time based load balancing.
  
  Including the optional `<healthcheck type="file">` will add an '''ENABLE'''/'''DISABLE'''
link on the administration web page that can be used to create/remove a file at the path
defined by the value of `<healthcheck>`.  A load balancer in front of a set of Solr
servers can then query a URL mapped to this file to determine whether it should keep, add, or remove
a server from rotation.
+ 
  {{{
  <admin>
      <defaultQuery>solr</defaultQuery>
      <gettableFiles>
           solrconfig.xml
           schema.xml
-     </gettableFiles> 
+     </gettableFiles>
      <pingQuery>q=solr&amp;version=2.0&amp;start=0&amp;rows=0</pingQuery>
  
      <!-- configure a healthcheck file for servers behind a loadbalancer
@@ -393, +366 @@

      <healthcheck type="file">server-enabled</healthcheck>
    </admin>
  }}}
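The `<healthcheck type="file">` mechanism depends only on whether a file exists. The ENABLE/DISABLE links and the load balancer's view can therefore be modelled with `java.nio.file`. In the hypothetical `HealthcheckFile` sketch below:

 * enabling creates the file;
 * disabling deletes it;
 * a server is in rotation only while the file exists.

How the file is exposed over HTTP is outside the sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical model of the healthcheck file toggled by ENABLE/DISABLE. */
final class HealthcheckFile {
    private final Path file;

    HealthcheckFile(Path file) {
        this.file = file;
    }

    /** ENABLE: create the file so a load balancer keeps or adds this server. */
    void enable() {
        try {
            if (!Files.exists(file)) {
                Files.createFile(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** DISABLE: remove the file so a load balancer takes this server out of rotation. */
    void disable() {
        try {
            Files.deleteIfExists(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** The load balancer's view: the server is in rotation while the file exists. */
    boolean inRotation() {
        return Files.exists(file);
    }
}
```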
- 
  == System property substitution ==
  Solr supports system property substitution, allowing the launching JVM to specify string
substitutions within either of Solr's configuration files.  The syntax is {{{${property[:default value]}}}}.  Substitutions are valid in any element or attribute text.  Here's an example
of letting the runtime dictate the data directory:
  
  {{{
     <dataDir>${solr.data.dir:./solr/data}</dataDir>
  }}}
- 
  And using the example application, Solr could be launched in this manner:
  
  {{{
  java -Dsolr.data.dir=/data/dir -jar start.jar
  }}}
- 
  If no default value is provided, the system property MUST be specified; otherwise Solr
fails at startup with an error indicating which property has no value.
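The substitution rules fit in a few lines of Java. The hypothetical `PropertySubstitutor` below is not Solr's implementation. It replaces each `${name}` or `${name:default}` with the property value, and uses the default when the property is unset. When there is neither, it throws, which corresponds to the startup failure described above. Passing `System.getProperties()` gives the `-D` behaviour.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of ${property[:default value]} substitution. */
final class PropertySubstitutor {
    // ${name} or ${name:default}; the name may not contain '}' or ':'.
    private static final Pattern REF = Pattern.compile("\\$\\{([^}:]+)(?::([^}]*))?\\}");

    private PropertySubstitutor() {}

    static String substitute(String text, Properties props) {
        Matcher m = REF.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String value = props.getProperty(name, m.group(2)); // group(2) is null without a default
            if (value == null) {
                throw new IllegalStateException("No property or default value for " + name);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Suppose the JVM is started with `-Dsolr.data.dir=/data/dir`. Then `substitute("<dataDir>${solr.data.dir:./solr/data}</dataDir>", System.getProperties())` returns `<dataDir>/data/dir</dataDir>`. Without that property, it returns `<dataDir>./solr/data</dataDir>`.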
  
- <!> [[Solr1.4]]
- All the properties which need to be substituted can be put into a properties file and can
be put into the <solr.home>/conf/solrcore.properties. 
+ <!> [[Solr1.4]] All the properties that need to be substituted can be placed in a
properties file at <solr.home>/conf/solrcore.properties.  Example:
- example :
+ 
  {{{
  #solrcore.properties
  data.dir=/data/solrindex
  }}}
- 
  and in solrconfig.xml it can be used as follows:
+ 
  {{{
     <dataDir>${data.dir}</dataDir>
  }}}
- 
  == Enable/disable components ==
  <!> [[Solr1.4]]
  
- Every component can have an extra attribute enable which can be set as true/false. 
+ Every component can have an extra attribute `enable`, which can be set to true or false.
  
  example:
+ 
  {{{
  <requestHandler name="/replication" class="solr.ReplicationHandler" enable="${enable.replication:true}">
  
@@ -435, +404 @@

  }}}
  Here the value of 'enable.replication' can be provided from outside (for example, as a system property), so the handler can be enabled or disabled
when Solr is launched, without editing solrconfig.xml.
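When the two features are combined, whether a component is loaded depends on its `enable` attribute after substitution. The hypothetical `ComponentToggle` sketch below evaluates an attribute such as `${enable.replication:true}`. It makes two assumptions about Solr's behaviour, neither of them documented:

 * A missing attribute means the component is enabled.
 * The value is parsed with `Boolean.parseBoolean`, so anything other than `true` (ignoring case) means disabled.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical evaluation of enable="${name:default}" on a component. */
final class ComponentToggle {
    private static final Pattern REF = Pattern.compile("\\$\\{([^}:]+)(?::([^}]*))?\\}");

    private ComponentToggle() {}

    static boolean isEnabled(String enableAttr, Properties props) {
        if (enableAttr == null) {
            return true; // assumption: no enable attribute means the component is loaded
        }
        Matcher m = REF.matcher(enableAttr.trim());
        // A whole-value ${...} reference is looked up; a literal value is used as-is.
        String value = m.matches() ? props.getProperty(m.group(1), m.group(2)) : enableAttr;
        if (value == null) {
            throw new IllegalStateException("No property or default value for " + m.group(1));
        }
        return Boolean.parseBoolean(value.trim()); // assumption: same parsing as Solr
    }
}
```

Under these assumptions, launching with `java -Denable.replication=false -jar start.jar` would leave the replication handler unloaded.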
  
+ == XInclude ==
+ <!> [[Solr1.4]]
+ 
+ Portions of the solrconfig.xml file can be externalized and included using a standard XML
feature called XInclude.  The easy-to-read specification for XInclude can be found at http://www.w3.org/TR/xinclude/

+ 
+ This might be useful when a number of settings differ only depending on whether the
server is a master or a slave.  These settings could be factored out into two separate files
and then included as needed, avoiding duplication of the common portions of solrconfig.xml.
+ 
+ The following sample attempts to include solrconfig_master.xml.  If it doesn't exist or
can't be loaded, then solrconfig_slave.xml will be used instead.  If solrconfig_slave.xml
can't be loaded an XML parsing exception will result.
+ {{{
+     <xi:include href="solr/conf/solrconfig_master.xml" xmlns:xi="http://www.w3.org/2001/XInclude">
+       <xi:fallback>
+         <xi:include href="solr/conf/solrconfig_slave.xml"/>
+       </xi:fallback>
+     </xi:include>
+ }}}
+ 
+ solrconfig_master.xml might contain:
+ {{{
+ <requestHandler name="/replication" class="solr.ReplicationHandler" >
+     <lst name="master">
+       <str name="replicateAfter">commit</str>
+       <str name="replicateAfter">startup</str>
+       <str name="confFiles">schema.xml,stopwords.txt</str>
+     </lst>
+ </requestHandler>
+ }}}
+ 
+ 
+ Notes:
+  * Since the xinclude elements are handled by the XML parser and not by Solr, property
substitution is not available in them.
+  * File paths in href attributes can be absolute or relative to the current working directory of the server (Jetty, Tomcat,
etc.).  They can also be HTTP URLs to fetch resources from remote
servers if desired.
+  * The Xerces parser, used by default in Solr, doesn't support the xpointer="xpointer()"
scheme; see http://xerces.apache.org/xerces2-j/faq-xinclude.html
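The include-with-fallback behaviour can be checked outside Solr with the JDK's XInclude-aware parser, `DocumentBuilderFactory.setXIncludeAware(true)`. This is handy for testing a master/slave split before deploying it. In the hypothetical `XIncludeCheck` sketch below, the file being parsed is opened from disk. Relative `href` values therefore resolve against that file's own location, which can differ from the working-directory behaviour described in the notes above.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/** Hypothetical checker: parses a config file with XInclude (and xi:fallback) processing on. */
final class XIncludeCheck {
    private XIncludeCheck() {}

    static Document parse(File config)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // xi:include is recognised by its namespace URI
        dbf.setXIncludeAware(true);  // replace xi:include elements with the included content
        return dbf.newDocumentBuilder().parse(config);
    }
}
```

For example, `XIncludeCheck.parse(new File("solrconfig.xml")).getElementsByTagName("lst")` lists the `<lst>` elements that actually ended up in the parsed configuration. This shows whether the master or the slave fragment was used.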
+ 
