lucene-commits mailing list archives

From ctarg...@apache.org
Subject [12/37] lucene-solr:branch_6x: squash merge jira/solr-10290 into master
Date Fri, 12 May 2017 14:05:20 GMT
http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/ccbc93b8/solr/solr-ref-guide/src/response-writers.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/response-writers.adoc b/solr/solr-ref-guide/src/response-writers.adoc
new file mode 100644
index 0000000..3fe3bf4
--- /dev/null
+++ b/solr/solr-ref-guide/src/response-writers.adoc
@@ -0,0 +1,324 @@
+= Response Writers
+:page-shortname: response-writers
+:page-permalink: response-writers.html
+:page-children: velocity-response-writer
+
+A Response Writer generates the formatted response of a search. Solr supports a variety of Response Writers to ensure that query responses can be parsed by the appropriate language or application.
+
+The `wt` parameter selects the Response Writer to be used. The table below lists the most common settings for the `wt` parameter.
+
+[width="100%",options="header",]
+|===
+|`wt` Parameter Setting |Response Writer Selected
+|csv |<<ResponseWriters-CSVResponseWriter,CSVResponseWriter>>
+|geojson |<<ResponseWriters-GeoJSONResponseWriter,GeoJSONResponseWriter>>
+|javabin |<<ResponseWriters-BinaryResponseWriter,BinaryResponseWriter>>
+|json |<<ResponseWriters-JSONResponseWriter,JSONResponseWriter>>
+|php |<<ResponseWriters-PHPResponseWriterandPHPSerializedResponseWriter,PHPResponseWriter>>
+|phps |<<ResponseWriters-PHPResponseWriterandPHPSerializedResponseWriter,PHPSerializedResponseWriter>>
+|python |<<ResponseWriters-PythonResponseWriter,PythonResponseWriter>>
+|ruby |<<ResponseWriters-RubyResponseWriter,RubyResponseWriter>>
+|smile |<<ResponseWriters-SmileResponseWriter,SmileResponseWriter>>
+|velocity |<<ResponseWriters-VelocityResponseWriter,VelocityResponseWriter>>
+|xlsx |<<ResponseWriters-XLSXResponseWriter,XLSXResponseWriter>>
+|xml |<<ResponseWriters-TheStandardXMLResponseWriter,XMLResponseWriter>>
+|xslt |<<ResponseWriters-TheXSLTResponseWriter,XSLTResponseWriter>>
+|===
+
+[[ResponseWriters-TheStandardXMLResponseWriter]]
+== The Standard XML Response Writer
+
+The XML Response Writer is the most general purpose and reusable Response Writer currently included with Solr. It is the format used in most discussions and documentation about the response of Solr queries.
+
+Note that the XSLT Response Writer can be used to convert the XML produced by this writer to other vocabularies or text-based formats.
+
+The behavior of the XML Response Writer can be driven by the following query parameters.
+
+[[ResponseWriters-TheversionParameter]]
+=== The `version` Parameter
+
+The `version` parameter determines the XML protocol used in the response. Clients are strongly encouraged to _always_ specify the protocol version, so as to ensure that the format of the response they receive does not change unexpectedly if the Solr server is upgraded and a new default format is introduced.
+
+Currently supported version values are:
+
+[width="100%",options="header",]
+|===
+|XML Version |Notes
+|2.2 |The format of the responseHeader changed to use the same `<lst>` structure as the rest of the response.
+|===
+
+The default value is the latest supported.
+
+[[ResponseWriters-ThestylesheetParameter]]
+=== The `stylesheet` Parameter
+
+The `stylesheet` parameter can be used to direct Solr to include a `<?xml-stylesheet type="text/xsl" href="..."?>` declaration in the XML response it returns.
+
+The default behavior is not to return any stylesheet declaration at all.
+
+[IMPORTANT]
+====
+Use of the `stylesheet` parameter is discouraged, as there is currently no way to specify external stylesheets, and no stylesheets are provided in the Solr distributions. This is a legacy parameter, which may be developed further in a future release.
+====
+
+[[ResponseWriters-TheindentParameter]]
+=== The `indent` Parameter
+
+If the `indent` parameter is used, and has a non-blank value, then Solr will attempt to indent its XML response to make it more readable.
+
+The default behavior is not to indent.
+
+[[ResponseWriters-TheXSLTResponseWriter]]
+== The XSLT Response Writer
+
+The XSLT Response Writer applies an XML stylesheet to output. It can be used for tasks such as formatting results for an RSS feed.
+
+[[ResponseWriters-trParameter]]
+=== `tr` Parameter
+
+The XSLT Response Writer accepts one parameter: the `tr` parameter, which identifies the XML transformation to use. The transformation must be found in the Solr `conf/xslt` directory.
+
+The Content-Type of the response is set according to the `<xsl:output>` statement in the XSLT transform, for example: `<xsl:output media-type="text/html"/>`
+
+[[ResponseWriters-Configuration]]
+=== Configuration
+
+The example below, from the `sample_techproducts_configs` <<response-writers.adoc#response-writers,config set>> in the Solr distribution, shows how the XSLT Response Writer is configured.
+
+[source,xml]
+----
+<!--
+  Changes to XSLT transforms are taken into account
+  every xsltCacheLifetimeSeconds at most.
+-->
+<queryResponseWriter name="xslt"
+                     class="org.apache.solr.request.XSLTResponseWriter">
+  <int name="xsltCacheLifetimeSeconds">5</int>
+</queryResponseWriter>
+----
+
+A value of 5 for `xsltCacheLifetimeSeconds` is good for development, to see XSLT changes quickly. For production you probably want a much higher value.
+
+[[ResponseWriters-JSONResponseWriter]]
+== JSON Response Writer
+
+A very commonly used Response Writer is the `JsonResponseWriter`, which formats output in JavaScript Object Notation (JSON), a lightweight data interchange format specified in RFC 4627. Setting the `wt` parameter to `json` invokes this Response Writer.
+
+Here is a sample response for a simple query like `q=id:VS1GB400C3&wt=json`:
+
+[source,json]
+----
+{
+  "responseHeader":{
+    "zkConnected":true,
+    "status":0,
+    "QTime":7,
+    "params":{
+      "q":"id:VS1GB400C3",
+      "indent":"on",
+      "wt":"json"}},
+  "response":{"numFound":1,"start":0,"maxScore":2.3025851,"docs":[
+      {
+        "id":"VS1GB400C3",
+        "name":["CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail"],
+        "manu":["Corsair Microsystems Inc."],
+        "manu_id_s":"corsair",
+        "cat":["electronics",
+          "memory"],
+        "price":[74.99],
+        "popularity":[7],
+        "inStock":[true],
+        "store":["37.7752,-100.0232"],
+        "manufacturedate_dt":"2006-02-13T15:26:37Z",
+        "payloads":["electronics|4.0 memory|2.0"],
+        "_version_":1549728120626479104}]
+  }}
+----
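
Clients in any language can consume this structure with a standard JSON parser. As a minimal illustration, here is a Python sketch that parses a trimmed, hard-coded copy of the response above (no live Solr call is made):

```python
import json

# Trimmed stand-in for a wt=json response body.
raw = '''
{"responseHeader": {"status": 0, "QTime": 7},
 "response": {"numFound": 1, "start": 0,
   "docs": [{"id": "VS1GB400C3", "price": [74.99], "inStock": [true]}]}}
'''

rsp = json.loads(raw)
print(rsp["response"]["numFound"])       # 1
print(rsp["response"]["docs"][0]["id"])  # VS1GB400C3
```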
+
+The default MIME type for the JSON writer is `application/json`; however, this can be overridden in `solrconfig.xml`, as in this example from the "```techproducts```" configuration:
+
+[source,xml]
+----
+<queryResponseWriter name="json" class="solr.JSONResponseWriter">
+  <!-- For the purposes of the tutorial, JSON response are written as
+       plain text so that it's easy to read in *any* browser.
+       If you are building applications that consume JSON, just remove
+       this override to get the default "application/json" mime type.
+    -->
+  <str name="content-type">text/plain</str>
+</queryResponseWriter>
+----
+
+[[ResponseWriters-JSON-SpecificParameters]]
+=== JSON-Specific Parameters
+
+[[ResponseWriters-json.nl]]
+==== json.nl
+
+This parameter controls the output format of NamedLists, where order is more important than access by name. NamedList is currently used for field faceting data.
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="20,40,40",options="header"]
+|===
+|json.nl Parameter setting |Example output for `NamedList("a"=1, "bar"="foo", null=3, null=null)` |Description
+|flat _(the default)_ |`["a",1, "bar","foo", null,3, null,null]` |NamedList is represented as a flat array, alternating names and values.
+|map |`{"a":1, "bar":"foo", "":3, "":null}` |NamedList is represented as a JSON object. Although this is the simplest mapping, it is lossy: a NamedList can have optional keys, repeated keys, and a preserved order, none of which a JSON object (essentially a map or hash) can fully represent.
+|arrarr |`[["a",1], ["bar","foo"], [null,3], [null,null]]` |NamedList is represented as an array of two element arrays.
+|arrmap |`[{"a":1}, {"bar":"foo"}, 3, null]` |NamedList is represented as an array of JSON objects; entries with null names are written as bare values.
+|arrntv |`[{"name":"a","type":"int","value":1}, {"name":"bar","type":"str","value":"foo"}, {"name":null,"type":"int","value":3}, {"name":null,"type":"null","value":null}]` |NamedList is represented as an array of Name Type Value JSON objects.
+|===
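
For the default `flat` format, a client can rebuild the name/value pairs itself. A minimal Python sketch (the NamedList here is hard-coded from the example column above; `flat_namedlist_to_pairs` is an illustrative helper, not a Solr API):

```python
def flat_namedlist_to_pairs(flat):
    # json.nl=flat alternates names and values: pair them back up.
    return list(zip(flat[0::2], flat[1::2]))

# Example output for NamedList("a"=1, "bar"="foo", null=3, null=null)
flat = ["a", 1, "bar", "foo", None, 3, None, None]
pairs = flat_namedlist_to_pairs(flat)
print(pairs)  # [('a', 1), ('bar', 'foo'), (None, 3), (None, None)]
```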
+
+[[ResponseWriters-json.wrf]]
+==== json.wrf
+
+`json.wrf=function` adds a wrapper-function around the JSON response, useful in AJAX with dynamic script tags for specifying a JavaScript callback function.
+
+* http://www.xml.com/pub/a/2005/12/21/json-dynamic-script-tag.html
+* http://www.theurer.cc/blog/2005/12/15/web-services-json-dump-your-proxy/
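
A client that receives the wrapped response must strip the callback before parsing. A minimal Python sketch, assuming the callback name `handleSolr` was passed as `json.wrf` (the response text is a hard-coded stand-in, and `unwrap_jsonp` is an illustrative helper):

```python
import json

def unwrap_jsonp(text, callback):
    # json.wrf wraps the JSON body as callback(...): strip that wrapper.
    prefix = callback + "("
    if not (text.startswith(prefix) and text.endswith(")")):
        raise ValueError("response is not wrapped by " + callback)
    return json.loads(text[len(prefix):-1])

raw = 'handleSolr({"responseHeader":{"status":0}})'
data = unwrap_jsonp(raw, "handleSolr")
print(data["responseHeader"]["status"])  # 0
```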
+
+[[ResponseWriters-BinaryResponseWriter]]
+== Binary Response Writer
+
+This is a custom binary format used by Solr for inter-node communication as well as client-server communication. SolrJ uses this as the default for indexing as well as querying. See <<client-apis.adoc#client-apis,Client APIs>> for more details.
+
+[[ResponseWriters-GeoJSONResponseWriter]]
+== GeoJSON Response Writer
+
+Returns Solr results in http://geojson.org[GeoJSON] augmented with Solr-specific JSON. To use this, set `wt=geojson` and `geojson.field` to the name of a spatial Solr field. Not all spatial field types are supported, and you'll get an error if you use an unsupported one.
+
+[[ResponseWriters-PythonResponseWriter]]
+== Python Response Writer
+
+Solr has an optional Python response format that extends its JSON output in the following ways to allow the response to be safely evaluated by the Python interpreter:
+
+* true and false changed to True and False
+* Python unicode strings are used where needed
+* ASCII output (with unicode escapes) is used for less error-prone interoperability
+* newlines are escaped
+* null changed to None
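
Because the result is a valid Python literal, it can be parsed with the standard library's `ast.literal_eval`, which is safer than `eval`. A minimal sketch over a hard-coded stand-in for a `wt=python` response:

```python
import ast

# Stand-in for a wt=python response body: note True and None
# in place of JSON's true and null.
raw = ("{'responseHeader':{'status':0},"
       "'response':{'numFound':1,'start':0,"
       "'docs':[{'id':'VS1GB400C3','inStock':[True],'manu_id_s':None}]}}")

rsp = ast.literal_eval(raw)
print(rsp['response']['numFound'])       # 1
print(rsp['response']['docs'][0]['id'])  # VS1GB400C3
```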
+
+[[ResponseWriters-PHPResponseWriterandPHPSerializedResponseWriter]]
+== PHP Response Writer and PHP Serialized Response Writer
+
+Solr has a PHP response format that outputs an array (as PHP code) which can be evaluated. Setting the `wt` parameter to `php` invokes the PHP Response Writer.
+
+Example usage:
+
+[source,php]
+----
+$code = file_get_contents('http://localhost:8983/solr/techproducts/select?q=iPod&wt=php');
+eval("$result = " . $code . ";");
+print_r($result);
+----
+
+Solr also includes a PHP Serialized Response Writer that formats output in a serialized array. Setting the `wt` parameter to `phps` invokes the PHP Serialized Response Writer.
+
+Example usage:
+
+[source,php]
+----
+$serializedResult = file_get_contents('http://localhost:8983/solr/techproducts/select?q=iPod&wt=phps');
+$result = unserialize($serializedResult);
+print_r($result);
+----
+
+[[ResponseWriters-RubyResponseWriter]]
+== Ruby Response Writer
+
+Solr has an optional Ruby response format that extends its JSON output in the following ways to allow the response to be safely evaluated by Ruby's interpreter:
+
+* Ruby's single quoted strings are used to prevent possible string exploits.
+* \ and ' are the only two characters escaped.
+* Unicode escapes are not used. Data is written as raw UTF-8.
+* nil is used for null.
+* => is used as the key/value separator in maps.
+
+Here is a simple example of how one may query Solr using the Ruby response format:
+
+[source,ruby]
+----
+require 'net/http'
+h = Net::HTTP.new('localhost', 8983)
+hresp, data = h.get('/solr/techproducts/select?q=iPod&wt=ruby', nil)
+rsp = eval(data)
+puts 'number of matches = ' + rsp['response']['numFound'].to_s
+#print out the name field for each returned document
+rsp['response']['docs'].each { |doc| puts 'name field = ' + doc['name'].to_s }
+----
+
+[[ResponseWriters-CSVResponseWriter]]
+== CSV Response Writer
+
+The CSV response writer returns a list of documents in comma-separated values (CSV) format. Other information that would normally be included in a response, such as facet information, is excluded.
+
+The CSV response writer supports multi-valued fields, as well as <<transforming-result-documents.adoc#transforming-result-documents,pseudo-fields>>, and the output of this CSV format is compatible with Solr's https://wiki.apache.org/solr/UpdateCSV[CSV update format].
+
+[[ResponseWriters-CSVParameters]]
+=== CSV Parameters
+
+These parameters specify the CSV format that will be returned. You can accept the default values or specify your own.
+
+[width="50%",options="header",]
+|===
+|Parameter |Default Value
+|csv.encapsulator |`"`
+|csv.escape |None
+|csv.separator |`,`
+|csv.header |`true`. If `false`, Solr does not print the column headers.
+|csv.newline |`\n`
+|csv.null |A zero-length string. Use this parameter when a document has no value for a particular field.
+|===
+
+[[ResponseWriters-Multi-ValuedFieldCSVParameters]]
+=== Multi-Valued Field CSV Parameters
+
+These parameters specify how multi-valued fields are encoded. Per-field overrides for these values can be done using `f.<fieldname>.csv.separator=|`.
+
+[width="50%",options="header",]
+|===
+|Parameter |Default Value
+|csv.mv.encapsulator |None
+|csv.mv.escape |\
+|csv.mv.separator |The `csv.separator` value
+|===
+
+[[ResponseWriters-Example]]
+=== Example
+
+`\http://localhost:8983/solr/techproducts/select?q=ipod&fl=id,cat,name,popularity,price,score&wt=csv` returns:
+
+[source,csv]
+----
+id,cat,name,popularity,price,score
+IW-02,"electronics,connector",iPod & iPod Mini USB 2.0 Cable,1,11.5,0.98867977
+F8V7067-APL-KIT,"electronics,connector",Belkin Mobile Power Cord for iPod w/ Dock,1,19.95,0.6523595
+MA147LL/A,"electronics,music",Apple 60 GB iPod with Video Playback Black,10,399.0,0.2446348
+----
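
Output in this shape can be read back with any CSV library. A minimal Python sketch over a hard-coded copy of the first two lines above; splitting the multi-valued `cat` field on `,` assumes the default `csv.mv.separator` shown earlier:

```python
import csv
import io

raw = (
    "id,cat,name,popularity,price,score\n"
    'IW-02,"electronics,connector",iPod & iPod Mini USB 2.0 Cable,1,11.5,0.98867977\n'
)

row = next(csv.DictReader(io.StringIO(raw)))
print(row["id"])              # IW-02
print(row["cat"].split(","))  # ['electronics', 'connector']
```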
+
+[[ResponseWriters-VelocityResponseWriter]]
+== Velocity Response Writer
+
+The `VelocityResponseWriter` processes the Solr response and request context through Apache Velocity templating.
+
+See the <<velocity-response-writer.adoc#velocity-response-writer,Velocity Response Writer>> section for details.
+
+[[ResponseWriters-SmileResponseWriter]]
+== Smile Response Writer
+
+The Smile format is a JSON-compatible binary format, described in detail here: http://wiki.fasterxml.com/SmileFormat.
+
+[[ResponseWriters-XLSXResponseWriter]]
+== XLSX Response Writer
+
+Use this to get the response as a spreadsheet in the .xlsx (Microsoft Excel) format. It accepts parameters in the form `colwidth.<field-name>` and `colname.<field-name>`, which help you customize the column widths and column names.
+
+This response writer has been added as part of the extraction library, and will only work if the extraction contrib is present in the server classpath. Defining the classpath with the `lib` directive is not sufficient. Instead, you will need to copy the necessary .jars to the Solr webapp's `lib` directory manually. You can run these commands from your `$SOLR_INSTALL` directory:
+
+[source,bash]
+----
+cp contrib/extraction/lib/*.jar server/solr-webapp/webapp/WEB-INF/lib/
+cp dist/solr-cell-6.3.0.jar server/solr-webapp/webapp/WEB-INF/lib/
+----
+
+Once the libraries are in place, you can add `wt=xlsx` to your request, and results will be returned as an XLSX sheet.

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/ccbc93b8/solr/solr-ref-guide/src/result-clustering.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/result-clustering.adoc b/solr/solr-ref-guide/src/result-clustering.adoc
new file mode 100644
index 0000000..0321445
--- /dev/null
+++ b/solr/solr-ref-guide/src/result-clustering.adoc
@@ -0,0 +1,346 @@
+= Result Clustering
+:page-shortname: result-clustering
+:page-permalink: result-clustering.html
+
+The *clustering* (or *cluster analysis*) plugin attempts to automatically discover groups of related search hits (documents) and assign human-readable labels to these groups.
+
+By default in Solr, the clustering algorithm is applied to the search result of each individual query; this is called _on-line_ clustering. While Solr contains an extension for full-index (_off-line_) clustering, this section focuses on on-line clustering only.
+
+Clusters discovered for a given query can be perceived as _dynamic facets_. This is beneficial when regular faceting is difficult (field values are not known in advance) or when the queries are exploratory in nature. Take a look at the http://search.carrot2.org/stable/search?query=solr&results=100&source=web&view=foamtree[Carrot2] project's demo page to see an example of search results clustering in action (the groups in the visualization have been discovered automatically in the search results to the right; no external information is involved).
+
+image::images/result-clustering/carrot2.png[image,width=900]
+
+The query issued to the system was _Solr_. It seems clear that faceting could not yield a similar set of groups, although the goals of both techniques are similar: to let the user explore the set of search results and either rephrase the query or narrow the focus to a subset of the current documents. Clustering is also similar to <<result-grouping.adoc#result-grouping,Result Grouping>> in that it can help to look deeper into search results, beyond the top few hits.
+
+[[ResultClustering-PreliminaryConcepts]]
+== Preliminary Concepts
+
+Each *document* passed to the clustering component is composed of several logical parts:
+
+* a unique identifier,
+* origin URL,
+* the title,
+* the main content,
+* a language code of the title and content.
+
+The identifier part is mandatory; everything else is optional, but at least one of the text fields (title or content) is required to make the clustering process meaningful. It is important to remember that logical document parts must be mapped to a particular schema and its fields. The content (text) for clustering can be sourced from either a stored text field or context-filtered using a highlighter; all of these options are explained below in the <<ResultClustering-Configuration,configuration>> section.
+
+A *clustering algorithm* is the actual logic (implementation) that discovers relationships among the documents in the search result and forms human-readable cluster labels. Depending on the choice of the algorithm the clusters may (and probably will) vary. Solr comes with several algorithms implemented in the open source http://carrot2.org[Carrot2] project; commercial alternatives also exist.
+
+[[ResultClustering-QuickStartExample]]
+== Quick Start Example
+
+The "```techproducts```" example included with Solr is pre-configured with all the necessary components for result clustering -- but they are disabled by default.
+
+To enable the clustering component contrib and a dedicated search handler configured to use it, specify a JVM System Property when running the example:
+
+[source,bash]
+----
+bin/solr start -e techproducts -Dsolr.clustering.enabled=true
+----
+
+You can now try out the clustering handler by opening the following URL in a browser:
+
+`\http://localhost:8983/solr/techproducts/clustering?q=*:*&rows=100`
+
+The output XML should include search hits and an array of automatically discovered clusters at the end, resembling the output shown here:
+
+[source,xml]
+----
+<response>
+  <lst name="responseHeader">
+    <int name="status">0</int>
+    <int name="QTime">299</int>
+  </lst>
+  <result name="response" numFound="32" start="0" maxScore="1.0">
+    <doc>
+      <str name="id">GB18030TEST</str>
+      <str name="name">Test with some GB18030 encoded characters</str>
+      <arr name="features">
+        <str>No accents here</str>
+        <str>这是一个功能</str>
+        <str>This is a feature (translated)</str>
+        <str>这份文件是很有光泽</str>
+        <str>This document is very shiny (translated)</str>
+      </arr>
+      <float name="price">0.0</float>
+      <str name="price_c">0,USD</str>
+      <bool name="inStock">true</bool>
+      <long name="_version_">1448955395025403904</long>
+      <float name="score">1.0</float>
+    </doc>
+
+    <!-- more search hits, omitted -->
+  </result>
+
+  <arr name="clusters">
+    <lst>
+      <arr name="labels">
+        <str>DDR</str>
+      </arr>
+      <double name="score">3.9599865057283354</double>
+      <arr name="docs">
+        <str>TWINX2048-3200PRO</str>
+        <str>VS1GB400C3</str>
+        <str>VDBDB1A16</str>
+      </arr>
+    </lst>
+    <lst>
+      <arr name="labels">
+        <str>iPod</str>
+      </arr>
+      <double name="score">11.959228467119022</double>
+      <arr name="docs">
+        <str>F8V7067-APL-KIT</str>
+        <str>IW-02</str>
+        <str>MA147LL/A</str>
+      </arr>
+    </lst>
+
+    <!-- More clusters here, omitted. -->
+
+    <lst>
+      <arr name="labels">
+        <str>Other Topics</str>
+      </arr>
+      <double name="score">0.0</double>
+      <bool name="other-topics">true</bool>
+      <arr name="docs">
+        <str>adata</str>
+        <str>apple</str>
+        <str>asus</str>
+        <str>ati</str>
+        <!-- other unassigned document IDs here -->
+      </arr>
+    </lst>
+  </arr>
+</response>
+----
+
+There were a few clusters discovered for this query (`\*:*`), separating search hits into various categories: DDR, iPod, Hard Drive, etc. Each cluster has a label and score that indicates the "goodness" of the cluster. The score is algorithm-specific and is meaningful only in relation to the scores of other clusters in the same set. In other words, if cluster _A_ has a higher score than cluster _B_, cluster _A_ should be of better quality (have a better label and/or more coherent document set). Each cluster has an array of identifiers of documents belonging to it. These identifiers correspond to the `uniqueKey` field declared in the schema.
+
+Depending on the quality of input documents, some clusters may not make much sense. Some documents may be left out and not be clustered at all; these will be assigned to the synthetic _Other Topics_ group, marked with the `other-topics` property set to `true` (see the XML dump above for an example). The score of the other topics group is zero.
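
A client can extract the labels and document identifiers from the `clusters` array with a standard XML parser. A minimal Python sketch over a trimmed, hard-coded copy of the output above (no live Solr call is made):

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the clusters portion of the XML response.
raw = """
<response>
  <arr name="clusters">
    <lst>
      <arr name="labels"><str>DDR</str></arr>
      <double name="score">3.9599865057283354</double>
      <arr name="docs">
        <str>TWINX2048-3200PRO</str>
        <str>VS1GB400C3</str>
      </arr>
    </lst>
  </arr>
</response>
"""

root = ET.fromstring(raw)
clusters = [
    ([s.text for s in lst.findall("./arr[@name='labels']/str")],
     [s.text for s in lst.findall("./arr[@name='docs']/str")])
    for lst in root.findall("./arr[@name='clusters']/lst")
]
print(clusters)  # [(['DDR'], ['TWINX2048-3200PRO', 'VS1GB400C3'])]
```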
+
+[[ResultClustering-Installation]]
+== Installation
+
+The clustering contrib extension requires `dist/solr-clustering-*.jar` and all JARs under `contrib/clustering/lib`.
+
+[[ResultClustering-Configuration]]
+== Configuration
+
+[[ResultClustering-DeclarationoftheSearchComponentandRequestHandler]]
+=== Declaration of the Search Component and Request Handler
+
+The clustering extension is a search component and must be declared in `solrconfig.xml`. Such a component can then be appended to a request handler as the last component in the chain (because it requires search results, which must previously have been fetched by the search component).
+
+An example configuration could look as shown below.
+
+. Include the required contrib JARs. Note that by default paths are relative to the Solr core, so they may need to be adjusted for your configuration, or `$solr.install.dir` may need to be specified explicitly.
++
+[source,xml]
+----
+<lib dir="${solr.install.dir:../../..}/contrib/clustering/lib/" regex=".*\.jar" />
+<lib dir="${solr.install.dir:../../..}/dist/" regex="solr-clustering-\d.*\.jar" />
+----
+. Declare the search component. Each component can also declare multiple clustering pipelines ("engines"), which can be selected at runtime by passing the `clustering.engine=(engine name)` URL parameter.
++
+[source,xml]
+----
+<searchComponent name="clustering" class="solr.clustering.ClusteringComponent">
+  <!-- Lingo clustering algorithm -->
+  <lst name="engine">
+    <str name="name">lingo</str>
+    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
+  </lst>
+
+  <!-- An example definition for the STC clustering algorithm. -->
+  <lst name="engine">
+    <str name="name">stc</str>
+    <str name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str>
+  </lst>
+</searchComponent>
+----
+. Declare a request handler and append the clustering component declared above to it.
++
+[source,xml]
+----
+<requestHandler name="/clustering"
+                class="solr.SearchHandler">
+  <lst name="defaults">
+    <bool name="clustering">true</bool>
+    <bool name="clustering.results">true</bool>
+
+    <!-- Logical field to physical field mapping. -->
+    <str name="carrot.url">id</str>
+    <str name="carrot.title">doctitle</str>
+    <str name="carrot.snippet">content</str>
+
+    <!-- Configure any other request handler parameters. We will cluster the
+         top 100 search results so bump up the 'rows' parameter. -->
+    <str name="rows">100</str>
+    <str name="fl">*,score</str>
+  </lst>
+
+  <!-- Append clustering at the end of the list of search components. -->
+  <arr name="last-components">
+    <str>clustering</str>
+  </arr>
+</requestHandler>
+----
+
+
+[[ResultClustering-ConfigurationParametersoftheClusteringComponent]]
+=== Configuration Parameters of the Clustering Component
+
+The table below summarizes the parameters of each clustering engine or of the entire clustering component (depending on where they are declared).
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Parameter |Description
+|`clustering` |When `true`, clustering component is enabled.
+|`clustering.engine` |Declares which clustering engine to use. If not present, the first declared engine will become the default one.
+|`clustering.results` |When `true`, the component will perform clustering of search results (this should be enabled).
+|`clustering.collection` |When `true`, the component will perform clustering of the whole document index (this section does not cover full-index clustering).
+|===
+
+At the engine declaration level, the following parameters are supported.
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Parameter |Description
+|`carrot.algorithm` |The algorithm class.
+|`carrot.resourcesDir` |Algorithm-specific resources and configuration files (stop words, other lexical resources, default settings). By default this points to `conf/clustering/carrot2/`.
+|`carrot.outputSubClusters` |If `true` and the algorithm supports hierarchical clustering, sub-clusters will also be emitted. Default value: true.
+|`carrot.numDescriptions` |Maximum number of per-cluster labels to return (if the algorithm assigns more than one label to a cluster).
+|===
+
+The `carrot.algorithm` parameter should contain a fully qualified class name of an algorithm supported by the http://project.carrot2.org[Carrot2] framework. Currently, the following algorithms are available:
+
+* `org.carrot2.clustering.lingo.LingoClusteringAlgorithm` (open source)
+* `org.carrot2.clustering.stc.STCClusteringAlgorithm` (open source)
+* `org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm` (open source)
+* `com.carrotsearch.lingo3g.Lingo3GClusteringAlgorithm` (commercial)
+
+For a comparison of characteristics of these algorithms see the following links:
+
+* http://doc.carrot2.org/#section.advanced-topics.fine-tuning.choosing-algorithm
+* http://project.carrot2.org/algorithms.html
+* http://carrotsearch.com/lingo3g-comparison.html
+
+Which algorithm to choose depends on the amount of traffic (STC is faster than Lingo but arguably produces less intuitive clusters; Lingo3G is the fastest algorithm but is not free or open source), the expected result (Lingo3G provides hierarchical clusters; Lingo and STC provide flat clusters), and the input data (each algorithm will cluster the input slightly differently). There is no single answer as to which algorithm is "the best".
+
+[[ResultClustering-ContextualandFullFieldClustering]]
+=== Contextual and Full Field Clustering
+
+The clustering engine can apply clustering to the full content of (stored) fields or it can run an internal highlighter pass to extract context-snippets before clustering. Highlighting is recommended when the logical snippet field contains a lot of content (this would affect clustering performance). Highlighting can also increase the quality of clustering because the content passed to the algorithm will be more focused around the query (it will be query-specific context). The following parameters control the internal highlighter.
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Parameter |Description
+|`carrot.produceSummary` |When `true`, the clustering component will run a highlighter pass on the content of the logical fields pointed to by `carrot.title` and `carrot.snippet`. Otherwise the full content of those fields will be clustered.
+|`carrot.fragSize` |The size, in characters, of the snippets (aka fragments) created by the highlighter. If not specified, the default highlighting fragsize (`hl.fragsize`) will be used.
+|`carrot.summarySnippets` |The number of summary snippets to generate for clustering. If not specified, the default highlighting snippet count (`hl.snippets`) will be used.
+|===
+
+[[ResultClustering-LogicaltoDocumentFieldMapping]]
+=== Logical to Document Field Mapping
+
+As already mentioned in <<ResultClustering-PreliminaryConcepts,Preliminary Concepts>>, the clustering component clusters "documents" consisting of logical parts that need to be mapped onto the physical schema of the data stored in Solr. The field mapping attributes provide a connection between fields and logical document parts. Note that the content of the title and snippet fields must be *stored* so that it can be retrieved at search time.
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Parameter |Description
+|`carrot.title` |The field (alternatively comma- or space-separated list of fields) that should be mapped to the logical document's title. The clustering algorithms typically give more weight to the content of the title field compared to the content (snippet). For best results, the field should contain concise, noise-free content. If there is no clear title in your data, you can leave this parameter blank.
+|`carrot.snippet` |The field (alternatively comma- or space-separated list of fields) that should be mapped to the logical document's main content. If this mapping points to very large content fields the performance of clustering may drop significantly. An alternative then is to use query-context snippets for clustering instead of full field content. See the description of the `carrot.produceSummary` parameter for details.
+|`carrot.url` |The field that should be mapped to the logical document's content URL. Leave blank if not required.
+|===
+
+[[ResultClustering-ClusteringMultilingualContent]]
+=== Clustering Multilingual Content
+
+The field mapping specification can include a `carrot.lang` parameter, which defines the field that stores the http://www.loc.gov/standards/iso639-2/php/code_list.php[ISO 639-1] code of the language in which the title and content of the document are written. This information can be stored in the index based on a priori knowledge of the documents' source or on a language detection filter applied at indexing time. All algorithms inside the Carrot2 framework will accept ISO codes of the languages defined in the https://github.com/carrot2/carrot2/blob/master/core/carrot2-core/src/org/carrot2/core/LanguageCode.java[LanguageCode enum].
+
+The language hint makes it easier for clustering algorithms to separate documents from different languages on input and to pick the right language resources for clustering. If you do have multilingual query results (or query results in a language other than English), it is strongly advised to map the language field appropriately.
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Parameter |Description
+|`carrot.lang` |The field that stores the ISO 639-1 code of the language of the document's text fields.
+|`carrot.lcmap` |A mapping of arbitrary strings into ISO 639 two-letter codes used by `carrot.lang`. The syntax of this parameter is the same as `langid.map.lcmap`, for example: `langid.map.lcmap=japanese:ja polish:pl english:en`
+|===
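The `carrot.lcmap` pair syntax can be sketched as follows (a simple illustration of the `string:code` format described above, not Solr's actual parser):

```python
def parse_lcmap(spec):
    """Parse an lcmap value like 'japanese:ja polish:pl english:en'
    into a dict of arbitrary language names -> ISO 639 two-letter codes."""
    mapping = {}
    for pair in spec.split():
        name, _, code = pair.partition(":")
        mapping[name] = code
    return mapping

lcmap = parse_lcmap("japanese:ja polish:pl english:en")
```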
+
+The default language can also be set using Carrot2-specific algorithm attributes (in this case the http://doc.carrot2.org/#section.attribute.lingo.MultilingualClustering.defaultLanguage[MultilingualClustering.defaultLanguage] attribute).
+
+[[ResultClustering-TweakingAlgorithmSettings]]
+== Tweaking Algorithm Settings
+
+The algorithms that come with Solr use their default settings, which may be inadequate for some data sets. All algorithms have lexical resources and tuning parameters (stop words, stemmers, attribute values) that may require adjustment to get better clusters (and cluster labels). For Carrot2-based algorithms it is probably best to refer to a dedicated tuning application called Carrot2 Workbench (screenshot below). From this application one can export a set of algorithm attributes as an XML file, which can then be placed under the location pointed to by `carrot.resourcesDir`.
+
+image::images/result-clustering/carrot2-workbench.png[image,scaledwidth=75.0%]
+
+[[ResultClustering-ProvidingDefaults]]
+=== Providing Defaults
+
+The default attributes for all engines (algorithms) declared in the clustering component are read from files under `carrot.resourcesDir`, with an expected file name of `engineName-attributes.xml`. So for an engine named `lingo` and the default value of `carrot.resourcesDir`, the attributes would be read from a file at `conf/clustering/carrot2/lingo-attributes.xml`.
+
+An example XML file changing the default language of documents to Polish is shown below.
+
+[source,xml]
+----
+<attribute-sets default="attributes">
+  <attribute-set id="attributes">
+    <value-set>
+      <label>attributes</label>
+      <attribute key="MultilingualClustering.defaultLanguage">
+        <value type="org.carrot2.core.LanguageCode" value="POLISH"/>
+      </attribute>
+    </value-set>
+  </attribute-set>
+</attribute-sets>
+----
+
+[[ResultClustering-TweakingatQuery-Time]]
+=== Tweaking at Query-Time
+
+The clustering component and Carrot2 clustering algorithms can accept query-time attribute overrides. Note that certain things (for example lexical resources) can only be initialized once (at startup, via the XML configuration files).
+
+An example query that changes the `LingoClusteringAlgorithm.desiredClusterCountBase` parameter for the Lingo algorithm: http://localhost:8983/solr/techproducts/clustering?q=*:*&rows=100&LingoClusteringAlgorithm.desiredClusterCountBase=20.
+
+The clustering engine (the algorithm declared in `solrconfig.xml`) can also be changed at runtime by passing the `clustering.engine=name` request parameter: http://localhost:8983/solr/techproducts/clustering?q=*:*&rows=100&clustering.engine=kmeans
+
+[[ResultClustering-PerformanceConsiderations]]
+== Performance Considerations
+
+Dynamic clustering of search results comes with two major performance penalties:
+
+* Increased cost of fetching a larger-than-usual number of search results (50, 100 or more documents),
+* Additional computational cost of the clustering itself.
+
+For simple queries, the clustering time will usually dominate the fetch time. If the document content is very long the retrieval of stored content can become a bottleneck. The performance impact of clustering can be lowered in several ways:
+
+* feed less content to the clustering algorithm by enabling the `carrot.produceSummary` parameter,
+* perform clustering on selected fields (titles only) to make the input smaller,
+* use a faster algorithm (STC instead of Lingo, Lingo3G instead of STC),
+* tune the performance attributes related directly to a specific algorithm.
+
+Some of these techniques are described in _Apache SOLR and Carrot2 integration strategies_ document, available at http://carrot2.github.io/solr-integration-strategies. The topic of improving performance is also included in the Carrot2 manual at http://doc.carrot2.org/#section.advanced-topics.fine-tuning.performance.
+
+[[ResultClustering-AdditionalResources]]
+== Additional Resources
+
+The following resources provide additional information about the clustering component in Solr and its potential applications.
+
+* Apache Solr and Carrot2 integration strategies: http://carrot2.github.io/solr-integration-strategies
+* Apache Solr Wiki (covers previous Solr versions, may be inaccurate): https://wiki.apache.org/solr/ClusteringComponent
+* Clustering and Visualization of Solr search results (video from Berlin BuzzWords conference, 2011): http://vimeo.com/26616444

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/ccbc93b8/solr/solr-ref-guide/src/result-grouping.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/result-grouping.adoc b/solr/solr-ref-guide/src/result-grouping.adoc
new file mode 100644
index 0000000..d2dfafb
--- /dev/null
+++ b/solr/solr-ref-guide/src/result-grouping.adoc
@@ -0,0 +1,239 @@
+= Result Grouping
+:page-shortname: result-grouping
+:page-permalink: result-grouping.html
+
+Result Grouping groups documents with a common field value into groups and returns the top documents for each group.
+
+For example, if you searched for "DVD" on an electronic retailer's e-commerce site, you might be returned three categories such as "TV and Video", "Movies", and "Computers" with three results per category. In this case, the query term "DVD" appeared in all three categories, so Solr groups them together in order to increase relevancy for the user.
+
+.Prefer Collapse & Expand instead
+[NOTE]
+====
+Solr's <<collapse-and-expand-results.adoc#collapse-and-expand-results,Collapse and Expand>> feature is newer and mostly overlaps with Result Grouping. There are features unique to both, and they have different performance characteristics. That said, in most cases Collapse and Expand is preferable to Result Grouping.
+====
+
+Result Grouping is separate from <<faceting.adoc#faceting,Faceting>>. Though it is conceptually similar, faceting returns all relevant results and allows the user to refine the results based on the facet category. For example, if you search for "shoes" on a footwear retailer's e-commerce site, Solr would return all results for that query term, along with selectable facets such as "size," "color," "brand," and so on.
+
+You can, however, combine grouping with faceting. Grouped faceting supports `facet.field` and `facet.range` but currently doesn't support date and pivot faceting. The facet counts are computed based on the first `group.field` parameter, and other `group.field` parameters are ignored.
+
+Grouped faceting differs from non-grouped faceting, where `(sum of all facet counts) == (total number of products with that property)`, as shown in the following example:
+
+Object 1
+
+* name: Phaser 4620a
+* ppm: 62
+* product_range: 6
+
+Object 2
+
+* name: Phaser 4620i
+* ppm: 65
+* product_range: 6
+
+Object 3
+
+* name: ML6512
+* ppm: 62
+* product_range: 7
+
+If you ask Solr to group these documents by "product_range", then the total number of groups is 2, but the facet counts for ppm are 2 for 62 and 1 for 65.
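The counting semantics can be reproduced with a short sketch using the three objects above (plain Python standing in for Solr's internal counting, not Solr code):

```python
from collections import Counter

docs = [
    {"name": "Phaser 4620a", "ppm": 62, "product_range": 6},
    {"name": "Phaser 4620i", "ppm": 65, "product_range": 6},
    {"name": "ML6512", "ppm": 62, "product_range": 7},
]

# Plain (non-grouped) faceting counts every matching document,
# so the counts sum to the total number of documents.
plain_facets = Counter(d["ppm"] for d in docs)

# Grouping by product_range yields the distinct group values.
groups = {d["product_range"] for d in docs}

# Grouped faceting counts, per facet value, the number of distinct
# groups containing at least one document with that value.
grouped_facets = {
    ppm: len({d["product_range"] for d in docs if d["ppm"] == ppm})
    for ppm in {d["ppm"] for d in docs}
}
```

Here both countings happen to agree for ppm=65 (one document, one group), while ppm=62 appears in two documents that also belong to two different groups.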
+
+[[ResultGrouping-RequestParameters]]
+== Request Parameters
+
+Result Grouping takes the following request parameters. Any number of these request parameters can be included in a single request:
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="20,20,60",options="header"]
+|===
+|Parameter |Type |Description
+|group |Boolean |If true, query results will be grouped.
+|group.field |string |The name of the field by which to group results. The field must be single-valued, and either be indexed or be a field type that has a value source and works in a function query, such as `ExternalFileField`. It must also be a string-based field, such as `StrField` or `TextField`.
+|group.func |query a|
+Group based on the unique values of a function query.
+
+NOTE: This option does not work with <<ResultGrouping-DistributedResultGroupingCaveats,distributed searches>>.
+
+|group.query |query |Return a single group of documents that match the given query.
+|rows |integer |The number of groups to return. The default value is 10.
+|start |integer |Specifies an initial offset for the list of groups.
+|group.limit |integer |Specifies the number of results to return for each group. The default value is 1.
+|group.offset |integer |Specifies an initial offset for the document list of each group.
+|sort |sortspec |Specifies how Solr sorts the groups relative to each other. For example, `sort=popularity desc` will cause the groups to be sorted according to the highest popularity document in each group. The default value is `score desc`.
+|group.sort |sortspec |Specifies how Solr sorts documents within each group. The default behavior if `group.sort` is not specified is to use the same effective value as the `sort` parameter.
+|group.format |grouped/simple |If this parameter is set to `simple`, the grouped documents are presented in a single flat list, and the `start` and `rows` parameters affect the numbers of documents instead of groups.
+|group.main |Boolean |If true, the result of the first field grouping command is used as the main result list in the response, using `group.format=simple`.
+|group.ngroups |Boolean a|
+If true, Solr includes the number of groups that have matched the query in the results. The default value is false.
+
+See below for <<ResultGrouping-DistributedResultGroupingCaveats,Distributed Result Grouping Caveats>> when using sharded indexes.
+
+|group.truncate |Boolean |If true, facet counts are based on the most relevant document of each group matching the query. The default value is false.
+|group.facet |Boolean a|
+Determines whether to compute grouped facets for the field facets specified in `facet.field` parameters. Grouped facets are computed based on the first specified group. As with normal field faceting, fields shouldn't be tokenized (otherwise counts are computed for each token). Grouped faceting supports single- and multivalued fields. Default is false.
+
+*Warning*: There can be a heavy performance cost to this option.
+
+See below for <<ResultGrouping-DistributedResultGroupingCaveats,Distributed Result Grouping Caveats>> when using sharded indexes.
+
+|group.cache.percent |integer between 0 and 100 |Setting this parameter to a number greater than 0 enables caching for result grouping. Result Grouping executes two searches; this option caches the second search. The default value is 0. Testing has shown that group caching only improves search time with Boolean, wildcard, and fuzzy queries. For simple queries like term or "match all" queries, group caching degrades performance.
+|===
+
+Any number of group commands (`group.field`, `group.func`, `group.query`) may be specified in a single request.
+
+[[ResultGrouping-Examples]]
+== Examples
+
+All of the following sample queries work with Solr's "`bin/solr -e techproducts`" example.
+
+[[ResultGrouping-GroupingResultsbyField]]
+=== Grouping Results by Field
+
+In this example, we will group results based on the `manu_exact` field, which specifies the manufacturer of the items in the sample dataset.
+
+`\http://localhost:8983/solr/techproducts/select?wt=json&indent=true&fl=id,name&q=solr+memory&group=true&group.field=manu_exact`
+
+[source,json]
+----
+{
+"..."
+"grouped":{
+  "manu_exact":{
+    "matches":6,
+    "groups":[{
+        "groupValue":"Apache Software Foundation",
+        "doclist":{"numFound":1,"start":0,"docs":[
+            {
+              "id":"SOLR1000",
+              "name":"Solr, the Enterprise Search Server"}]
+        }},
+      {
+        "groupValue":"Corsair Microsystems Inc.",
+        "doclist":{"numFound":2,"start":0,"docs":[
+            {
+              "id":"VS1GB400C3",
+              "name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail"}]
+        }},
+      {
+        "groupValue":"A-DATA Technology Inc.",
+        "doclist":{"numFound":1,"start":0,"docs":[
+            {
+              "id":"VDBDB1A16",
+              "name":"A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM"}]
+        }},
+      {
+        "groupValue":"Canon Inc.",
+        "doclist":{"numFound":1,"start":0,"docs":[
+            {
+              "id":"0579B002",
+              "name":"Canon PIXMA MP500 All-In-One Photo Printer"}]
+        }},
+      {
+        "groupValue":"ASUS Computer Inc.",
+        "doclist":{"numFound":1,"start":0,"docs":[
+            {
+              "id":"EN7800GTX/2DHTV/256M",
+              "name":"ASUS Extreme N7800GTX/2DHTV (256 MB)"}]
+        }
+      }]}}}
+----
+
+The response indicates that there are six total matches for our query. For each of the five unique values of `group.field`, Solr returns a `docList` for that `groupValue` such that the `numFound` indicates the total number of documents in that group, and the top documents are returned according to the implicit default `group.limit=1` and `group.sort=score desc` parameters. The resulting groups are then sorted by the score of the top document within each group based on the implicit `sort=score desc`, and the number of groups returned is limited to the implicit `rows=10`.
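A client consuming this format walks `grouped.<field>.groups`; a minimal parsing sketch using an abbreviated version of the response above (most documents elided; illustrative only, not a Solr client API):

```python
import json

# Abbreviated copy of the grouped response shown above.
response = json.loads("""
{"grouped": {"manu_exact": {"matches": 6, "groups": [
  {"groupValue": "Apache Software Foundation",
   "doclist": {"numFound": 1, "start": 0,
               "docs": [{"id": "SOLR1000"}]}},
  {"groupValue": "Corsair Microsystems Inc.",
   "doclist": {"numFound": 2, "start": 0,
               "docs": [{"id": "VS1GB400C3"}]}}
]}}}
""")

field = response["grouped"]["manu_exact"]
# Per-group totals come from each group's doclist.numFound,
# while field.matches is the overall match count for the query.
totals = {g["groupValue"]: g["doclist"]["numFound"]
          for g in field["groups"]}
```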
+
+We can run the same query with the request parameter `group.main=true`. This will format the results as a single flat document list. This flat format does not include as much information as the normal result grouping query results – notably the `numFound` in each group – but it may be easier for existing Solr clients to parse.
+
+`\http://localhost:8983/solr/techproducts/select?wt=json&indent=true&fl=id,name,manufacturer&q=solr+memory&group=true&group.field=manu_exact&group.main=true`
+
+[source,json]
+----
+{
+  "responseHeader":{
+    "status":0,
+    "QTime":1,
+    "params":{
+      "fl":"id,name,manufacturer",
+      "indent":"true",
+      "q":"solr memory",
+      "group.field":"manu_exact",
+      "group.main":"true",
+      "group":"true",
+      "wt":"json"}},
+  "grouped":{},
+  "response":{"numFound":6,"start":0,"docs":[
+      {
+        "id":"SOLR1000",
+        "name":"Solr, the Enterprise Search Server"},
+      {
+        "id":"VS1GB400C3",
+        "name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail"},
+      {
+        "id":"VDBDB1A16",
+        "name":"A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM"},
+      {
+        "id":"0579B002",
+        "name":"Canon PIXMA MP500 All-In-One Photo Printer"},
+      {
+        "id":"EN7800GTX/2DHTV/256M",
+        "name":"ASUS Extreme N7800GTX/2DHTV (256 MB)"}]
+  }
+}
+----
+
+[[ResultGrouping-GroupingbyQuery]]
+=== Grouping by Query
+
+In this example, we will use the `group.query` parameter to find the top three results for "memory" in two different price ranges: 0.00 to 99.99, and over 100.
+
+`\http://localhost:8983/solr/techproducts/select?wt=json&indent=true&fl=name,price&q=memory&group=true&group.query=price:[0+TO+99.99]&group.query=price:[100+TO+*]&group.limit=3`
+
+[source,json]
+----
+{
+  "responseHeader":{
+    "status":0,
+    "QTime":42,
+    "params":{
+      "fl":"name,price",
+      "indent":"true",
+      "q":"memory",
+      "group.limit":"3",
+      "group.query":["price:[0 TO 99.99]",
+      "price:[100 TO *]"],
+      "group":"true",
+      "wt":"json"}},
+  "grouped":{
+    "price:[0 TO 99.99]":{
+      "matches":5,
+      "doclist":{"numFound":1,"start":0,"docs":[
+          {
+            "name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail",
+            "price":74.99}]
+      }},
+    "price:[100 TO *]":{
+      "matches":5,
+      "doclist":{"numFound":3,"start":0,"docs":[
+          {
+            "name":"CORSAIR  XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail",
+            "price":185.0},
+          {
+            "name":"Canon PIXMA MP500 All-In-One Photo Printer",
+            "price":179.99},
+          {
+            "name":"ASUS Extreme N7800GTX/2DHTV (256 MB)",
+            "price":479.95}]
+      }
+    }
+  }
+}
+----
+
+In this case, Solr found five matches for "memory," but only returns four results grouped by price. This is because one result for "memory" did not have a price assigned to it.
+
+[[ResultGrouping-DistributedResultGroupingCaveats]]
+== Distributed Result Grouping Caveats
+
+Grouping is supported for <<solrcloud.adoc#solrcloud,distributed searches>>, with some caveats:
+
+* Currently `group.func` is not supported in any distributed searches
+* `group.ngroups` and `group.facet` require that all documents in each group must be co-located on the same shard in order for accurate counts to be returned. <<shards-and-indexing-data-in-solrcloud.adoc#shards-and-indexing-data-in-solrcloud,Document routing via composite keys>> can be a useful solution in many situations.

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/ccbc93b8/solr/solr-ref-guide/src/rule-based-authorization-plugin.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/rule-based-authorization-plugin.adoc b/solr/solr-ref-guide/src/rule-based-authorization-plugin.adoc
new file mode 100644
index 0000000..cc85567
--- /dev/null
+++ b/solr/solr-ref-guide/src/rule-based-authorization-plugin.adoc
@@ -0,0 +1,225 @@
+= Rule-Based Authorization Plugin
+:page-shortname: rule-based-authorization-plugin
+:page-permalink: rule-based-authorization-plugin.html
+
+Solr allows configuring roles to control user access to the system.
+
+This is accomplished through rule-based permission definitions which are assigned to users. The roles are fully customizable, and provide the ability to limit access to specific collections, request handlers, request parameters, and request methods.
+
+The roles can be used with any of the authentication plugins or with a custom authentication plugin if you have created one. You will only need to ensure that you configure the role-to-user mappings with the proper user IDs that your authentication system provides.
+
+Once defined through the API, roles are stored in `security.json`.
+
+[[Rule-BasedAuthorizationPlugin-EnabletheAuthorizationPlugin]]
+== Enable the Authorization Plugin
+
+The plugin must be enabled in `security.json`. This file and where to put it in your system is described in detail in the section <<authentication-and-authorization-plugins.adoc#AuthenticationandAuthorizationPlugins-EnablePluginswithsecurity.json,Enable Plugins with security.json>>.
+
+This file has two parts, the `authentication` part and the `authorization` part. The `authentication` part stores information about the class being used for authentication.
+
+The `authorization` part is not related to Basic authentication, but is a separate authorization plugin designed to support fine-grained user access control. When creating `security.json` you can add the permissions to the file, or you can use the Authorization API described below to add them as needed.
+
+This example `security.json` shows how the <<basic-authentication-plugin.adoc#basic-authentication-plugin,Basic authentication plugin>> can work with this authorization plugin:
+
+[source,json]
+----
+{
+"authentication":{
+   "class":"solr.BasicAuthPlugin",
+   "blockUnknown": true,
+   "credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
+},
+"authorization":{
+   "class":"solr.RuleBasedAuthorizationPlugin",
+   "permissions":[{"name":"security-edit",
+      "role":"admin"}],
+   "user-role":{"solr":"admin"}
+}}
+----
+
+There are several things defined in this example:
+
+* Basic authentication and rule-based authorization plugins are enabled.
+* A user called 'solr', with a password has been defined.
+* All requests without credentials will be rejected with a 401 error. Set `'blockUnknown'` to false (or remove it altogether) if you wish to let unauthenticated requests go through. However, if a particular resource is protected by a rule, such requests are rejected anyway with a 401 error.
+* The 'admin' role has been defined, and it has permission to edit security settings.
+* The 'solr' user has been assigned the 'admin' role.
+
+[[Rule-BasedAuthorizationPlugin-PermissionAttributes]]
+== Permission Attributes
+
+Each role is comprised of one or more permissions which define what the user is allowed to do. Each permission is made up of several attributes that define the allowed activity. There are some pre-defined permissions which cannot be modified.
+
+The permissions are consulted in the order they appear in `security.json`. The first permission that matches is applied for each user, so the strictest permissions should be at the top of the list. Permission order can be controlled with a parameter of the Authorization API, as described below.
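The first-match semantics can be sketched like this (an illustration of the ordering rule only, matching on `path` for brevity; not the plugin's actual implementation, which also considers collections, methods, and params):

```python
def evaluate(permissions, user_roles, request):
    """Return whether a request is allowed: the first permission whose
    path matches decides, so stricter permissions should appear
    earlier in security.json."""
    for perm in permissions:
        if perm["path"] == request["path"]:
            return perm["role"] in user_roles
    # No rule protects this resource, so the request goes through.
    return True

permissions = [
    {"path": "/admin/authorization", "role": "admin"},  # strictest first
    {"path": "/select", "role": "guest"},
]

allowed = evaluate(permissions, {"guest"}, {"path": "/admin/authorization"})
```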
+
+[[Rule-BasedAuthorizationPlugin-PredefinedPermissions]]
+=== Predefined Permissions
+
+There are several permissions that are pre-defined. These have fixed default values, which cannot be modified, and new attributes cannot be added. To use these permissions, simply define a role that includes the permission, and then assign a user to that role.
+
+The pre-defined permissions are:
+
+* *security-edit:* this permission is allowed to edit the security configuration, meaning any update action that modifies `security.json` through the APIs will be allowed.
+* *security-read*: this permission is allowed to read the security configuration, meaning any action that reads `security.json` settings through the APIs will be allowed.
+* *schema-edit*: this permission is allowed to edit a collection's schema using the <<schema-api.adoc#schema-api,Schema API>>. Note that this allows schema edit permissions for _all_ collections. If edit permissions should only be applied to specific collections, a custom permission would need to be created.
+* *schema-read*: this permission is allowed to read a collection's schema using the <<schema-api.adoc#schema-api,Schema API>>. Note that this allows schema read permissions for _all_ collections. If read permissions should only be applied to specific collections, a custom permission would need to be created.
+* *config-edit*: this permission is allowed to edit a collection's configuration using the <<config-api.adoc#config-api,Config API>>, the <<request-parameters-api.adoc#request-parameters-api,Request Parameters API>>, and other APIs which modify `configoverlay.json`. Note that this allows configuration edit permissions for _all_ collections. If edit permissions should only be applied to specific collections, a custom permission would need to be created.
+* *core-admin-read*: read operations on the Core Admin API.
+* *core-admin-edit*: Core Admin commands that can mutate the system state.
+* *config-read*: this permission is allowed to read a collection's configuration using the <<config-api.adoc#config-api,Config API>>, the <<request-parameters-api.adoc#request-parameters-api,Request Parameters API>>, and other APIs which modify `configoverlay.json`. Note that this allows configuration read permissions for _all_ collections. If read permissions should only be applied to specific collections, a custom permission would need to be created.
+* *collection-admin-edit*: this permission is allowed to edit a collection's configuration using the <<collections-api.adoc#collections-api,Collections API>>. Note that this allows configuration edit permissions for _all_ collections. If edit permissions should only be applied to specific collections, a custom permission would need to be created. Specifically, the following actions of the Collections API would be allowed:
+** CREATE
+** RELOAD
+** SPLITSHARD
+** CREATESHARD
+** DELETESHARD
+** CREATEALIAS
+** DELETEALIAS
+** DELETE
+** DELETEREPLICA
+** ADDREPLICA
+** CLUSTERPROP
+** MIGRATE
+** ADDROLE
+** REMOVEROLE
+** ADDREPLICAPROP
+** DELETEREPLICAPROP
+** BALANCESHARDUNIQUE
+** REBALANCELEADERS
+* *collection-admin-read*: this permission is allowed to read a collection's configuration using the <<collections-api.adoc#collections-api,Collections API>>. Note that this allows configuration read permissions for _all_ collections. If read permissions should only be applied to specific collections, a custom permission would need to be created. Specifically, the following actions of the Collections API would be allowed:
+** LIST
+** OVERSEERSTATUS
+** CLUSTERSTATUS
+** REQUESTSTATUS
+* *update*: this permission is allowed to perform any update action on any collection. This includes sending documents for indexing (using an <<requesthandlers-and-searchcomponents-in-solrconfig.adoc#RequestHandlersandSearchComponentsinSolrConfig-UpdateRequestHandlers,update request handler>>). This applies to all collections by default (`collection:"*"`).
+* *read*: this permission is allowed to perform any read action on any collection. This includes querying using search handlers (using <<requesthandlers-and-searchcomponents-in-solrconfig.adoc#RequestHandlersandSearchComponentsinSolrConfig-SearchHandlers,request handlers>>) such as `/select`, `/get`, `/browse`, `/tvrh`, `/terms`, `/clustering`, `/elevate`, `/export`, `/spell`, and `/sql`. This applies to all collections by default (`collection:"*"`).
+* *all*: any request coming to Solr.
+
+[[Rule-BasedAuthorizationPlugin-AuthorizationAPI]]
+== Authorization API
+
+[[Rule-BasedAuthorizationPlugin-APIEndpoint]]
+=== API Endpoint
+
+`/admin/authorization`: takes a set of commands to create permissions, map permissions to roles, and map roles to users.
+
+[[Rule-BasedAuthorizationPlugin-ManagePermissions]]
+=== Manage Permissions
+
+Three commands control managing permissions:
+
+* `set-permission`: create a new permission, overwrite an existing permission definition, or assign a pre-defined permission to a role.
+* `update-permission`: update some attributes of an existing permission definition.
+* `delete-permission`: remove a permission definition.
+
+Permissions need to be created if they are not on the list of pre-defined permissions above.
+
+Several properties can be used to define your custom permission.
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Property |Description
+|name |The name of the permission. This is required only if it is a predefined permission.
+|collection a|
+The collection or collections the permission will apply to.
+
+When the path that will be allowed is collection-specific, such as when setting permissions to allow use of the Schema API, omitting the collection property will allow the defined path and/or method for all collections. However, when the path is non-collection-specific, such as the Collections API, the collection value must be `null`. The default value is `*` (all collections).
+
+|path |A request handler name, such as `/update` or `/select`. A wild card is supported, to allow for all paths as appropriate (such as, `/update/*`).
+|method |HTTP methods that are allowed for this permission. You could allow only GET requests, or have a role that allows PUT and POST requests. The method values that are allowed for this property are GET, POST, PUT, DELETE, and HEAD.
+|params a|
+The names and values of request parameters. This property can be omitted if all request parameters are to be matched, but will restrict access only to the values provided if defined.
+
+For example, this property could be used to limit the actions a role is allowed to perform with the Collections API. If the role should only be allowed to perform the LIST or CLUSTERSTATUS requests, you would define this as follows:
+
+[source,json]
+----
+"params": {
+   "action": ["LIST", "CLUSTERSTATUS"]
+}
+----
+
+The value of the parameter can be a simple string or it could be a regular expression. Use the prefix `REGEX:` to use a regular expression match instead of a string identity match.
+
+To make the commands LIST and CLUSTERSTATUS case-insensitive, the above example would instead be written as follows:
+
+[source,json]
+----
+"params": {
+   "action": ["REGEX:(?i)LIST", "REGEX:(?i)CLUSTERSTATUS"]
+}
+----
+
+|before |This property allows ordering of permissions. The value of this property is the index of the permission that this new permission should be placed before in `security.json`. The index is automatically assigned in the order the permissions are created.
+|role |The name of the role(s) to give this permission. This name will be used to map user IDs to the role to grant these permissions. The value can be a wildcard (`*`), which means that any authenticated user is allowed, but an unauthenticated request is not.
+|===
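The `REGEX:` prefix behavior described for the params property can be sketched as follows (illustrative only, not the plugin's source; it assumes the pattern must match the whole parameter value):

```python
import re

def param_matches(rule_value, actual):
    """Match a single params value: with a REGEX: prefix the value is
    treated as a regular expression (e.g. with an inline (?i) flag for
    case-insensitivity), otherwise as a string identity match."""
    if rule_value.startswith("REGEX:"):
        pattern = rule_value[len("REGEX:"):]
        return re.fullmatch(pattern, actual) is not None
    return rule_value == actual

case_insensitive = param_matches("REGEX:(?i)CLUSTERSTATUS", "clusterstatus")
exact_only = param_matches("LIST", "list")
```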
+
+The following creates a new permission named "collection-mgr" that is allowed to create and list collections. The permission will be placed before the "read" permission. Note also that we have defined `"collection": null`; this is because requests to the Collections API are never collection-specific.
+
+[source,bash]
+----
+curl --user solr:SolrRocks -H 'Content-type:application/json' -d '{
+  "set-permission": {"name": "collection-mgr",
+                     "collection": null,
+                     "path": "/admin/collections",
+                     "params": {"action": ["LIST", "CREATE"]},
+                     "before": 3,
+                     "role": "admin"}
+}' http://localhost:8983/solr/admin/authorization
+----
+
+Apply an update permission on all collections to a role called `dev` and read permissions to a role called `guest`:
+
+[source,bash]
+----
+curl --user solr:SolrRocks -H 'Content-type:application/json' -d '{
+  "set-permission": {"name": "update", "role": "dev"},
+  "set-permission": {"name": "read", "role": "guest"}
+}' http://localhost:8983/solr/admin/authorization
+----
+
+[[Rule-BasedAuthorizationPlugin-UpdateorDeletePermissions]]
+=== Update or Delete Permissions
+
+Permissions can be accessed using their index in the list. Use the `/admin/authorization` API to see the existing permissions and their indices.
+
+The following example updates the '`role`' attribute of permission at index '`3`':
+
+[source,bash]
+----
+curl --user solr:SolrRocks -H 'Content-type:application/json' -d '{
+  "update-permission": {"index": 3,
+                       "role": ["admin", "dev"]}
+}' http://localhost:8983/solr/admin/authorization
+----
+
+The following example deletes permission at index '`3`':
+
+[source,bash]
+----
+curl --user solr:SolrRocks -H 'Content-type:application/json' -d '{
+  "delete-permission": 3
+}' http://localhost:8983/solr/admin/authorization
+----
+
+[[Rule-BasedAuthorizationPlugin-MapRolestoUsers]]
+=== Map Roles to Users
+
+A single command allows roles to be mapped to users:
+
+* `set-user-role`: map a user to one or more roles.
+
+To remove a user's roles, set the role to `null`. There is no separate command to delete a user role.
+
+The values supplied to the command are simply a user ID and one or more roles the user should have.
+
+For example, the following would grant a user "solr" the "admin" and "dev" roles, and remove all roles from the user ID "harry":
+
+[source,bash]
+----
+curl -u solr:SolrRocks -H 'Content-type:application/json' -d '{
+   "set-user-role" : {"solr": ["admin","dev"],
+                      "harry": null}
+}' http://localhost:8983/solr/admin/authorization
+----

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/ccbc93b8/solr/solr-ref-guide/src/rule-based-replica-placement.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/rule-based-replica-placement.adoc b/solr/solr-ref-guide/src/rule-based-replica-placement.adoc
new file mode 100644
index 0000000..c069cf3
--- /dev/null
+++ b/solr/solr-ref-guide/src/rule-based-replica-placement.adoc
@@ -0,0 +1,181 @@
+= Rule-based Replica Placement
+:page-shortname: rule-based-replica-placement
+:page-permalink: rule-based-replica-placement.html
+
+When Solr needs to assign nodes to collections, it can either automatically assign them randomly or the user can specify a set of nodes where it should create the replicas.
+
+With very large clusters, it is hard to specify exact node names, and doing so still does not give fine-grained control over how nodes are chosen for a shard. The user should be in complete control of where nodes are allocated for each collection, shard, and replica. This helps to optimally allocate hardware resources across the cluster.
+
+Rule-based replica assignment allows the creation of rules to determine the placement of replicas in the cluster. In the future, this feature will help to automatically add or remove replicas when systems go down, or when higher throughput is required. This enables a more hands-off approach to administration of the cluster.
+
+This feature is used in the following instances:
+
+* Collection creation
+* Shard creation
+* Replica creation
+* Shard splitting
+
+[[Rule-basedReplicaPlacement-CommonUseCases]]
+== Common Use Cases
+
+There are several situations where this functionality may be used. A few of the rules that could be implemented are listed below:
+
+* Don’t assign more than 1 replica of this collection to a host.
+* Assign all replicas to nodes with more than 100GB of free disk space, or assign replicas where there is the most disk space.
+* Do not assign any replica on a given host because I want to run an overseer there.
+* Assign only one replica of a shard in a rack.
+* Assign replicas to nodes hosting fewer than 5 cores.
+* Assign replicas to nodes hosting the fewest cores.
+
+[[Rule-basedReplicaPlacement-RuleConditions]]
+== Rule Conditions
+
+A rule is a set of conditions that a node must satisfy before a replica core can be created there.
+
+There are three possible conditions.
+
+* *shard*: this is the name of a shard or a wildcard (`*` means all shards). If shard is not specified, then the rule applies to the entire collection.
+* *replica*: this can be a number or a wildcard (`*` means any number, zero to infinity).
+* *tag*: this is an attribute of a node in the cluster that can be used in a rule, e.g., “freedisk”, “cores”, “rack”, “dc”, etc. The tag name can be a custom string. If creating a custom tag, a snitch is responsible for providing tags and values. The section <<Rule-basedReplicaPlacement-Snitches,Snitches>> below describes how to add a custom tag, and defines six pre-defined tags (cores, freedisk, host, port, node, and sysprop).
+
+[[Rule-basedReplicaPlacement-RuleOperators]]
+=== Rule Operators
+
+A condition can have one of the following operators to set the parameters for the rule.
+
+* *equals (no operator required)*: `tag:x` means tag value must be equal to ‘x’
+* *greater than (>)*: `tag:>x` means tag value greater than ‘x’. x must be a number
+* *less than (<)*: `tag:<x` means tag value less than ‘x’. x must be a number
+* *not equal (!)*: `tag:!x` means tag value MUST NOT be equal to ‘x’. The equals check is performed on the String value
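+
+Combined with the pre-defined tags described below, the operators read like this (the values here are hypothetical):
+
+[source,text]
+----
+cores:4        // equals: node must host exactly 4 cores
+freedisk:>100  // greater than: more than 100GB free disk space
+cores:<8       // less than: fewer than 8 cores
+port:!8983     // not equal: any port except 8983
+----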
+
+
+[[Rule-basedReplicaPlacement-FuzzyOperator_]]
+=== Fuzzy Operator (~)
+
+This can be used as a suffix to any condition. This would first try to satisfy the rule strictly. If Solr can’t find enough nodes to match the criterion, it tries to find the next best match which may not satisfy the criterion. For example, if we have a rule such as, `freedisk:>200~`, Solr will try to assign replicas of this collection on nodes with more than 200GB of free disk space. If that is not possible, the node which has the most free disk space will be chosen instead.
+
+[[Rule-basedReplicaPlacement-ChoosingAmongEquals]]
+=== Choosing Among Equals
+
+The nodes are sorted using the rules, ensuring that even if many nodes match the rules, the best nodes are picked for the assignment. For example, if there is a rule such as `freedisk:>20`, nodes are sorted on free disk space, descending, and the node with the most disk space is picked first. Or, if the rule is `cores:<5`, nodes are sorted by number of cores, ascending, and the node with the fewest cores is picked first.
+
+[[Rule-basedReplicaPlacement-Rulesfornewshards]]
+== Rules for new shards
+
+The rules are persisted along with the collection state. So, when a new replica is created, the system will assign replicas satisfying the rules. When a new shard is created as a result of using the Collection API's <<collections-api.adoc#CollectionsAPI-createshard,CREATESHARD command>>, ensure that you have created rules specific to that shard name. Rules can be altered using the <<collections-api.adoc#CollectionsAPI-modifycollection,MODIFYCOLLECTION command>>. However, it is not required to do so if the rules do not specify explicit shard names. For example, a rule such as `shard:shard1,replica:*,ip_3:168` will not apply to any new shard created. But if your rule is `replica:*,ip_3:168`, then it will apply to any new shard created.
+
+The same is applicable to shard splitting. Shard splitting is treated exactly the same way as shard creation. Even though `shard1_1` and `shard1_2` may be created from `shard1`, the rules treat them as distinct, unrelated shards.
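+
+For instance, rules could be adjusted after collection creation with a MODIFYCOLLECTION call along these lines (the collection name `myCollection` and the rule shown are hypothetical):
+
+[source,text]
+----
+http://localhost:8983/solr/admin/collections?action=MODIFYCOLLECTION&collection=myCollection&rule=replica:*,cores:<5
+----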
+
+[[Rule-basedReplicaPlacement-Snitches]]
+== Snitches
+
+Tag values come from a plugin called a Snitch. If there is a tag named ‘rack’ in a rule, there must be a Snitch that provides the value of ‘rack’ for each node in the cluster. A snitch implements the Snitch interface. Solr provides a default snitch with the following tags:
+
+* *cores*: Number of cores in the node
+* *freedisk*: Disk space available in the node
+* *host*: host name of the node
+* *port*: port of the node
+* *node*: node name
+* *role*: The role of the node. The only supported role is 'overseer'
+* *ip_1, ip_2, ip_3, ip_4*: These are IP fragments for each node. For example, in a host with IP `192.168.1.2`, `ip_1 = 2`, `ip_2 = 1`, `ip_3 = 168` and `ip_4 = 192`
+* *sysprop.{PROPERTY_NAME}*: These are values available from system properties. `sysprop.key` means a value that is passed to the node as `-Dkey=keyValue` during the node startup. It is possible to use rules like `sysprop.key:expectedVal,shard:*`
+
+[[Rule-basedReplicaPlacement-HowSnitchesareConfigured]]
+=== How Snitches are Configured
+
+It is possible to use one or more snitches for a set of rules. If the rules only need tags from the default snitch, it need not be explicitly configured. For example:
+
+[source,text]
+----
+snitch=class:fqn.ClassName,key1:val1,key2:val2,key3:val3
+----
+
+*How Tag Values are Collected*
+
+. Identify the set of tags in the rules
+. Create instances of Snitches specified. The default snitch is always created.
+. Ask each Snitch if it can provide values for any of the tags. If even one tag does not have a snitch, the assignment fails.
+. After the Snitches are identified, they provide the tag values for each node in the cluster.
+. If the value for a tag is not obtained for a given node, it cannot participate in the assignment.
+
+[[Rule-basedReplicaPlacement-Examples]]
+== Examples
+
+
+[[Rule-basedReplicaPlacement-Keeplessthan2replicas_atmost1replica_ofthiscollectiononanynode]]
+=== Keep less than 2 replicas (at most 1 replica) of this collection on any node
+
+For this rule, we define the `replica` condition with operators for "less than 2", and use a pre-defined tag named `node` to define nodes with any name.
+
+[source,text]
+----
+replica:<2,node:*
+// this is equivalent to replica:<2,node:*,shard:**. We can omit shard:** because ** is the default value of shard
+----
+
+
+[[Rule-basedReplicaPlacement-Foragivenshard_keeplessthan2replicasonanynode]]
+=== For a given shard, keep less than 2 replicas on any node
+
+For this rule, we use the `shard` condition to define any shard, the `replica` condition with operators for "less than 2", and finally a pre-defined tag named `node` to define nodes with any name.
+
+[source,text]
+----
+shard:*,replica:<2,node:*
+----
+
+[[Rule-basedReplicaPlacement-Assignallreplicasinshard1torack730]]
+=== Assign all replicas in shard1 to rack 730
+
+This rule limits the `shard` condition to 'shard1', but any number of replicas. We're also referencing a custom tag named `rack`. Before defining this rule, we will need to configure a custom Snitch which provides values for the tag `rack`.
+
+[source,text]
+----
+shard:shard1,replica:*,rack:730
+----
+
+In this case, the default value of `replica` is * (or, all replicas). So, it can be omitted and the rule can be reduced to:
+
+[source,text]
+----
+shard:shard1,rack:730
+----
+
+[[Rule-basedReplicaPlacement-Createreplicasinnodeswithlessthan5coresonly]]
+=== Create replicas in nodes with less than 5 cores only
+
+This rule uses the `replica` condition to define any number of replicas, but adds a pre-defined tag named `cores` and uses operators for "less than 5".
+
+[source,text]
+----
+replica:*,cores:<5
+----
+
+Again, we can simplify this to use the default value for `replica`, like so:
+
+[source,text]
+----
+cores:<5
+----
+
+[[Rule-basedReplicaPlacement-Donotcreateanyreplicasinhost192.45.67.3]]
+=== Do not create any replicas in host 192.45.67.3
+
+This rule uses only the pre-defined tag `host` to define an IP address where replicas should not be placed.
+
+[source,text]
+----
+host:!192.45.67.3
+----
+
+[[Rule-basedReplicaPlacement-DefiningRules]]
+== Defining Rules
+
+Rules are specified per collection during collection creation as request parameters. It is possible to specify multiple ‘rule’ and ‘snitch’ params as in this example:
+
+[source,text]
+----
+snitch=class:EC2Snitch&rule=shard:*,replica:1,dc:dc1&rule=shard:*,replica:<2,dc:dc3
+----
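+
+Put together, a full collection-creation request with placement rules might look like the following (the collection name and the specific parameter values are illustrative):
+
+[source,text]
+----
+http://localhost:8983/solr/admin/collections?action=CREATE&name=myCollection&numShards=2&replicationFactor=2&rule=shard:*,replica:<2,node:*&snitch=class:EC2Snitch
+----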
+
+These rules are persisted in `clusterstate.json` in ZooKeeper and are available throughout the lifetime of the collection. This enables the system to perform any future node allocation without direct user interaction. The rules added during collection creation can be modified later using the <<collections-api.adoc#CollectionsAPI-modifycollection,MODIFYCOLLECTION>> API.

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/ccbc93b8/solr/solr-ref-guide/src/running-solr-on-hdfs.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/running-solr-on-hdfs.adoc b/solr/solr-ref-guide/src/running-solr-on-hdfs.adoc
new file mode 100644
index 0000000..4e51446
--- /dev/null
+++ b/solr/solr-ref-guide/src/running-solr-on-hdfs.adoc
@@ -0,0 +1,224 @@
+= Running Solr on HDFS
+:page-shortname: running-solr-on-hdfs
+:page-permalink: running-solr-on-hdfs.html
+
+Solr has support for writing and reading its index and transaction log files to the HDFS distributed filesystem.
+
+This does not use Hadoop MapReduce to process Solr data; rather, it only uses the HDFS filesystem for index and transaction log file storage. To use Hadoop MapReduce to process Solr data, see the MapReduceIndexerTool in the Solr contrib area.
+
+To use HDFS rather than a local filesystem, you must be using Hadoop 2.x and you will need to instruct Solr to use the `HdfsDirectoryFactory`. There are also several additional parameters to define. These can be set in one of three ways:
+
+* Pass JVM arguments to the `bin/solr` script. These would need to be passed every time you start Solr with `bin/solr`.
+* Modify `solr.in.sh` (or `solr.in.cmd` on Windows) to pass the JVM arguments automatically when using `bin/solr` without having to set them manually.
+* Define the properties in `solrconfig.xml`. These configuration changes would need to be repeated for every collection, so this is a good option if you only want some of your collections stored in HDFS.
+
+[[RunningSolronHDFS-StartingSolronHDFS]]
+== Starting Solr on HDFS
+
+[[RunningSolronHDFS-StandaloneSolrInstances]]
+=== Standalone Solr Instances
+
+For standalone Solr instances, there are a few parameters you should be sure to modify before starting Solr. These can be set in `solrconfig.xml` (more on that <<RunningSolronHDFS-HdfsDirectoryFactoryParameters,below>>), or passed to the `bin/solr` script at startup.
+
+* You need to use an `HdfsDirectoryFactory` and a data dir of the form `hdfs://host:port/path`
+* You need to specify an UpdateLog location of the form `hdfs://host:port/path`
+* You should specify a lock factory type of '`hdfs`' or none.
+
+If you do not modify `solrconfig.xml`, you can instead start Solr on HDFS with the following command:
+
+[source,bash]
+----
+bin/solr start -Dsolr.directoryFactory=HdfsDirectoryFactory \
+     -Dsolr.lock.type=hdfs \
+     -Dsolr.data.dir=hdfs://host:port/path \
+     -Dsolr.updatelog=hdfs://host:port/path
+----
+
+This example will start Solr in standalone mode, using the defined JVM properties (explained in more detail <<RunningSolronHDFS-HdfsDirectoryFactoryParameters,below>>).
+
+[[RunningSolronHDFS-SolrCloudInstances]]
+=== SolrCloud Instances
+
+In SolrCloud mode, it's best to leave the data and update log directories as the defaults Solr comes with and simply specify the `solr.hdfs.home`. All dynamically created collections will create the appropriate directories automatically under the `solr.hdfs.home` root directory.
+
+* Set `solr.hdfs.home` in the form `hdfs://host:port/path`
+* You should specify a lock factory type of '`hdfs`' or none.
+
+[source,bash]
+----
+bin/solr start -c -Dsolr.directoryFactory=HdfsDirectoryFactory \
+     -Dsolr.lock.type=hdfs \
+     -Dsolr.hdfs.home=hdfs://host:port/path
+----
+
+This command starts Solr in SolrCloud mode, using the defined JVM properties.
+
+
+[[RunningSolronHDFS-Modifyingsolr.in.sh_nix_orsolr.in.cmd_Windows_]]
+=== Modifying solr.in.sh (*nix) or solr.in.cmd (Windows)
+
+The examples above assume you will pass JVM arguments as part of the start command every time you use `bin/solr` to start Solr. However, `bin/solr` looks for an include file named `solr.in.sh` (`solr.in.cmd` on Windows) to set environment variables. By default, this file is found in the `bin` directory, and you can modify it to permanently add the `HdfsDirectoryFactory` settings and ensure they are used every time Solr is started.
+
+For example, to set JVM arguments to always use HDFS when running in SolrCloud mode (as shown above), you would add a section such as this:
+
+[source,bash]
+----
+# Set HDFS DirectoryFactory & Settings
+SOLR_OPTS="$SOLR_OPTS \
+-Dsolr.directoryFactory=HdfsDirectoryFactory \
+-Dsolr.lock.type=hdfs \
+-Dsolr.hdfs.home=hdfs://host:port/path"
+
+[[RunningSolronHDFS-TheBlockCache]]
+== The Block Cache
+
+For performance, the HdfsDirectoryFactory uses a Directory that will cache HDFS blocks. This caching mechanism replaces the standard file system cache that Solr otherwise relies on heavily. By default, this cache is allocated off heap. This cache will often need to be quite large, and you may need to raise the off-heap memory limit for the specific JVM you are running Solr in. For the Oracle/OpenJDK JVMs, the following is an example command-line parameter that you can use to raise the limit when starting Solr:
+
+[source,bash]
+----
+-XX:MaxDirectMemorySize=20g
+----
+
+[[RunningSolronHDFS-HdfsDirectoryFactoryParameters]]
+== HdfsDirectoryFactory Parameters
+
+The `HdfsDirectoryFactory` has a number of settings that are defined as part of the `directoryFactory` configuration.
+
+[[RunningSolronHDFS-SolrHDFSSettings]]
+=== Solr HDFS Settings
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="20,30,10,40",options="header"]
+|===
+|Parameter |Example Value |Default |Description
+|`solr.hdfs.home` |`hdfs://host:port/path/solr` |N/A |A root location in HDFS for Solr to write collection data to. Rather than specifying an HDFS location for the data directory or update log directory, use this to specify one root location and have everything automatically created within this HDFS location.
+|===
+
+[[RunningSolronHDFS-BlockCacheSettings]]
+=== Block Cache Settings
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,10,60",options="header"]
+|===
+|Parameter |Default |Description
+|`solr.hdfs.blockcache.enabled` |true |Enable the blockcache
+|`solr.hdfs.blockcache.read.enabled` |true |Enable the read cache
+|`solr.hdfs.blockcache.direct.memory.allocation` |true |Enable direct memory allocation. If this is false, heap is used
+|`solr.hdfs.blockcache.slab.count` |1 |Number of memory slabs to allocate. Each slab is 128 MB in size.
+|`solr.hdfs.blockcache.global` |true |Enable/Disable using one global cache for all SolrCores. The settings used will be from the first HdfsDirectoryFactory created.
+|===
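+
+Since each slab is 128 MB, the total block cache size is roughly `slab.count` × 128 MB, and it should fit within the direct memory limit set via `-XX:MaxDirectMemorySize`. A back-of-the-envelope calculation (the 20 GB budget here is just an example):
+
+[source,bash]
+----
+# Rough sizing sketch: how many 128 MB slabs fit in a 20 GB direct-memory budget?
+slab_mb=128
+budget_gb=20
+echo $(( budget_gb * 1024 / slab_mb ))   # prints 160
+----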
+
+[[RunningSolronHDFS-NRTCachingDirectorySettings]]
+=== NRTCachingDirectory Settings
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,10,60",options="header"]
+|===
+|Parameter |Default |Description
+|`solr.hdfs.nrtcachingdirectory.enable` |true |Enable the use of NRTCachingDirectory
+|`solr.hdfs.nrtcachingdirectory.maxmergesizemb` |16 |NRTCachingDirectory max segment size for merges
+|`solr.hdfs.nrtcachingdirectory.maxcachedmb` |192 |NRTCachingDirectory max cache size
+|===
+
+[[RunningSolronHDFS-HDFSClientConfigurationSettings]]
+=== HDFS Client Configuration Settings
+
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,10,60",options="header"]
+|===
+|Parameter |Default |Description
+|`solr.hdfs.confdir` |N/A |Pass the location of HDFS client configuration files - needed for HDFS HA for example.
+|===
+
+[[RunningSolronHDFS-KerberosAuthenticationSettings]]
+=== Kerberos Authentication Settings
+
+Hadoop can be configured to use the Kerberos protocol to verify user identity when trying to access core services like HDFS. If your HDFS directories are protected using Kerberos, then you need to configure Solr's HdfsDirectoryFactory to authenticate using Kerberos in order to read and write to HDFS. To enable Kerberos authentication from Solr, you need to set the following parameters:
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,10,60",options="header"]
+|===
+|Parameter |Default |Description
+|`solr.hdfs.security.kerberos.enabled` |false |Set to true to enable Kerberos authentication
+|`solr.hdfs.security.kerberos.keytabfile` |N/A a|
+A keytab file contains pairs of Kerberos principals and encrypted keys which allows for password-less authentication when Solr attempts to authenticate with secure Hadoop.
+
+This file will need to be present on all Solr servers at the same path provided in this parameter.
+
+|`solr.hdfs.security.kerberos.principal` |N/A |The Kerberos principal that Solr should use to authenticate to secure Hadoop; the format of a typical Kerberos V5 principal is: `primary/instance@realm`
+|===
+
+[[RunningSolronHDFS-Example]]
+== Example
+
+Here is a sample `solrconfig.xml` configuration for storing Solr indexes on HDFS:
+
+[source,xml]
+----
+<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
+  <str name="solr.hdfs.home">hdfs://host:port/solr</str>
+  <bool name="solr.hdfs.blockcache.enabled">true</bool>
+  <int name="solr.hdfs.blockcache.slab.count">1</int>
+  <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
+  <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
+  <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
+  <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
+  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
+  <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
+</directoryFactory>
+----
+
+If using Kerberos, you will need to add the three Kerberos related properties to the `<directoryFactory>` element in solrconfig.xml, such as:
+
+[source,xml]
+----
+<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
+   ...
+  <bool name="solr.hdfs.security.kerberos.enabled">true</bool>
+  <str name="solr.hdfs.security.kerberos.keytabfile">/etc/krb5.keytab</str>
+  <str name="solr.hdfs.security.kerberos.principal">solr/admin@KERBEROS.COM</str>
+</directoryFactory>
+----
+
+[[RunningSolronHDFS-AutomaticallyAddReplicasinSolrCloud]]
+== Automatically Add Replicas in SolrCloud
+
+One benefit of running Solr on HDFS is the ability to automatically add new replicas when the Overseer notices that a shard has gone down. Because the "gone" index shards are stored in HDFS, a new core will be created and will point to the existing indexes in HDFS.
+
+Collections created using `autoAddReplicas=true` on a shared file system have automatic addition of replicas enabled. The following settings can be used to override the defaults in the `<solrcloud>` section of `solr.xml`.
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="40,10,50",options="header"]
+|===
+|Param |Default |Description
+|autoReplicaFailoverWorkLoopDelay |10000 |The time (in ms) between clusterstate inspections by the Overseer to detect and possibly act on creation of a replacement replica.
+|autoReplicaFailoverWaitAfterExpiration |30000 |The minimum time (in ms) to wait for initiating replacement of a replica after first noticing it not being live. This is important to prevent false positives while stopping or starting the cluster.
+|autoReplicaFailoverBadNodeExpiration |60000 |The delay (in ms) after which a replica marked as down would be unmarked.
+|===
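+
+For example, to make the Overseer wait longer before replacing a replica, these values could be overridden in the `<solrcloud>` section of `solr.xml` (the values shown are illustrative, not recommendations):
+
+[source,xml]
+----
+<solrcloud>
+  ...
+  <int name="autoReplicaFailoverWaitAfterExpiration">60000</int>
+  <int name="autoReplicaFailoverWorkLoopDelay">15000</int>
+</solrcloud>
+----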
+
+[[RunningSolronHDFS-TemporarilydisableautoAddReplicasfortheentirecluster]]
+=== Temporarily disable autoAddReplicas for the entire cluster
+
+When doing offline maintenance on the cluster and for various other use cases where an admin would like to temporarily disable auto addition of replicas, the following APIs will disable and re-enable autoAddReplicas for **all collections in the cluster**:
+
+Disable auto addition of replicas cluster wide by setting the cluster property `autoAddReplicas` to `false`:
+
+[source,text]
+----
+http://localhost:8983/solr/admin/collections?action=CLUSTERPROP&name=autoAddReplicas&val=false
+----
+
+Re-enable auto addition of replicas (for those collections created with `autoAddReplicas=true`) by unsetting the `autoAddReplicas` cluster property (when no `val` param is provided, the cluster property is unset):
+
+[source,text]
+----
+http://localhost:8983/solr/admin/collections?action=CLUSTERPROP&name=autoAddReplicas
+----

