lucene-commits mailing list archives

From ctarg...@apache.org
Subject [lucene-solr] branch branch_8_0 updated: SOLR-12770: make docs on shards param a little more clear, fix a couple typos
Date Thu, 14 Feb 2019 20:57:22 GMT
This is an automated email from the ASF dual-hosted git repository.

ctargett pushed a commit to branch branch_8_0
in repository https://gitbox.apache.org/repos/asf/lucene-solr.git


The following commit(s) were added to refs/heads/branch_8_0 by this push:
     new 05b002e  SOLR-12770: make docs on shards param a little more clear, fix a couple typos
05b002e is described below

commit 05b002e7b0a3404c2cc03fbcab7d16c695faa884
Author: Cassandra Targett <ctargett@apache.org>
AuthorDate: Thu Feb 14 14:55:10 2019 -0600

    SOLR-12770: make docs on shards param a little more clear, fix a couple typos
---
 solr/solr-ref-guide/src/distributed-requests.adoc  | 36 ++++++++++++++--------
 .../distributed-search-with-index-sharding.adoc    |  5 +--
 solr/solr-ref-guide/src/the-terms-component.adoc   |  8 +++--
 3 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/solr/solr-ref-guide/src/distributed-requests.adoc b/solr/solr-ref-guide/src/distributed-requests.adoc
index b5246f2..be129ed 100644
--- a/solr/solr-ref-guide/src/distributed-requests.adoc
+++ b/solr/solr-ref-guide/src/distributed-requests.adoc
@@ -22,46 +22,48 @@ The chosen replica acts as an aggregator: it creates internal requests to random
 
 == Limiting Which Shards are Queried
 
-While one of the advantages of using SolrCloud is the ability to query very large collections distributed among various shards, in some cases <<shards-and-indexing-data-in-solrcloud.adoc#document-routing,you may know that you are only interested in results from a subset of your shards>>. You have the option of searching over all of your data or just parts of it.
+While one of the advantages of using SolrCloud is the ability to query very large collections distributed across various shards, in some cases you may have configured Solr so you know <<shards-and-indexing-data-in-solrcloud.adoc#document-routing,you are only interested in results from a specific subset of shards>>. You have the option of searching over all of your data or just parts of it.
 
-Querying all shards for a collection should look familiar; it's as though SolrCloud didn't even come into play:
+A query across all shards for a collection is simply a query that does not define a `shards` parameter:
 
 [source,text]
 ----
 http://localhost:8983/solr/gettingstarted/select?q=*:*
 ----
 
-If, on the other hand, you wanted to search just one shard, you can specify that shard by its logical ID, as in:
+If you want to search just one shard, use the `shards` parameter to specify the shard by its logical ID, as in:
 
 [source,text]
 ----
 http://localhost:8983/solr/gettingstarted/select?q=*:*&shards=shard1
 ----
 
-If you want to search a group of shard Ids, you can specify them together:
+If you want to search a group of shards, you can specify each shard separated by a comma in one request:
 
 [source,text]
 ----
 http://localhost:8983/solr/gettingstarted/select?q=*:*&shards=shard1,shard2
 ----
 
-In both of the above examples, the shard Id(s) will be used to pick a random replica of that shard.
+In both of the above examples, while only the specific shards are queried, any random replica of the shard will get the request.
 
-Alternatively, you can specify the explicit replicas you wish to use in place of a shard Ids:
+Alternatively, you can specify a list of replicas you wish to use in place of a shard IDs by separating the replica IDs with commas:
 
 [source,text]
 ----
 http://localhost:8983/solr/gettingstarted/select?q=*:*&shards=localhost:7574/solr/gettingstarted,localhost:8983/solr/gettingstarted
 ----
 
-Or you can specify a list of replicas to choose from for a single shard (for load balancing purposes) by using the pipe symbol (|):
+Or you can specify a list of replicas to choose from for a single shard (for load balancing purposes) by using the pipe symbol (|) between different replica IDs:
 
 [source,text]
 ----
 http://localhost:8983/solr/gettingstarted/select?q=*:*&shards=localhost:7574/solr/gettingstarted|localhost:7500/solr/gettingstarted
 ----
 
-And of course, you can specify a list of shards (separated by commas) each defined by a list of replicas (seperated by pipes). In this example, 2 shards are queried, the first being a random replica from shard1, the second being a random replica from the explicit pipe delimited list:
+Finally, you can specify a list of shards (separated by commas) each defined by a list of replicas (seperated by pipes).
+
+In the following example, 2 shards are queried, the first being a random replica from shard1, the second being a random replica from the explicit pipe delimited list:
 
 [source,text]
 ----
@@ -70,9 +72,11 @@ http://localhost:8983/solr/gettingstarted/select?q=*:*&shards=shard1,localhost:7
 
 == Configuring the ShardHandlerFactory
 
-You can directly configure aspects of the concurrency and thread-pooling used within distributed search in Solr. This allows for finer grained control and you can tune it to target your own specific requirements. The default configuration favors throughput over latency.
+For finer-grained control, you can directly configure and tune aspects of the concurrency and thread-pooling used within distributed search in Solr. The default configuration favors throughput over latency.
+
+This is done by defining a `shardHandler` in the configuration for your search handler.
 
-To configure the standard search handler, provide a configuration like this in `solrconfig.xml`:
+To add a `shardHandler` to the standard search handler, provide a configuration in `solrconfig.xml`, as in this example:
 
 [source,xml]
 ----
@@ -112,10 +116,16 @@ If specified, the thread pool will use a backing queue instead of a direct hando
 Chooses the JVM specifics dealing with fair policy queuing, if enabled distributed searches will be handled in a First in First out fashion at a cost to throughput. If disabled throughput will be favored over latency. The default is `false`.
 
 `shardsWhitelist`::
-If specified, this lists limits what nodes can be requested in the `shards` request parameter. In cloud mode this whitelist is automatically configured to include all live nodes in the cluster. In standalone mode the whitelist defaults to empty (sharding not allowed). If you need to disable this feature for backwards compatibility, you can set the system property `solr.disable.shardsWhitelist=true`. The value of this parameter is a comma separated list of the nodes that will be whitelist [...]
+If specified, this lists limits what nodes can be requested in the `shards` request parameter.
++
+In SolrCloud mode this whitelist is automatically configured to include all live nodes in the cluster.
++
+In standalone mode the whitelist defaults to empty (sharding not allowed).
++
+If you need to disable this feature for backwards compatibility, you can set the system property `solr.disable.shardsWhitelist=true`. The value of this parameter is a comma separated list of the nodes that will be whitelisted, i.e.:
 `10.0.0.1:8983/solr,10.0.0.1:8984/solr`.
-
-NOTE: In cloud mode, if at least one node is included in the whitelist, then the live_nodes will no longer be used as source for the list. This means that, if you need to do a cross-cluster request using the `shards` parameter in cloud mode (in addition to regular within-cluster requests), you'll need to add all nodes (local cluster + remote nodes) to the whitelist. 
++
+NOTE: In SolrCloud mode, if at least one node is included in the whitelist, then the `live_nodes` will no longer be used as source for the list. This means that if you need to do a cross-cluster request using the `shards` parameter in SolrCloud mode (in addition to regular within-cluster requests), you'll need to add all nodes (local cluster + remote nodes) to the whitelist.
 
 == Configuring statsCache (Distributed IDF)
 
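The `shards` syntax documented in `distributed-requests.adoc` puts commas between shards and pipes between the replicas of one shard. A small Python sketch can illustrate how those values are composed. The helper names `shards_param` and `select_url` are hypothetical and are not part of Solr or any client library. The sketch only builds strings and sends no request. It leaves `*`, `:`, `,`, `/`, and `|` unescaped so the output can be compared with the URLs in the ref guide. A real HTTP client may percent-encode those characters instead.

```python
from typing import Sequence
from urllib.parse import urlencode


def shards_param(shards: Sequence[Sequence[str]]) -> str:
    """Build a `shards` value: replicas of one shard joined by '|', shards joined by ','.

    Each inner sequence describes one shard: a single logical shard ID such as
    'shard1', or one or more replica addresses to load-balance across.
    """
    if not shards or any(not replicas for replicas in shards):
        # To query every shard, leave the `shards` parameter out entirely.
        raise ValueError("every shard needs at least one shard ID or replica address")
    return ",".join("|".join(replicas) for replicas in shards)


def select_url(base: str, collection: str, shards: Sequence[Sequence[str]]) -> str:
    """Build a /select URL for q=*:* restricted to the given shards (no request is sent)."""
    # Keep the separators readable so the result matches the ref guide examples.
    query = urlencode({"q": "*:*", "shards": shards_param(shards)}, safe="*:,/|")
    return f"{base}/solr/{collection}/select?{query}"


# Two shards: any replica of shard1, plus one replica chosen from an explicit pipe list.
print(select_url(
    "http://localhost:8983",
    "gettingstarted",
    [["shard1"], ["localhost:7574/solr/gettingstarted", "localhost:7500/solr/gettingstarted"]],
))
```

Modeling each shard as a list of alternatives keeps the two separators distinct. A single-element list yields a plain shard ID or replica. A multi-element list yields a pipe-delimited load-balancing group.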
diff --git a/solr/solr-ref-guide/src/distributed-search-with-index-sharding.adoc b/solr/solr-ref-guide/src/distributed-search-with-index-sharding.adoc
index d8a2d76..f2c745f 100644
--- a/solr/solr-ref-guide/src/distributed-search-with-index-sharding.adoc
+++ b/solr/solr-ref-guide/src/distributed-search-with-index-sharding.adoc
@@ -60,14 +60,15 @@ The following components support distributed search:
 * The *Debug* component, which helps with debugging.
 
 === Shards Whitelist
-What nodes are allowed in the `shards` parameter is configurable through the `shardsWhitelist` property in `solr.xml`. This whitelist is automatically configured for SolrCloud but needs explicit configuration for master/slave mode. Read more details in <<distributed-requests.adoc#configuring-the-shardhandlerfactory>>.
+
+The nodes allowed in the `shards` parameter is configurable through the `shardsWhitelist` property in `solr.xml`. This whitelist is automatically configured for SolrCloud but needs explicit configuration for master/slave mode. Read more details in the section <<distributed-requests.adoc#configuring-the-shardhandlerfactory,Configuring the ShardHandlerFactory>>.
 
 == Limitations to Distributed Search
 
 Distributed searching in Solr has the following limitations:
 
 * Each document indexed must have a unique key.
-* If Solr discovers duplicate document IDs, Solr selects the first document and discards subsequent ones.
+* If Solr discovers duplicate document IDs, Solr selects the first document and discards subsequent documents.
 * The index for distributed searching may become momentarily out of sync if a commit happens between the first and second phase of the distributed search. This might cause a situation where a document that once matched a query and was subsequently changed may no longer match the query but will still be retrieved. This situation is expected to be quite rare, however, and is only possible for a single query request.
 * The number of shards is limited by number of characters allowed for GET method's URI; most Web servers generally support at least 4000 characters, but many servers limit URI length to reduce their vulnerability to Denial of Service (DoS) attacks.
 * Shard information can be returned with each document in a distributed search by including `fl=id, [shard]` in the search request. This returns the shard URL.
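The duplicate-ID limitation in `distributed-search-with-index-sharding.adoc` says that Solr keeps the first document and discards later ones with the same ID. A toy merge in Python can model that rule. This is a simplified, hypothetical illustration. `merge_unique` is not a Solr API. It walks the per-shard hit lists in the order given, whereas Solr's real merge orders documents by the requested sort. Which copy counts as "first" therefore depends on that ordering.

```python
from typing import Dict, Iterable, List

Doc = Dict[str, str]


def merge_unique(shard_hits: Iterable[List[Doc]], key: str = "id") -> List[Doc]:
    """Concatenate per-shard hit lists, keeping only the first doc seen per unique key."""
    seen = set()
    merged: List[Doc] = []
    for hits in shard_hits:
        for doc in hits:
            if doc[key] in seen:
                continue  # a later duplicate of an already-kept key is discarded
            seen.add(doc[key])
            merged.append(doc)
    return merged


# Hypothetical hits; the "shard" field mimics what fl=id,[shard] would add.
shard1 = [{"id": "a", "shard": "shard1"}, {"id": "b", "shard": "shard1"}]
shard2 = [{"id": "a", "shard": "shard2"}, {"id": "c", "shard": "shard2"}]
for doc in merge_unique([shard1, shard2]):
    print(doc)
```

The sketch also shows why each indexed document needs a unique key. Without one, the merge has no way to recognize two hits as the same document.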
diff --git a/solr/solr-ref-guide/src/the-terms-component.adoc b/solr/solr-ref-guide/src/the-terms-component.adoc
index b2705b1..776ed80 100644
--- a/solr/solr-ref-guide/src/the-terms-component.adoc
+++ b/solr/solr-ref-guide/src/the-terms-component.adoc
@@ -292,8 +292,12 @@ The TermsComponent also supports distributed indexes. For the `/terms` request h
 
 `shards`::
 Specifies the shards in your distributed indexing configuration. For more information about distributed indexing, see <<distributed-search-with-index-sharding.adoc#distributed-search-with-index-sharding,Distributed Search with Index Sharding>>.
++
+The `shards` parameter is subject to a host whitelist that has to be configured in the component's parameters using the configuration key `shardsWhitelist` and the list of hosts as values.
++
+By default the whitelist will be populated with all live nodes when running in SolrCloud mode. If you need to disable this feature for backwards compatibility, you can set the system property `solr.disable.shardsWhitelist=true`.
++
+See the section <<distributed-requests.adoc#configuring-the-shardhandlerfactory,Configuring the ShardHandlerFactory>> for more information about how the whitelist works. 
 
 `shards.qt`::
 Specifies the request handler Solr uses for requests to shards.
-
-Same as with regular distributed search, the `shards` parameter is subject to a host whitelist that has to be configured in the component init parameters using the configuration key `shardsWhitelist` and the list of hosts as values. In the same way as with distributed search, the whitelist will be populated to all live nodes by default when running in SolrCloud mode. If you need to disable this feature for backwards compatibility, you can set the system property `solr.disable.shardsWhite [...]
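The whitelist rules are spread across the three changed files:

* an explicit `shardsWhitelist` takes precedence;
* otherwise SolrCloud falls back to live nodes, while standalone mode allows nothing;
* entries are written as nodes such as `10.0.0.1:8983/solr`.

A simplified Python model can summarize these rules. All function names below are hypothetical, and this is not Solr's implementation. The model assumes that nodes are compared by `host:port` only and that live nodes are passed in as `host:port` strings. It also assumes that entries containing neither `:` nor `/` are logical shard IDs that the cluster resolves itself. It ignores the `solr.disable.shardsWhitelist` system property.

```python
from typing import List, Set


def host_port(address: str) -> str:
    """Reduce 'host:port/solr/core' (with or without an http:// scheme) to 'host:port'."""
    if "://" in address:
        address = address.split("://", 1)[1]
    return address.split("/", 1)[0]


def parse_whitelist(value: str) -> Set[str]:
    """Parse a comma-separated list such as '10.0.0.1:8983/solr,10.0.0.1:8984/solr'."""
    return {host_port(entry.strip()) for entry in value.split(",") if entry.strip()}


def effective_whitelist(configured: str, live_nodes: Set[str], cloud_mode: bool) -> Set[str]:
    """An explicit whitelist wins; otherwise SolrCloud uses live nodes and standalone allows none."""
    explicit = parse_whitelist(configured)
    if explicit:
        return explicit  # live nodes are no longer consulted, as the NOTE warns
    return set(live_nodes) if cloud_mode else set()


def rejected_replicas(shards: str, whitelist: Set[str]) -> List[str]:
    """List explicit replica addresses in a `shards` value whose host:port is not allowed."""
    rejected = []
    for shard in shards.split(","):
        for replica in shard.split("|"):
            if ":" not in replica and "/" not in replica:
                continue  # assumed to be a logical shard ID such as 'shard1'
            if host_port(replica) not in whitelist:
                rejected.append(replica)
    return rejected


allowed = effective_whitelist("10.0.0.1:8983/solr,10.0.0.1:8984/solr", set(), cloud_mode=False)
print(rejected_replicas("10.0.0.1:8983/solr/core1|10.0.0.2:8983/solr/core1", allowed))
```

In this model, the cross-cluster case from the NOTE follows directly. Once any node is listed explicitly, local nodes that are not also listed appear in `rejected_replicas`.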

