lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "SolrCloud" by YonikSeeley
Date Thu, 03 Jan 2013 15:49:00 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SolrCloud" page has been changed by YonikSeeley:
http://wiki.apache.org/solr/SolrCloud?action=diff&rev1=84&rev2=85

Comment:
take a pass at cleaning up some of the terminology

  cd example2B
  java -Djetty.port=7500 -DzkHost=localhost:9983 -jar start.jar
  }}}
- Refresh the zookeeper browser page [[http://localhost:8983/solr/#/~cloud|Solr Zookeeper
Admin UI]] and verify that 4 solr nodes are up, and that each shard is present at 2 nodes.
+ Refresh the ZooKeeper browser page [[http://localhost:8983/solr/#/~cloud|Solr Zookeeper Admin UI]] and verify that 4 Solr nodes are up, and that each shard has two replicas.
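
  If you prefer a quick check from the command line, the same cluster state can be pulled straight from ZooKeeper through Solr's zookeeper servlet (a sketch, assuming the stock Solr 4.x example on port 8983 and that curl is installed):
  {{{
  # Dump /clusterstate.json via the Solr zookeeper servlet; each shard entry
  # should list two replicas once all four nodes are up.
  curl "http://localhost:8983/solr/zookeeper?detail=true&path=/clusterstate.json"
  }}}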
  
- Because we have been telling Solr that we want two logical shards, starting instances 3
and 4 are assigned to be replicas of instances one and two automatically.
+ Because we have been telling Solr that we want two logical shards, instances 3 and 4 are automatically assigned to be additional replicas of those shards when they start.
  
  Now send a query to any of the servers to query the cluster:
  
  http://localhost:7500/solr/collection1/select?q=*:*
  
- Send this query multiple times and observe the logs from the solr servers.  From your web
browser, you may need to hold down CTRL while clicking on the browser refresh button to bypass
the HTTP caching in your browser.  You should be able to observe Solr load balancing the requests
(done via LBHttpSolrServer ?) across shard replicas, using different servers to satisfy each
request.  There will be a log statement for the top-level request in the server the browser
sends the request to, and then a log statement for each sub-request that are merged to produce
the complete response.
+ Send this query multiple times and observe the logs from the Solr servers. You should be able to observe Solr load balancing the requests (done via LBHttpSolrServer ?) across replicas, using different servers to satisfy each request.  There will be a log statement for the top-level request in the server the browser sends the request to, and then a log statement for each of the sub-requests that are merged to produce the complete response.
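
  If you want to rule out browser caching entirely, a small shell loop works just as well (a sketch that assumes curl is available and the instances above are still running):
  {{{
  # Send the same query several times and watch the logs in each Solr
  # terminal window: different replicas should serve the sub-requests.
  for i in 1 2 3 4 5; do
    curl -s "http://localhost:7500/solr/collection1/select?q=*:*" > /dev/null
  done
  }}}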
  
  To demonstrate fail-over for high availability, press CTRL-C in the window running any one
of the Solr servers '''except the instance running ZooKeeper'''.  (We'll talk about ZooKeeper
redundancy in Example C.)  Once that server instance terminates, send another query request
to any of the remaining servers that are up.  You should continue to see the full results.
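
  For example (hypothetical, assuming the instance on port 7500 is one of the survivors), after stopping another instance with CTRL-C:
  {{{
  # Query any node that is still up; the full result set should still come back.
  curl "http://localhost:7500/solr/collection1/select?q=*:*"
  }}}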
  
  SolrCloud can continue to serve results without interruption as long as at least one server
hosts every shard.  You can demonstrate this by judiciously shutting down various instances
and looking for results.  If you have killed all of the servers for a particular shard, requests
to other servers will result in a 503 error.  To return just the documents that are available
in the shards that are still alive (and avoid the error), add the following query parameter:
shards.tolerant=true
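
  For example (hypothetical port; use any server that is still up):
  {{{
  # With an entire shard down, this returns the documents from the surviving
  # shards instead of a 503 error.
  curl "http://localhost:8983/solr/collection1/select?q=*:*&shards.tolerant=true"
  }}}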
  
- SolrCloud uses leaders and an overseer as an implementation detail. This means that some
shards/replicas will play special roles. You don't need to worry if the instance you kill
is a leader or the cluster overseer - if you happen to kill one of these, automatic fail over
will choose new leaders or a new overseer transparently to the user and they will seamlessly
takeover their respective jobs. Any Solr instance can be promoted to one of these roles.
+ SolrCloud uses leaders and an overseer as an implementation detail. This means that some nodes/replicas will play special roles. You don't need to worry about whether the instance you kill is a leader or the cluster overseer - if you happen to kill one of these, automatic failover will choose a new leader or a new overseer transparently to the user, and they will seamlessly take over their respective jobs. Any Solr instance can be promoted to one of these roles.
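
  If you are curious which node currently holds these roles (nothing in this walkthrough depends on it), a sketch under the assumption that the zookeeper servlet on port 8983 is reachable: leaders are flagged in the cluster state shown earlier, and the overseer election is recorded in ZooKeeper.
  {{{
  # Show the znode that records the current overseer (assumed ZooKeeper path).
  curl "http://localhost:8983/solr/zookeeper?detail=true&path=/overseer_elect/leader"
  }}}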
  
  === Example C: Two shard cluster with shard replicas and zookeeper ensemble ===
  {{http://people.apache.org/~markrmiller/2shard4server2.jpg}}
@@ -166, +166 @@

  
  About the params
   * '''name''': The name of the collection to be created
-  * '''numShards''': The number of shards (sometimes called slices) to be created as part
of the collection
+  * '''numShards''': The number of logical shards (sometimes called slices) to be created
as part of the collection
-  * '''replicationFactor''': The number of "additional" shard-replica (sometimes called shards)
to be created for each shard. Set it to 0 to have "one shard-replica for each of your shards".
Set to 1 to have "two shard-replica for each of your shards" etc. With a value of 0 your data
will not be replicated
+  * '''replicationFactor''': The number of copies of each document (or, equivalently, the number of physical replicas to be created for each logical shard of the collection).  A replicationFactor of 3 means that there will be 3 replicas (one of which is normally designated to be the leader) for each logical shard.  NOTE: in Solr 4.0, replicationFactor was the number of *additional* copies as opposed to the total number of copies.
-  * '''maxShardsPerNode''' : A create operation will spread numShards*(replicationFactor+1)
shard-replica across your live Solr nodes - fairly distributed, and never two shard-replica
of the same shard on the same Solr node. If a Solr is not live at the point in time where
the create operation is carried out, it will not get any shard-replica of the new collection.
To prevent too many shard-replica being created on a single Solr node, use maxShardsPerNode
to set a limit for how many shard-replica the create operation is allowed to create on each
node - default is 1. If it cannot fit the entire collection (numShards*(replicationFactor+1)
shard-replica) on you live Solrs it will not create anything at all.
+  * '''maxShardsPerNode''' : A create operation will spread numShards*replicationFactor replicas across your live Solr nodes, fairly distributed, and never with two replicas of the same shard on the same Solr node. If a Solr node is not live when the create operation is carried out, it will not get any part of the new collection. To prevent too many replicas from being created on a single Solr node, use maxShardsPerNode to set a limit on how many replicas the create operation is allowed to create on each node - default is 1. If the entire collection (numShards*replicationFactor replicas) cannot fit on your live Solr nodes, nothing will be created at all (see the worked example below).
   * '''createNodeSet''': If not provided, the create operation will create shard-replica spread across all of your live Solr nodes. You can provide the "createNodeSet" parameter to change the set of nodes to spread the shard-replica across. The format of values for this param is "<node-name1>,<node-name2>,...,<node-nameN>" - e.g. "localhost:8983_solr,localhost:8984_solr,localhost:8985_solr"
  
- Note: replicationFactor defines the maximum number of replicas created in addition to the
leader from amongst the nodes currently running (i.e. nodes added later will not be used for
this collection). Imagine you have a cluster with 20 nodes and want to add an additional smaller
collection to your installation with 2 shards, each shard with a leader and two replicas.
You would specify a replicationFactor=2. Now six of your nodes will host this new collection
and the other 14 will not host the new collection.
  
  Delete http://localhost:8983/solr/admin/collections?action=DELETE&name=mycollection
  
@@ -376, +375 @@

  The Grouping feature only works if groups are in the same shard. Proper support will require
custom hashing and there is already a JIRA issue working towards this.
  
  == Glossary ==
- ||'''Collection''': ||A single search index. ||
+ ||'''Collection''': ||A single search index.||
- ||'''Shard''': ||Either a logical or physical section of a single index depending on context.
A logical section is also called a slice. A physical shard is expressed as a SolrCore. ||
- ||'''Slice''': ||A logical section of a single index. One or more identical, physical shards
make up a slice. ||
+ ||'''Shard''': ||A logical section of a single collection (also called a Slice). Sometimes people use "Shard" in a physical sense, meaning a manifestation (replica) of a logical shard. ||
+ ||'''Replica''': ||A physical manifestation of a logical Shard, implemented as a single Lucene index on a SolrCore. ||
+ ||'''Leader''': ||One Replica of every Shard is designated as the Leader to coordinate indexing for that Shard. ||
  ||'''SolrCore''': ||Encapsulates a single physical index. One or more make up logical shards
(or slices) which make up a collection. ||
  ||'''Node''': ||A single instance of Solr. A single Solr instance can have multiple SolrCores
that can be part of any number of collections. ||
  ||'''Cluster''': ||All of the nodes you are using to host SolrCores. ||
