lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Trivial Update of "SolrCloud" by YonikSeeley
Date Tue, 02 Feb 2010 16:38:48 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SolrCloud" page has been changed by YonikSeeley.
The comment on this change is: start embedded ensemble example.
http://wiki.apache.org/solr/SolrCloud?action=diff&rev1=22&rev2=23

--------------------------------------------------

  
  If you haven't yet, go through the simple [[http://lucene.apache.org/solr/tutorial.html|Solr Tutorial]] to familiarize yourself with Solr.
  
- Solr embeds and uses Zookeeper as a repository for cluster configuration and coordination - think of it as a distributed filesystem.
+ Solr embeds and uses Zookeeper as a repository for cluster configuration and coordination - think of it as a distributed filesystem that contains information about all of the Solr servers.
  
  Since we'll need two solr servers for this example, simply make a copy of the example directory
for the second server.
  
  {{{
  cp -r example example2
  }}}
- === Simple two shard cluster ===
+ === Example A: Simple two shard cluster ===
  This example simply creates a cluster consisting of two solr servers representing two different
shards of a collection.
  
  Since we'll need two solr servers for this example, simply make a copy of the example directory
for the second server.
@@ -75, +75 @@

  
  If at any point you wish to start over fresh or experiment with different configurations,
you can delete all of the cloud state contained within zookeeper by simply deleting the solr/zoo_data
directory after shutting down the servers.
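  For example, with both servers shut down, the embedded zookeeper state can be removed with the same cleanup command used later for the ensemble example:
  {{{
  rm -r example*/solr/zoo_data
  }}}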
  
- === Simple two shard cluster with shard replicas ===
+ === Example B: Simple two shard cluster with shard replicas ===
  This example will simply build off of the previous example by creating another copy of shard1
and shard2.  Extra shard copies can be used for high availability and fault tolerance, or
simply for increasing the query capacity of the cluster.
  
  First, run through the previous example so we already have two shards and some documents
indexed into each.  Then simply make a copy of those two servers:
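  One way to make those copies, consistent with the exampleB and example2B directory names used in the ensemble example further down, is:
  {{{
  cp -r example exampleB
  cp -r example2 example2B
  }}}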
@@ -100, +100 @@

  
  http://localhost:7500/solr/collection1/select?distrib=true&q=*:*
  
  Send this query multiple times and observe the logs from the solr servers.  From your web browser, you may need to hold down CTRL while clicking on the browser refresh button to bypass the HTTP caching in your browser.  You should be able to observe Solr load balancing the requests across shard replicas, using different servers to satisfy each request.  There will be a log statement for the top-level request in the server the browser sends the request to, and then a log statement for each sub-request that is merged to produce the complete response.
+ 
+ To demonstrate failover for high availability, go ahead and kill any one of the Solr servers (just press CTRL-C in the window running the server) and send another query request to any of the remaining servers that are up.
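  For example, if the server on port 7500 was the one killed, the same distributed query can be sent to the server on port 8983 (or 7574) instead:
  
  http://localhost:8983/solr/collection1/select?distrib=true&q=*:*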
+ 
+ === Example C: Two shard cluster with shard replicas and zookeeper ensemble ===
+ The problem with Example B is that while there are enough Solr servers to survive any one of them crashing, there is only one zookeeper server that contains the state of the cluster.  If that zookeeper server crashes, distributed queries will still work, since the solr servers remember the last cluster state reported by zookeeper.  The problem is that no new servers or clients will be able to discover the cluster state, and no changes to the cluster state will be possible.
+ 
+ Running multiple zookeeper servers in concert (a zookeeper ensemble) allows for high availability
of the zookeeper service.  Every zookeeper server needs to know about every other zookeeper
server in the ensemble, and a majority of servers are needed to provide service.  For example,
a zookeeper ensemble of 3 servers allows any one to fail with the remaining 2 constituting
a majority to continue providing service.  5 zookeeper servers are needed to allow for the
failure of up to 2 servers at a time.
+ 
+ For production, it's recommended that you run an external zookeeper ensemble rather than
having Solr run embedded zookeeper servers.  For this example, we'll use the embedded servers
for simplicity.
+ 
+ First, stop all 4 servers and then clean up the zookeeper data directories for a fresh start.
+ {{{
+ rm -r example*/solr/zoo_data
+ }}}
+ 
+ We will be running the servers again at ports 8983,7574,8900,7500.  The default is to run
an embedded zookeeper server at hostPort+1000, so if we run an embedded zookeeper on the first
three servers, the ensemble address will be {{{localhost:9983,localhost:8574,localhost:9900}}}.
+ 
+ As a convenience, we'll have the first server upload the solr config to the cluster.  You
will notice it block until you have actually started the second server.  This is due to zookeeper
needing a quorum before it can operate.
+ 
+ NOTE: this doesn't work yet because the client of the second server checks for the collection config before the first server has finished uploading it, and the first server cannot start uploading until the second server starts and establishes a quorum.
+ {{{
+ cd example
+ java -Dbootstrap_confname=myconf -Dbootstrap_confdir=./solr/conf -DzkRun -DzkHost=localhost:9983,localhost:8574,localhost:9900 -jar start.jar
+ }}}
+ 
+ {{{
+ cd example2
+ java -Djetty.port=7574 -DhostPort=7574 -DzkRun -DzkHost=localhost:9983,localhost:8574,localhost:9900 -jar start.jar
+ }}}
+ 
+ {{{
+ cd exampleB
+ java -Djetty.port=8900 -DhostPort=8900 -DzkRun -DzkHost=localhost:9983,localhost:8574,localhost:9900 -jar start.jar
+ }}}
+ 
+ {{{
+ cd example2B
+ java -Djetty.port=7500 -DhostPort=7500 -DzkHost=localhost:9983,localhost:8574,localhost:9900 -jar start.jar
+ }}}
  
  
  == ZooKeeper ==
- A group of Zookeeper servers running together for fault tolerance and high availability is called an ensemble.  For production, it's recommended that you run an external zookeeper ensemble rather than having Solr run embedded servers.
+ A group of Zookeeper servers running together for fault tolerance and high availability is called an ensemble.  For production, it's recommended that you run an external zookeeper ensemble rather than having Solr run embedded servers.  See the [[http://hadoop.apache.org/zookeeper/|Apache ZooKeeper]] site for more information on downloading and running a zookeeper ensemble.
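  As a rough sketch of what an external ensemble setup involves (the hostnames and paths below are hypothetical placeholders; see the ZooKeeper documentation for authoritative details), each of the three servers would run with a zoo.cfg along these lines:
  {{{
  # zoo.cfg shared by all three ensemble members (hypothetical hosts and paths)
  tickTime=2000
  initLimit=5
  syncLimit=2
  dataDir=/var/zookeeper
  clientPort=2181
  server.1=zk1.example.com:2888:3888
  server.2=zk2.example.com:2888:3888
  server.3=zk3.example.com:2888:3888
  }}}
  Each server additionally needs a myid file in its dataDir containing just its own id (1, 2, or 3).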
  
  When Solr runs an embedded zookeeper server, it defaults to using the solr port plus 1000
for the zookeeper client port.  In addition, it defaults to adding one to the client port
for the zookeeper server port, and two for the zookeeper leader election port.  So in the
first example with Solr running at 8983, the embedded zookeeper server used port 9983 for
the client port and 9984,9985 for the server ports.
  
