lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Trivial Update of "SolrCloud" by YonikSeeley
Date Mon, 01 Feb 2010 23:17:56 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SolrCloud" page has been changed by YonikSeeley.
The comment on this change is: start simple replicated example.
http://wiki.apache.org/solr/SolrCloud?action=diff&rev1=21&rev2=22

--------------------------------------------------

  Solr embeds and uses Zookeeper as a repository for cluster configuration and coordination
- think of it as a distributed filesystem.
  
  Since we'll need two solr servers for this example, simply make a copy of the example directory
for the second server.
+ 
  {{{
  cp -r example example2
  }}}
- 
  === Simple two shard cluster ===
+ This example simply creates a cluster consisting of two solr servers representing two different
shards of a collection.
+ 
+ Since we'll need two solr servers for this example, simply make a copy of the example directory
for the second server.
+ 
+ {{{
+ cp -r example example2
+ }}}
  The following command starts up a Solr server and bootstraps a new solr cluster.
+ 
  {{{
  cd example
  java -Dbootstrap_confname=myconf -Dbootstrap_confdir=./solr/conf -DzkRun -jar start.jar
  }}}
- 
   * {{{-DzkRun}}} tells solr to run a single standalone zookeeper server as part of this
Solr server.
   * {{{-Dbootstrap_confname=myconf}}} tells this solr node to use the "myconf" configuration
stored within zookeeper.
   * {{{-Dbootstrap_confdir=./solr/conf}}} since "myconf" does not actually exist yet, this
parameter causes the local configuration directory {{{./solr/conf}}} to be uploaded to zookeeper
as the "myconf" config.
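  One quick way to confirm that the node came up and registered its configuration is to fetch the zookeeper browser page (the same page used later in this walkthrough); a sketch, assuming curl is available:
  {{{
  curl "http://localhost:8983/solr/admin/zookeeper.jsp"
  }}}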
@@ -37, +44 @@

  
  You can see from the zookeeper browser that the Solr configuration files were uploaded under
"myconf", and that a new document collection called "collection1" was created.  Under collection1
is a list of shards, the pieces that make up the complete collection.
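  As a purely illustrative sketch of that layout (the exact node paths are an assumption, not something shown on this page), the tree inside zookeeper looks roughly like a small filesystem:
  {{{
  /configs/myconf/solrconfig.xml
  /configs/myconf/schema.xml
  /collections/collection1/shards/...
  }}}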
  
- Now we want to start up our second server, assigning it a different shard, or piece of the
collection.
- Simply change the shardId parameter for the appropriate solr core in solr.xml:
+ Now we want to start up our second server, assigning it a different shard, or piece of the
collection. Simply change the shardId parameter for the appropriate solr core in solr.xml:
+ 
  {{{
  cd example2
  perl -pi -e 's/shard1/shard2/g' solr/solr.xml
  #note: if you don't have perl installed, you can simply hand edit solr.xml, changing shard1
to shard2
  }}}
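  If perl isn't installed, a sed one-liner does the same in-place edit (a sketch, run from the example2 directory; assumes GNU sed's -i option):
  {{{
  sed -i 's/shard1/shard2/g' solr/solr.xml
  }}}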
- 
  Then start the second server, pointing it at the cluster:
+ 
  {{{
  java -Djetty.port=7574 -DhostPort=7574 -DzkHost=localhost:9983 -jar start.jar
  }}}
- 
   * {{{-Djetty.port=7574}}}  is just one way to tell the Jetty servlet container to use a
different port.
   * {{{-DhostPort=7574}}} tells Solr what port the servlet container is running on.
   * {{{-DzkHost=localhost:9983}}} points to the Zookeeper ensemble containing the cluster
state.  In this example we're running a single Zookeeper server embedded in the first Solr
server.  By default, an embedded Zookeeper server runs at the Solr port plus 1000, so 9983.
@@ -57, +63 @@

  If you refresh the zookeeper browser, you should now see both shard1 and shard2 in collection1.
  
  Next, index some documents to each server:
+ 
  {{{
  cd exampledocs
  java -Durl=http://localhost:8983/solr/collection1/update -jar post.jar ipod_video.xml
  java -Durl=http://localhost:7574/solr/collection1/update -jar post.jar monitor.xml
  }}}
- 
  And now, a request to either server with "distrib=true" results in a distributed search
that covers the entire collection:
  
  http://localhost:8983/solr/collection1/select?distrib=true&q=*:*
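  For contrast, the same query without distrib=true should only return the document indexed on the server you send it to (a sketch of the default, non-distributed behavior implied by the example above):
  
  http://localhost:8983/solr/collection1/select?q=*:*
  
  http://localhost:7574/solr/collection1/select?q=*:*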
  
  If at any point you wish to start over fresh or experiment with different configurations,
you can delete all of the cloud state contained within zookeeper by simply deleting the solr/zoo_data
directory after shutting down the servers.
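  As a concrete sketch of that cleanup (assuming the embedded zookeeper state lives under the first server's directory, since that is the server running -DzkRun in this example):
  {{{
  # stop both Solr servers first, then:
  rm -r example/solr/zoo_data
  }}}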
+ 
+ === Simple two shard cluster with shard replicas ===
+ This example builds on the previous one by creating an additional copy of shard1 and shard2.
Extra shard copies can be used for high availability and fault tolerance, or to increase the query
capacity of the cluster.
+ 
+ First, run through the previous example so we already have two shards and some documents
indexed into each.  Then simply make a copy of those two servers:
+ 
+ {{{
+ cp -r example exampleB
+ cp -r example2 example2B
+ }}}
+ Then start the two new servers on different ports, each in its own window:
+ 
+ {{{
+ cd exampleB
+ java -Djetty.port=8900 -DhostPort=8900 -DzkHost=localhost:9983 -jar start.jar
+ }}}
+ {{{
+ cd example2B
+ java -Djetty.port=7500 -DhostPort=7500 -DzkHost=localhost:9983 -jar start.jar
+ }}}
+ Refresh the zookeeper browser page http://localhost:8983/solr/admin/zookeeper.jsp and verify
that 4 solr nodes are up, and that each shard is present at 2 nodes.
+ 
+ Now send a query to any of the servers to query the cluster:
+ 
+ http://localhost:7500/solr/collection1/select?distrib=true&q=*:*
+ 
+ Send this query multiple times and observe the logs from the solr servers.  From a web
browser, you may need to hold down CTRL while clicking the refresh button to bypass the browser's
HTTP cache.  You should be able to observe Solr load balancing the requests across shard
replicas, using different servers to satisfy each request.  There will be a log statement for
the top-level request in the server the browser sends the request to, and then a log statement
for each of the sub-requests that are merged to produce the complete response.
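  One way to sidestep the browser cache entirely is to drive the query from the command line (a sketch, assuming curl is installed):
  {{{
  # send the distributed query a handful of times and watch the four server windows
  for i in 1 2 3 4 5; do
    curl -s "http://localhost:7500/solr/collection1/select?distrib=true&q=*:*" > /dev/null
  done
  }}}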
+ 
  
  == ZooKeeper ==
  A group of Zookeeper servers running together for fault tolerance and high availability is
called an ensemble.  For production, it's recommended that you run an external zookeeper ensemble
rather than having Solr run embedded servers.
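  Pointing a Solr server at such an ensemble might look like the following (the host names are made up, and the comma-separated zkHost form is an assumption for illustration, not something shown on this page):
  {{{
  java -DhostPort=8983 -DzkHost=zk1:2181,zk2:2181,zk3:2181 -jar start.jar
  }}}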
