lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "ZooKeeperIntegration" by GrantIngersoll
Date Wed, 15 Jul 2009 15:58:03 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by GrantIngersoll:
http://wiki.apache.org/solr/ZooKeeperIntegration

------------------------------------------------------------------------------
  = Introduction =
+ 
+ Integrating Solr and !ZooKeeper gives us a lot more flexibility for dynamic, distributed configuration.  Additionally, it does not require breaking back-compatibility and it can reuse the existing Solr infrastructure.
  
  See https://issues.apache.org/jira/browse/SOLR-1277
  
+ See http://hadoop.apache.org/zookeeper
+ 
  = Architecture =
  
- Describe how we can use existing Solr components w/ ZooKeeper for distributed search, replication and management.
+ == Distributed Search ==
  
+ For distributed search, create a new !ShardsComponent that moves the shard calculation code
from !QueryComponent and handles both the current approach and the !ZooKeeper approach.
+ 
+ On startup, the !ZooKeeper configuration specifies whether or not the node is a shard.  If it is, the node registers itself with !ZooKeeper by adding a value under the appropriate path (the path is configurable).
+ 
+ For example, a shard could register itself in the Solr shard group "solr_shards" as:
+ solr_shards/192.168.0.1_8080_solr [192.168.0.1:8080/solr]  // Note: the [] contain the actual address that is used when constructing the rb.shards value
+ 
+ Thus, solr_shards with two nodes might look like:
+ solr_shards/
+   192.168.0.1_8080_solr [192.168.0.1:8080/solr]
+   192.168.0.2_8080_solr [192.168.0.2:8080/solr]
+ 
+ Shards are ephemeral nodes in ZK speak and thus go away if the node dies.
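+ 
+ As a rough sketch, the registration might look something like the following with the plain !ZooKeeper Java client (illustrative only; the parent-node check, class and method names, and error handling here are assumptions, not the actual patch code):
+ {{{
+ import java.nio.charset.StandardCharsets;
+ import org.apache.zookeeper.CreateMode;
+ import org.apache.zookeeper.ZooDefs.Ids;
+ import org.apache.zookeeper.ZooKeeper;
+ 
+ public class ShardRegistration {
+     public static void register(ZooKeeper zk, String me) throws Exception {
+         // Make sure the shard group node itself exists (persistent).
+         if (zk.exists("/solr_shards", false) == null) {
+             zk.create("/solr_shards", new byte[0],
+                       Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
+         }
+         // Node name is the address with ':' and '/' replaced, e.g. 192.168.0.1_8080_solr.
+         String nodeName = me.replace(':', '_').replace('/', '_');
+         // EPHEMERAL: the entry goes away automatically if this Solr node dies.
+         zk.create("/solr_shards/" + nodeName,
+                   me.getBytes(StandardCharsets.UTF_8),
+                   Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
+     }
+ 
+     public static void main(String[] args) throws Exception {
+         // hostPorts and timeout correspond to the <zooKeeper> config shown below.
+         ZooKeeper zk = new ZooKeeper("localhost:2181", 5000, event -> { });
+         register(zk, "192.168.0.1:8080/solr");
+     }
+ }
+ }}}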
+ 
+ Then, when a query comes in, the !ShardsComponent can build the !ResponseBuilder.shards value based on what is contained in the shard group that the node is participating in.  This shard group approach should also allow a fanout approach to be employed.
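+ 
+ A minimal sketch of how that shards value might be assembled from the group's children (again just illustrative, assuming each child node stores its address as UTF-8 data; the real !ShardsComponent may differ):
+ {{{
+ import java.nio.charset.StandardCharsets;
+ import java.util.ArrayList;
+ import java.util.List;
+ import org.apache.zookeeper.ZooKeeper;
+ 
+ public class ShardList {
+     // Read every child of the shard group and join the stored addresses into the
+     // comma-separated form used for ResponseBuilder.shards,
+     // e.g. "192.168.0.1:8080/solr,192.168.0.2:8080/solr".
+     public static String buildShardsParam(ZooKeeper zk, String group) throws Exception {
+         List<String> shards = new ArrayList<>();
+         for (String child : zk.getChildren(group, false)) {
+             byte[] data = zk.getData(group + "/" + child, false, null);
+             shards.add(new String(data, StandardCharsets.UTF_8));
+         }
+         return String.join(",", shards);
+     }
+ }
+ }}}
+ 
+ The group argument here would be the configured shardsNodeName (e.g. /solr_shards).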
+ 
+ == Master/Slave ==
+ 
+ NOTE: NOT COMPLETELY IMPLEMENTED YET.
+ 
+ Nodes can register themselves as Masters by adding their entry to a Master Group and marking themselves as a master.  The !ReplicationHandler can then be configured to subscribe to that Master Group, getting the first one out of the list of children of the group (this depends on !ZooKeeper supporting getFirstChild(), which it currently does not).  Masters are ephemeral.  If getFirstChild() is not implemented, then we need some other way of selecting the master (a simple workaround is sketched after the example below).  For now, it could just be configured so that there is only one master.
+ 
+ Thus, if there are two groups of Masters, then it would look like this:
+ master_group_1/
+    192.168.0.1_8080_solr [192.168.0.1:8080/solr]
+    192.168.0.2_8080_solr [192.168.0.2:8080/solr]
+ master_group_2/
+    192.168.0.3_8080_solr [192.168.0.3:8080/solr]
+    192.168.0.4_8080_solr [192.168.0.4:8080/solr]
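+ 
+ Since getFirstChild() does not exist, one possible workaround (an assumption on my part, not what the patch does) is to sort the children of a Master Group and treat the first entry as the master:
+ {{{
+ import java.nio.charset.StandardCharsets;
+ import java.util.Collections;
+ import java.util.List;
+ import org.apache.zookeeper.ZooKeeper;
+ 
+ public class MasterLookup {
+     // Pick a single master from a master group by sorting its children and taking
+     // the first one; returns null if the group has no registered masters.
+     public static String findMaster(ZooKeeper zk, String masterGroup) throws Exception {
+         List<String> children = zk.getChildren(masterGroup, false);
+         if (children.isEmpty()) {
+             return null;
+         }
+         Collections.sort(children);
+         byte[] data = zk.getData(masterGroup + "/" + children.get(0), false, null);
+         return new String(data, StandardCharsets.UTF_8);  // e.g. "192.168.0.1:8080/solr"
+     }
+ }
+ }}}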
+ 
+ The trick here is how to keep all the masters in a group in sync.
+ Ideas:
+  1. A servlet filter that replicates indexing commands out to the other masters in a master group
+  2. Backup masters replicate from the master
+  3. Others?  Neither of these is 100% fault tolerant.
+ 
+ === Rebalancing ===
+ 
+ Through the ZK request handler, slaves can be moved around, at which point they will pull the index from the master in their group, which gives us rebalancing.  Additionally, new nodes that come online without an index will go to their master and get the index.  The replication handler already handles replicating configuration files, so this is just a configuration issue.
+ 
+ = Implementation =
+ 
+ The current patch implements all of this by adding a !ZooKeeper instance onto the !SolrCore and configuring it via solrconfig.xml.  The patch currently only supports distributed search, but it has some of the plumbing for setting up the master groups.  The !ReplicationHandler piece has not been implemented yet.
+ 
+ = Configuration and Running =
+ 
+ !ZooKeeper config in solrconfig.xml looks like:
+ {{{
+ <zooKeeper>
+     <!-- See the ZooKeeper docs -->
+     <hostPorts>localhost:2181</hostPorts>
+     <!-- TODO: figure out how to do this programmatically -->
+     <me>localhost:8983/solr</me>
+     <!-- Session timeout for ZooKeeper, in ms.  Optional.  Default is 10000 -->
+     <timeout>5000</timeout>
+     <shardsNodeName>/solr_shards</shardsNodeName>
+     <mastersNodeName>/solr_masters</mastersNodeName>
+ </zooKeeper>
+ }}}
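+ 
+ For reference, hostPorts and timeout map directly onto the standard !ZooKeeper client constructor arguments (a sketch of the mapping, not the patch's actual parsing code):
+ {{{
+ import org.apache.zookeeper.ZooKeeper;
+ 
+ public class ZkConnect {
+     public static ZooKeeper connect() throws Exception {
+         // <hostPorts> -> connect string, <timeout> -> session timeout in ms;
+         // the third argument is a default Watcher (a no-op here).
+         return new ZooKeeper("localhost:2181", 5000, event -> { });
+     }
+ }
+ }}}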
+ 
+ Configure the !ZooKeeperRequestHandler:
+ {{{
+ <requestHandler name="/zoo" class="solr.ZooKeeperRequestHandler">
+     <bool name="shard">true</bool>
+     <str name="master">master_group_1</str>
+ </requestHandler>
+ }}}
+ 
+ The !ShardsComponent is automatically set up.
+ 
+  1. Set up !ZooKeeper according to the !ZooKeeper docs, including a ZK config file.
+  1. Start the !ZooKeeper server with your configuration file.
+  1. Start your Solr nodes, all properly configured.
+ 
+ TODO: Show real example.
+ 
