lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "SolrCloud" by YonikSeeley
Date Thu, 03 Dec 2009 22:02:29 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SolrCloud" page has been changed by YonikSeeley.
The comment on this change is: zookeeper schema thoughts.
http://wiki.apache.org/solr/SolrCloud?action=diff&rev1=1&rev2=2

--------------------------------------------------

- === High level design goals ===
+ == High level design goals ==
  These are long term goals for SolrCloud.  Many of these features will not be developed in the first versions, but we're designing for the long haul.
  
  ===== High Availability and Fault Tolerance =====
@@ -38, +38 @@

  
  '''shard'''
  
-  * A piece of a collection
+  * A piece of a collection.  A shard may or may not have replicas (copies), and may partially overlap with other shards.
  
  '''core'''
  
@@ -54, +54 @@

  
   * switch, rack, data center, etc
  
- Resources
+ == Zookeeper Schema ==
+ === Model and State ===
+ Logically, there seem to be two different types of data that we want to keep in ZooKeeper:
  
+ '''Model''' - represents the goal / targets of the cluster and the systems in it.
+ 
+ '''State''' - represents the actual current state of the cluster and the systems comprising it.
+ 
+ A manager can make well-defined changes to the model, and the servers should respond so that their state eventually matches that of the model.
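
The model/state split can be illustrated with a small reconciliation sketch. This is only an illustration under assumptions that are not part of the wiki page: the model and the state are each represented as a plain mapping from node name to the set of shards on that node, and the names `plan_actions`, `apply_actions`, `"load"` and `"unload"` are hypothetical.

```python
def plan_actions(model, state):
    """Return the (node, action, shard) steps that would move state toward model.

    model and state are assumed to be dicts of node name -> set of shard names.
    """
    actions = []
    for node in sorted(set(model) | set(state)):
        wanted = model.get(node, set())
        actual = state.get(node, set())
        # Shards the model asks for that the node does not serve yet.
        for shard in sorted(wanted - actual):
            actions.append((node, "load", shard))
        # Shards the node serves that the model no longer asks for.
        for shard in sorted(actual - wanted):
            actions.append((node, "unload", shard))
    return actions


def apply_actions(state, actions):
    """Return a new state dict with the planned actions applied."""
    new_state = {node: set(shards) for node, shards in state.items()}
    for node, action, shard in actions:
        shards = new_state.setdefault(node, set())
        if action == "load":
            shards.add(shard)
        else:
            shards.discard(shard)
    return new_state
```

In this sketch a manager only ever edits `model`; each server would repeatedly plan and apply actions until nothing is left to do, at which point its state matches the model.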
+ 
+ === Multiple Solr clusters ===
+ A Solr cluster should be able to use an existing ZooKeeper cluster, and multiple Solr clusters should be able to coexist on a single ZooKeeper cluster.
+ 
+ One idea: this seems easiest to achieve with a configuration URL that points to the ZooKeeper cluster and includes an arbitrary prefix.
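
As a rough sketch of the prefix idea, the address could be split into a host list and a path prefix, with every znode for one Solr cluster placed under that prefix. The address format `host:port,host:port/prefix`, the example paths, and the helper names `split_zk_address` and `znode_path` are assumptions made for illustration; the page does not define a format.

```python
def split_zk_address(address):
    """Split 'host1:2181,host2:2181/some/prefix' into (hosts, prefix).

    The prefix is returned with a leading slash and no trailing slash,
    or as an empty string when the address has no prefix.
    """
    hosts_part, _, rest = address.partition("/")
    hosts = [h.strip() for h in hosts_part.split(",") if h.strip()]
    if not hosts:
        raise ValueError("no zookeeper hosts in %r" % address)
    rest = rest.strip("/")
    prefix = "/" + rest if rest else ""
    return hosts, prefix


def znode_path(prefix, *parts):
    """Build an absolute znode path for this Solr cluster under its prefix."""
    return prefix + "/" + "/".join(p.strip("/") for p in parts)
```

Two Solr clusters configured with `zk1:2181,zk2:2181/solr/cluster_a` and `zk1:2181,zk2:2181/solr/cluster_b` would then share the same ZooKeeper servers without their znodes colliding.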
+ 
+ === Shard Identification ===
+ Two ways of identifying shards are needed.
+ 
+ For complex cluster features in the future, Solr will need to know where to find specific documents.  The documents a shard contains can be defined by a range of ids, where the ids are either hash codes of some other value (such as the unique key field) or supplied by the user.  See the Amazon Dynamo paper and other descriptions of consistent hashing.
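
A minimal sketch of range-based shard identification follows. The choice of MD5, the 32-bit hash space, the half-open `[start, end)` ranges, and the names `doc_hash` and `shards_for_hash` are all assumptions for illustration; the page does not specify a hash function or a range layout. Because shards may partially overlap, the lookup returns every shard whose range contains the hash.

```python
import hashlib

HASH_SPACE = 2 ** 32  # assumed size of the id space


def doc_hash(unique_key):
    """Map a document's unique key to an integer in [0, HASH_SPACE)."""
    digest = hashlib.md5(unique_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def shards_for_hash(h, shard_ranges):
    """Return the sorted names of all shards whose [start, end) range contains h.

    shard_ranges is assumed to be a dict of shard name -> (start, end).
    """
    return sorted(
        name for name, (start, end) in shard_ranges.items() if start <= h < end
    )
```

With ranges such as `{"shard1": (0, 2**31), "shard2": (2**31, 2**32)}`, calling `shards_for_hash(doc_hash("doc-42"), ranges)` would name the shard or shards expected to hold that document.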
+ 
+ In the most basic case though, we will be dealing with indexes built outside the cluster.  In these cases, we won't know what documents are in what shards, but we still need a way to identify the fact that one shard is simply a replica of another shard.
+ 
+ == Resources ==
  http://sourceforge.net/mailarchive/forum.php?forum_name=bailey-developers
  
+ http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
+ 
