lucene-solr-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrzej Bialecki>
Subject Re: Solr Cloud wiki and branch notes
Date Sat, 16 Jan 2010 00:36:37 GMT

My 0.02 PLN on the subject ...

First the terminology: reading your emails I have a feeling that my head 
is about to explode. We have to agree on the vocabulary, otherwise we 
have no hope of reaching any consensus. I propose the following 
vocabulary that has been in use and is generally understood:

* (global) search index: a complete collection of all indexed documents. 
 From a conceptual point of view, this is our complete search space.

* index shard: a non-overlapping part of the search index. All shards in 
the system form together the complete search space of the search index. 
E.g. having initially one big index I could divide it into multiple 
shards using MultiPassIndexSplitter, and if I combined all the shards 
again, using IndexMerger, I should obtain the original complete search 
index (modulo changed Lucene docids .. doesn't matter). I strongly 
believe in micro-sharding, because they are much easier to handle and 
replicate. Also, since we control the shards we don't have to deal with 
overlapping shards, which is the curse of P2P search.

* partitioning: a method whereby we can determine the target shard ID 
based on a doc ID.

* search node: an application that provides search and update to one or 
more shards.

* search host: a machine that may run 1 or more search nodes.

* Shard Manager: a component that keeps track of allocation of shards to 
nodes (plus more, see below).

Now, to translate this into Solr-speak: depending on the details of the 
design, and the evolution of Solr, one search node could be one Solr 
instance that manages one shard per core. Let's forget here about the 
current distributed search component, and the current replication - they 
could be useful in this design as a raw transport mechanism, but someone 
else would be calling the shots (see below).

The replication and load balancing is a problem with many existing 
solutions, and this one in particular reminds me strongly of the Hadoop 
HDFS. In fact, early on during the development of Hadoop [1] I wondered 
whether we could reuse HDFS to manage Lucene indexes instead of opaque 
blocks of fixed size. It turned out to be infeasible, but the model of 
Namenode/Datanode still looks useful in our case, too.

I believe there are many useful lessons lurking in 
Hadoop/HBase/Zookeeper that we could reuse in our design. The following 
is just a straightforward port of the Namenode/Datanode concept.

Let's imagine a component called ShardManager that is responsible for 
managing the following data:

* list of shard ID-s that together form the complete search index,
* for each shard ID, list of search nodes that serve this shard.
* issuing replication requests
* maintaining the partitioning function (see below), so that updates are 
directed to correct shards
* maintaining heartbeat to check for dead nodes
* providing search clients with a list of nodes to query in order to 
obtain all results from the search index.

Whenever a new search node comes up, it reports its local shard ID-s 
(versioned) to the ShardManager. Based on these reports from the 
currently active nodes, the ShardManager builds this mapping of shards 
to nodes, and requests replication if some shards are too old, or if the 
replication count is too low, allocating these shards to selected nodes 
(based on a policy of some kind).

I believe most of the above functionality could be facilitated by 
Zookeeper, including the election of the node that runs the ShardManager.

We need a partitioning schema that splits documents more or less evenly 
among shards, and at the same time allows us to split or merge 
unbalanced shards. The simplest function that we could imagine is the 

	hash(docId) % numShards

though this has the disadvantage that any larger update will affect 
multiple shards, thus creating an avalanche of replication requests ... 
so a sequential model would be probably better, where ranges of docIds 
are assigned to shards.

Now, if any particular shard is too unbalanced, e.g. too large, it could 
be further split in two halves, and the ShardManager would have to 
record this exception. This is a very similar process to a region split 
in HBase, or a page split in btree DBs. Conversely, shards that are too 
small could be joined. This is the icing on the cake, so we can leave it 
for later.

After commit, a node contacts the ShardManager to report a new version 
of the shard. ShardManager issues replication requests to other nodes 
that hold a replica of this shard.

There should be a component sometimes referred to as query integrator 
(or search front-end) that is the entry and exit point for user search 
requests. On receiving a search request this component gets a list of 
randomly selected nodes from SearchManager to contact (the list 
containing all shards that form the global index), sends the query and 
integrates partial results (under a configurable policy for 
timeouts/early termination), and sends back the assembled results to the 

Again, somewhere in the background the knowledge of who to contact 
should be handled by Zookeeper.

That's it for now from the top of my head ...



Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration  Contact: info at sigram dot com

View raw message