incubator-blur-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aaron McCurry <>
Subject Re: One Query!
Date Wed, 11 Dec 2013 16:18:19 GMT
On Wed, Dec 11, 2013 at 8:53 AM, Gagan Juneja <>wrote:

> Hi Team,
> I just wanted to know how shards are created on shard servers, how the
> server is selected for a query and how replication comes into picture
> once any shard server goes down. It would be better if you can take
> simple analogy of any example table and describe this.

This is a great question and we should put the answer(s) into the

I will walk through a basic setup of creating a table, adding some data,
then querying that data.


When a client issues a create table command to a controller the following

Controller: Receives the message
Controller: Creates the ZooKeeper node that describes the table.  Typically
in /blur/clusters/default/tables/<tablename>
Controller: Controller then begins polling the shard servers using the
shardServerLayoutState waiting on the shards to be OPEN

While that is happening on the controller side the shard servers react

Shard: Sees the table get created through a ZooKeeper watcher.  And adds
the table to the online table list.
Shard: Then it calculates the layout that the shard server should be
serving by using the MasterBasedDistributedLayoutFactory class.
Shard: Then once every 10 seconds the warmup thread fires and opens any
shards that should be open on the given shard server.
  NOTE: If the shard tries to access the table the before the warmup fires,
the missing shards are opened then.
Shard: Once the shards are open the state for shard will change to OPEN and
the controller will no longer block and it will return to the client.


Client submits a mutation command to the controller.
The controller reads the rowid from the mutation and figures out (through
hashing the rowid) what shard and the server that it's being served from
and forwards the mutation to that shard server.
The shard server receives the mutation and figures out what shard it is
meant for and applies the mutation to the given index.


Client submits a query request to the controller.
The controller then sprays all the query to the shard servers.
The shard servers then issues the query to all the shards within the server
(each one in it's own thread) and the top N number of hits from each shard
are returned to the request thread.
Then the top N number of hits are returned the controller.
Then the controller receives responses from all the shards servers and
figures out the top N hits from all the servers.
The data from the top N hits are then fetch in parallel from the controller
back to the shard servers for the top N hits.
After the data is received back to the controller the response is then sent
back to the client.

Hope this helps to explain at a high level wants going on.  Let me know if
you have anymore questions.


> Regards,
> Gagan

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message