incubator-blur-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gagan Juneja <>
Subject Re: One Query!
Date Thu, 12 Dec 2013 07:44:54 GMT
Hi Aaron,
Thanks! this address most of my queries. I have some more queries

1. How we handle replication of shards like if some shard server is
down the how we can address requests for shards this Shard server was
2. Do we use HDFS replication any where? Means do we use HDFS
replicated indexes any where?


On Wed, Dec 11, 2013 at 9:48 PM, Aaron McCurry <> wrote:
> On Wed, Dec 11, 2013 at 8:53 AM, Gagan Juneja <>wrote:
>> Hi Team,
>> I just wanted to know how shards are created on shard servers, how the
>> server is selected for a query and how replication comes into picture
>> once any shard server goes down. It would be better if you can take
>> simple analogy of any example table and describe this.
> This is a great question and we should put the answer(s) into the
> documentation.
> I will walk through a basic setup of creating a table, adding some data,
> then querying that data.
> When a client issues a create table command to a controller the following
> happens.
> Controller: Receives the message
> Controller: Creates the ZooKeeper node that describes the table.  Typically
> in /blur/clusters/default/tables/<tablename>
> Controller: Controller then begins polling the shard servers using the
> shardServerLayoutState waiting on the shards to be OPEN
> While that is happening on the controller side the shard servers react
> accordingly.
> Shard: Sees the table get created through a ZooKeeper watcher.  And adds
> the table to the online table list.
> Shard: Then it calculates the layout that the shard server should be
> serving by using the MasterBasedDistributedLayoutFactory class.
> Shard: Then once every 10 seconds the warmup thread fires and opens any
> shards that should be open on the given shard server.
>   NOTE: If the shard tries to access the table the before the warmup fires,
> the missing shards are opened then.
> Shard: Once the shards are open the state for shard will change to OPEN and
> the controller will no longer block and it will return to the client.
> Client submits a mutation command to the controller.
> The controller reads the rowid from the mutation and figures out (through
> hashing the rowid) what shard and the server that it's being served from
> and forwards the mutation to that shard server.
> The shard server receives the mutation and figures out what shard it is
> meant for and applies the mutation to the given index.
> Client submits a query request to the controller.
> The controller then sprays all the query to the shard servers.
> The shard servers then issues the query to all the shards within the server
> (each one in it's own thread) and the top N number of hits from each shard
> are returned to the request thread.
> Then the top N number of hits are returned the controller.
> Then the controller receives responses from all the shards servers and
> figures out the top N hits from all the servers.
> The data from the top N hits are then fetch in parallel from the controller
> back to the shard servers for the top N hits.
> After the data is received back to the controller the response is then sent
> back to the client.
> Hope this helps to explain at a high level wants going on.  Let me know if
> you have anymore questions.
> Aaron
>> Regards,
>> Gagan

View raw message