incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: One Query!
Date Thu, 12 Dec 2013 12:15:16 GMT
On Thu, Dec 12, 2013 at 2:44 AM, Gagan Juneja <gagandeepjuneja@gmail.com> wrote:

> Hi Aaron,
> Thanks! This addresses most of my queries. I have a couple more:
>
> 1. How do we handle replication of shards? For example, if a shard
> server goes down, how do we serve requests for the shards that server
> was holding?
>

Basically each server is given a portion of the shards to serve.  If that
server goes down, the layout of the table is re-calculated and each of the
remaining servers picks up some of the orphaned shards.  The logic behind the
layout calculation is in the
"org.apache.blur.manager.indexserver.MasterBasedDistributedLayoutFactory"
class.  ZooKeeper is used by the shard servers to know when a server goes
offline.
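
To make that concrete, here is a rough, illustrative sketch (not the logic in
MasterBasedDistributedLayoutFactory) of what re-distributing a fixed shard
list over whatever servers are still online looks like:

    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    // Illustrative only: a naive round-robin layout, not Blur's real algorithm.
    public class NaiveLayout {

      // Maps every shard of a table onto one of the currently online servers.
      // When ZooKeeper reports a server offline, recomputing this map hands
      // that server's shards to the survivors.
      public static Map<String, String> layout(List<String> shards, List<String> onlineServers) {
        Map<String, String> shardToServer = new TreeMap<String, String>();
        for (int i = 0; i < shards.size(); i++) {
          shardToServer.put(shards.get(i), onlineServers.get(i % onlineServers.size()));
        }
        return shardToServer;
      }
    }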


> 2. Do we use HDFS replication anywhere? That is, do we use
> HDFS-replicated indexes anywhere?
>

All indexes are stored in HDFS (for a normal install) and HDFS
transparently replicates all the blocks of a given file 3 times (by
default).  So we make use of it, but it's a normal function in HDFS.
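
For example, since this is plain HDFS behaviour, the replication factor of the
index files can be inspected (or changed) with the standard Hadoop FileSystem
API; the shard path below is just a made-up example:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CheckReplication {
      public static void main(String[] args) throws Exception {
        // Standard HDFS client calls; the index path here is hypothetical.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path shardDir = new Path("/blur/tables/mytable/shard-00000000");
        for (FileStatus status : fs.listStatus(shardDir)) {
          System.out.println(status.getPath() + " replication=" + status.getReplication());
        }
        // The default of 3 comes from dfs.replication; it can be raised per file:
        // fs.setReplication(new Path(shardDir, "somefile"), (short) 5);
      }
    }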

Aaron


>
> Regards,
> Gagan
>
> On Wed, Dec 11, 2013 at 9:48 PM, Aaron McCurry <amccurry@gmail.com> wrote:
> > On Wed, Dec 11, 2013 at 8:53 AM, Gagan Juneja <gagandeepjuneja@gmail.com> wrote:
> >
> >> Hi Team,
> >> I just wanted to know how shards are created on shard servers, how the
> >> server is selected for a query, and how replication comes into the
> >> picture once a shard server goes down. It would be better if you could
> >> walk through a simple example table and describe this.
> >>
> >
> > This is a great question and we should put the answer(s) into the
> > documentation.
> >
> > I will walk through a basic setup of creating a table, adding some data,
> > then querying that data.
> >
> > CREATE TABLE:
> >
> > When a client issues a create table command to a controller, the following
> > happens.
> >
> > Controller: Receives the message
> > Controller: Creates the ZooKeeper node that describes the table.
> > Typically in /blur/clusters/default/tables/<tablename>
> > Controller: Then begins polling the shard servers using
> > shardServerLayoutState, waiting for the shards to be OPEN.
> >
> > While that is happening on the controller side, the shard servers react
> > accordingly.
> >
> > Shard: Sees the table get created through a ZooKeeper watcher and adds
> > the table to the online table list.
> > Shard: Then it calculates the layout that the shard server should be
> > serving by using the MasterBasedDistributedLayoutFactory class.
> > Shard: Then once every 10 seconds the warmup thread fires and opens any
> > shards that should be open on the given shard server.
> >   NOTE: If the shard server tries to access the table before the warmup
> > fires, the missing shards are opened at that point.
> > Shard: Once the shards are open, the state for each shard changes to OPEN,
> > the controller stops blocking, and it returns to the client.
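> >
> > As a very rough sketch (illustrative only, not Blur's actual code), the
> > controller/shard interaction above boils down to a ZooKeeper create plus a
> > watcher on the tables node:
> >
> >     import org.apache.zookeeper.CreateMode;
> >     import org.apache.zookeeper.WatchedEvent;
> >     import org.apache.zookeeper.Watcher;
> >     import org.apache.zookeeper.ZooDefs.Ids;
> >     import org.apache.zookeeper.ZooKeeper;
> >
> >     public class TableNodeSketch {
> >       public static void main(String[] args) throws Exception {
> >         // Assumes the /blur/clusters/default/tables parent nodes already exist.
> >         ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, null);
> >
> >         // Shard side (simplified): a watcher on the tables node sees new
> >         // children and triggers the layout calculation / shard opening.
> >         zk.getChildren("/blur/clusters/default/tables", new Watcher() {
> >           public void process(WatchedEvent event) {
> >             System.out.println("table list changed: " + event);
> >           }
> >         });
> >
> >         // Controller side (simplified): create the node that describes the table.
> >         zk.create("/blur/clusters/default/tables/mytable", new byte[0],
> >             Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
> >       }
> >     }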
> >
> > MUTATE:
> >
> > Client submits a mutation command to the controller.
> > The controller reads the rowid from the mutation, figures out (by hashing
> > the rowid) which shard it belongs to and which server is serving that
> > shard, and forwards the mutation to that shard server.
> > The shard server receives the mutation, figures out which shard it is
> > meant for, and applies the mutation to the given index.
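> >
> > As a minimal illustration (the hashing and shard name format here are
> > assumptions, not Blur's exact scheme), picking the shard for a rowid is
> > essentially a hash modulo the shard count:
> >
> >     // Illustrative only: hash the rowid and mod by the number of shards.
> >     public class ShardChooser {
> >       public static String shardFor(String rowId, int shardCount) {
> >         int bucket = (rowId.hashCode() & Integer.MAX_VALUE) % shardCount;
> >         return String.format("shard-%08d", bucket);  // hypothetical naming
> >       }
> >
> >       public static void main(String[] args) {
> >         System.out.println(shardFor("row-12345", 16));  // e.g. a 16 shard table
> >       }
> >     }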
> >
> > QUERY:
> >
> > Client submits a query request to the controller.
> > The controller then sprays the query to all the shard servers.
> > Each shard server issues the query to all the shards within the server
> > (each one in its own thread) and the top N hits from each shard are
> > returned to the request thread.
> > Then the top N hits are returned to the controller.
> > The controller receives responses from all the shard servers and
> > figures out the overall top N hits across the servers.
> > The data for those top N hits is then fetched in parallel by the
> > controller from the shard servers.
> > Once the data is back at the controller, the response is sent back to
> > the client.
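> >
> > A stripped-down sketch of that top N merge (the Hit type and method names
> > here are made up for illustration, not Blur's API):
> >
> >     import java.util.ArrayList;
> >     import java.util.List;
> >     import java.util.PriorityQueue;
> >
> >     // Illustrative only: merge per-shard (or per-server) top N hit lists
> >     // into a single top N, best score first.
> >     public class TopNMerge {
> >       static class Hit {
> >         final String rowId;
> >         final float score;
> >         Hit(String rowId, float score) { this.rowId = rowId; this.score = score; }
> >       }
> >
> >       public static List<Hit> merge(List<List<Hit>> perShardHits, int n) {
> >         // Min-heap of size n keeps only the n best hits seen so far.
> >         PriorityQueue<Hit> heap =
> >             new PriorityQueue<Hit>(n, (a, b) -> Float.compare(a.score, b.score));
> >         for (List<Hit> hits : perShardHits) {
> >           for (Hit hit : hits) {
> >             heap.offer(hit);
> >             if (heap.size() > n) {
> >               heap.poll();  // drop the current worst hit
> >             }
> >           }
> >         }
> >         List<Hit> topN = new ArrayList<Hit>(heap);
> >         topN.sort((a, b) -> Float.compare(b.score, a.score));  // best first
> >         return topN;
> >       }
> >     }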
> >
> > Hope this helps to explain at a high level what's going on.  Let me know
> > if you have any more questions.
> >
> > Aaron
> >
> >
> >
> >
> >>
> >>
> >> Regards,
> >> Gagan
> >>
>
