incubator-blur-user mailing list archives

From Aaron McCurry <>
Subject Re: Shard takeover behavior
Date Fri, 07 Mar 2014 02:24:59 GMT
On Thu, Mar 6, 2014 at 5:04 AM, Ravikumar Govindarajan <> wrote:

> What do you think of giving an extra leeway for shard-server  failover
> cases?
> Ex: Whenever a shard-server process gets killed, the controller-node does
> not immediately update-layout, but rather mark it as a suspect.

Just to clarify what actually happens.  The controllers are read-only when
it comes to the layout.  They can remove shard servers from the cluster,
but they only read the layout from ZooKeeper.  The shard servers themselves
calculate the layout.  When a table change or shard server change occurs,
they all react and lock on the update.  The first one in calculates a
leveled, minimal-movement layout and stores it in ZooKeeper.  Then the next
node in validates that the layout is still valid and loads it, and the
rest continue in the same way.
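
The "leveled minimal movement" idea above can be sketched roughly as follows. This is only an illustration of the concept, not Blur's actual code: the function name, the dict-based layout representation, and the tie-breaking rule are all assumptions.

```python
# Hypothetical sketch of a leveled, minimal-movement layout calculation.
# A layout is modeled as a dict of shard -> server; this is NOT Blur's API.
def minimal_movement_layout(old_layout, live_servers):
    # Minimal movement: keep every assignment whose server is still alive.
    new_layout = {s: srv for s, srv in old_layout.items() if srv in live_servers}
    # Count the current load on each live server.
    load = {srv: 0 for srv in live_servers}
    for srv in new_layout.values():
        load[srv] += 1
    # Leveled: assign each orphaned shard to the least-loaded server,
    # breaking ties by server name for determinism.
    for shard in sorted(set(old_layout) - set(new_layout)):
        target = min(load, key=lambda srv: (load[srv], srv))
        new_layout[shard] = target
        load[target] += 1
    return new_layout
```

Because only the shards whose server disappeared are reassigned, the bulk of the cluster keeps serving from its existing indexes while the update is in flight.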

The reason the controllers are read-only is that they can actually see
more than one shard server cluster at a time, so it's easier to manage.
Also, if the controllers are online before the shards are all online, the
controller begins calculating new layouts, which can produce a lot of extra
churn in the cluster.

> When we have a read-only back-up of shard, searches can continue
> unhindered. Indexing during this time can be diverted to a queue, which
> will store and retry-ops, when shard-server comes online again.

Yes, that's what I was thinking.
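
A minimal sketch of the divert-and-replay queue being discussed, under the assumption that an index op can simply be buffered and re-applied in order; the class and method names are hypothetical, not anything in Blur:

```python
import collections

class RetryQueue:
    """Illustrative sketch: buffer index ops while a shard server is down,
    then replay them in arrival order once it comes back online."""

    def __init__(self):
        self.pending = collections.deque()

    def divert(self, op):
        # Called instead of indexing while the shard is unavailable.
        self.pending.append(op)

    def replay(self, apply_op):
        # Re-apply buffered ops in order; put the failing op back at the
        # front so a later replay attempt resumes where this one stopped.
        while self.pending:
            op = self.pending.popleft()
            try:
                apply_op(op)
            except Exception:
                self.pending.appendleft(op)
                raise
```

The ordering guarantee matters here: replaying in arrival order keeps the replica consistent with what the writers intended, even if the shard was down for several operations.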

> Over configured number of attempts/time, if the shard-server does not come
> up, then one controller-server can authoritatively mark it as down and
> update the layout.

Yep, I agree.
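
The suspect-then-down escalation could look something like this sketch, assuming a periodic liveness check; the threshold, class, and method names are made up for illustration:

```python
class SuspectTracker:
    """Sketch of the idea above: when a shard server stops responding it is
    first marked a suspect, and only after a configured number of failed
    checks is it authoritatively marked down (triggering a layout update)."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self.attempts = {}  # server -> consecutive failed checks

    def check(self, server, is_alive):
        if is_alive(server):
            # Server recovered: clear its suspect record.
            self.attempts.pop(server, None)
            return "online"
        n = self.attempts.get(server, 0) + 1
        self.attempts[server] = n
        return "down" if n >= self.max_attempts else "suspect"
```

Keeping the server in the "suspect" state means the read-only backup keeps serving searches and writes stay queued, so a quick process restart never forces a layout change.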

> --
> Ravi
