helix-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: Correct way to redistribute work from disconnected instances?
Date Wed, 19 Oct 2016 19:52:20 GMT
I think this might be a corner case when partitions * replicas <
TOTAL_NUMBER_OF_NODES. Can you try with many more partitions and replicas and
check whether the issue still occurs?
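
To make the condition concrete, here is a minimal Java sketch of the arithmetic only. It does not call Helix; the class and method names are hypothetical, and the partition count of 1 for the setup quoted below is an assumption, since the report does not state it.

```java
// Hypothetical helper, not part of Helix: checks the suspected corner case
// "partitions * replicas < total number of nodes".
public class ReplicaSlotCheck {

    // Total number of replica slots the resource needs placed.
    static int totalReplicaSlots(int partitions, int replicas) {
        return partitions * replicas;
    }

    // True when there are fewer replica slots than nodes, so at least one
    // node would hold nothing even with every node alive.
    static boolean isCornerCase(int partitions, int replicas, int totalNodes) {
        return totalReplicaSlots(partitions, replicas) < totalNodes;
    }

    public static void main(String[] args) {
        // Reported setup: 2 replicas, 3 nodes; 1 partition is an assumption.
        System.out.println("1 partition:   " + isCornerCase(1, 2, 3));
        // Suggested retry with many partitions (12 is an arbitrary choice).
        System.out.println("12 partitions: " + isCornerCase(12, 2, 3));
    }
}
```

Under these assumptions the reported setup has 2 replica slots for 3 nodes, which satisfies the condition, while 12 partitions give 24 slots and do not.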



On Wed, Oct 19, 2016 at 11:53 AM, Michael Craig <mcraig@box.com> wrote:

> I've noticed that partitions/replicas assigned to disconnected instances
> are not automatically redistributed to live instances. What's the correct
> way to do this?
>
> For example, given this setup with Helix 0.6.5:
> - 1 resource
> - 2 replicas
> - LeaderStandby state model
> - FULL_AUTO rebalance mode
> - 3 nodes (N1 is Leader, N2 is Standby, N3 is just sitting)
>
> Then drop N1:
> - N2 becomes LEADER
> - Nothing happens to N3
>
> Naively, I would have expected N3 to transition from Offline to Standby,
> but that doesn't happen.
>
> I can force redistribution from GenericHelixController#onLiveInstanceChange by
> - dropping non-live instances from the cluster
> - calling rebalance
>
> The instance dropping seems pretty unsafe! Is there a better way?
>
