kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Question about redistributing tablets on failure of a tserver.
Date Thu, 13 Apr 2017 05:29:31 GMT
On Wed, Apr 12, 2017 at 9:45 PM, Jason Heo <jason.heo.sde@gmail.com> wrote:

> Hi Dan.
> I'm very happy to hear from you. Kudu is REALLY GREAT!
Thanks for the excitement! It's always great to hear when people are happy
with the project.

> About Q2:
> There are 14 tservers in my test cluster; each node held 3TB, evenly
> distributed, before re-replication. Network bandwidth is 1Gbps.
> I have another question.
> Is it possible to cancel re-replication if the failed tserver rejoins while
> re-replication is in progress? The rejoined tserver already has all the
> data, so I think re-replication is unnecessary and a waste of time and
> resources. (This is how Elasticsearch behaves.)

Yes, that's definitely something we'd like to do in the near future.

Right now our design is that when the leader notices a bad replica, it
ejects it from the Raft configuration, so we have a 2-node configuration.
We then immediately add a new replica and start making a tablet copy to it,
which may take some time with large tablets. During that time, if the old
node comes back, it is no longer part of the configuration and can't rejoin.

Mike Percy has started looking into changing the design to do something
more like:

- Original 3 nodes: A, B, C = VOTER
- node C dies
- add node D as a NON_VOTER/PRE_VOTER, and start the tablet copy
- if node C comes back up, remove D and cancel the tablet copy
- if node C is still not up when 'D' is available, evict C and convert D to a VOTER

Implementation hasn't begun yet, but hopefully we can get this done in the
next couple of months (e.g., the 1.4 or 1.5 release timeline).


> 2017-04-13 3:47 GMT+09:00 Dan Burkert <danburkert@apache.org>:
>> Hi Jason, answers inline:
>> On Wed, Apr 12, 2017 at 5:53 AM, Jason Heo <jason.heo.sde@gmail.com>
>> wrote:
>>> Q1. Can I disable redistributing tablets on failure of a tserver? The
>>> reason why I need this is described in Background.
>> We don't have any kind of built-in maintenance mode that would prevent
>> this, but it can be achieved by setting a flag on each of the tablet
>> servers.  The goal is not to disable re-replicating tablets, but instead to
>> avoid kicking the failed replica out of the tablet groups to begin with.
>> There is a config flag to control exactly that: 'evict_failed_followers'.
>> This isn't considered a stable or supported flag, but it should have the
>> effect you are looking for, if you set it to false on each of the tablet
>> servers, by running:
>>     kudu tserver set-flag <tserver-addr> evict_failed_followers false --force
>> for each tablet server.  When you are done, set it back to the default
>> 'true' value.  This isn't something we routinely test (especially setting
>> it without restarting the server), so please test before trying this on a
>> production cluster.
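
Applying that flag to each tablet server is easy to script. A minimal sketch, assuming hypothetical hostnames tserver-01 through tserver-03 on the default tserver RPC port 7050 (substitute your own addresses); the `echo` prefix makes this a dry run that only prints the commands, so remove it to actually apply the change:

```shell
# Placeholder tserver addresses -- not from this thread, replace with yours.
TSERVERS="tserver-01:7050 tserver-02:7050 tserver-03:7050"

# Dry run: print the set-flag command for every tserver.
for ts in $TSERVERS; do
  echo kudu tserver set-flag "$ts" evict_failed_followers false --force
done
```

When maintenance is finished, the same loop with `true` instead of `false` restores the default behavior.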
>>> Q2. Redistribution goes on even if the failed tserver reconnects to the
>>> cluster. In my test cluster, it took 2 hours to redistribute when a
>>> tserver holding 3TB of data was killed.
>> This seems slow.  What's the speed of your network?  How many nodes?  How
>> many tablet replicas were on the failed tserver, and were the replica sizes
>> evenly balanced?  Next time this happens, you might try monitoring with
>> 'kudu ksck' to ensure there aren't additional problems in the cluster (admin guide
>> on the ksck tool
>> <https://github.com/apache/kudu/blob/master/docs/administration.adoc#ksck>
>> ).
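
As a rough sanity check on that 2-hour figure, here is a back-of-envelope estimate using the numbers from this thread (3TB to re-replicate, 14 tservers so 13 surviving copy targets, and a fully saturated 1Gbps link, i.e. about 125 MB/s per node); these are illustrative bounds, not a model of Kudu's actual copy scheduling:

```shell
# Back-of-envelope re-replication time, purely illustrative.
TOTAL_BYTES=$((3 * 1000 * 1000 * 1000 * 1000))  # 3 TB to re-replicate
SURVIVORS=13                                    # 14 tservers minus the failed one
LINK_BPS=125000000                              # 1 Gbps ~= 125 MB/s per node

# Best case: all 13 survivors receive their share fully in parallel.
PER_NODE=$((TOTAL_BYTES / SURVIVORS))
PARALLEL_SECS=$((PER_NODE / LINK_BPS))          # ~1846 s, about 31 minutes

# Worst case: everything funnels through a single 1 Gbps link.
SERIAL_SECS=$((TOTAL_BYTES / LINK_BPS))         # 24000 s, about 6.7 hours

echo "parallel lower bound: ${PARALLEL_SECS}s; serial upper bound: ${SERIAL_SECS}s"
```

The observed 2 hours sits between those bounds, which would be consistent with the copies being only partially parallel; ksck output and per-node network metrics during the next incident would show where the bottleneck actually is.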
>>> Q3. `--follower_unavailable_considered_failed_sec` can be changed
>>> without restarting cluster?
>> The flag can be changed, but it comes with the same caveats as above:
>>     kudu tserver set-flag <tserver-addr> follower_unavailable_considered_failed_sec 900 --force
>> - Dan
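
One more practical note: since these runtime flag changes aren't persisted, it's worth confirming they actually took effect. Each tserver's embedded web UI exposes current flag values on its /varz page (default webserver port 8050); the hostname below is a placeholder, not from this thread:

```shell
TSERVER=tserver-01   # placeholder hostname, replace with a real tserver
PORT=8050            # default Kudu tserver web UI port
URL="http://${TSERVER}:${PORT}/varz"
echo "$URL"
# Uncomment to query a live tserver:
# curl -s "$URL" | grep follower_unavailable_considered_failed_sec
```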

Todd Lipcon
Software Engineer, Cloudera
