accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-4353) Stabilize tablet assignment during transient failure
Date Thu, 23 Jun 2016 22:38:16 GMT


Josh Elser commented on ACCUMULO-4353:

bq. I don't see any harm done here as long as the default behavior is what happens today.
Allowing an administrator to choose to delay tablet reassignment may not fit most use cases,
but it could fit some.

You're right that there isn't any harm to existing users, but there's always a concern of
technical debt (in terms of complexity) and architectural goals. I'm more worried about where
we're going than about this change causing harm to an existing user.

If this is really about trying to make rolling-restarts better, I'd encourage a look at ACCUMULO-1454.
[~kturner] and I (and others) had a very lengthy discussion on the subject, but never sat
down to work on an implementation.

> Stabilize tablet assignment during transient failure
> ----------------------------------------------------
>                 Key: ACCUMULO-4353
>                 URL:
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Shawn Walker
>            Assignee: Shawn Walker
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
> When a tablet server dies, Accumulo attempts to reassign the tablets it was hosting as
quickly as possible to maintain availability.  If multiple tablet servers die in quick succession,
such as from a rolling restart of the Accumulo cluster or a network partition, this behavior
can cause a storm of reassignment and rebalancing, placing significant load on the master.
> To avert such load, Accumulo should be capable of maintaining a steady tablet assignment
state in the face of transient tablet server loss.  Instead of reassigning tablets as quickly
as possible, Accumulo should await the return of a temporarily downed tablet server (for
some configurable duration) before assigning its tablets to other tablet servers.
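
The delayed reassignment described in the issue can be illustrated with a small sketch. This is not Accumulo code and does not reflect any actual patch: the class `AssignmentTracker`, its methods, and the `grace_seconds` parameter are hypothetical names chosen for illustration, and Python is used only for brevity. The assumption is that the master records when each tablet server was first seen down and reassigns that server's tablets only once a configurable grace period has elapsed. A grace period of 0 reproduces the current "reassign immediately" behavior.

```python
from dataclasses import dataclass, field


@dataclass
class AssignmentTracker:
    """Hypothetical tracker for downed tablet servers (not part of Accumulo)."""

    # Assumed configurable duration; 0 means "reassign immediately".
    grace_seconds: float
    # Maps a server name to the time it was first observed down.
    down_since: dict = field(default_factory=dict)

    def server_down(self, server, now):
        # Keep the earliest observation so repeated reports do not reset the clock.
        self.down_since.setdefault(server, now)

    def server_up(self, server):
        # The server returned, so its tablets can stay where they were.
        self.down_since.pop(server, None)

    def servers_to_reassign(self, now):
        # Only servers whose absence has lasted at least the grace period qualify.
        return sorted(
            server
            for server, since in self.down_since.items()
            if now - since >= self.grace_seconds
        )
```

In a rolling restart, a server that comes back before the grace period expires would never appear in `servers_to_reassign`, so its tablets would not be moved and no rebalancing would follow.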

This message was sent by Atlassian JIRA
