accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4353) Stabilize tablet assignment during transient failure
Date Fri, 24 Jun 2016 16:25:16 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348520#comment-15348520
] 

Josh Elser commented on ACCUMULO-4353:
--------------------------------------

bq. Ahh, my mistake then. As a new contributor to Accumulo, I still don't have a full grasp
of the rules, either written or unwritten. My feeling from watching the list was that the primary
modus operandi was to present a (fully implemented) solution along with a proposed problem,
and then to discuss the merits of the solution.

In general at Apache, it's best to discuss a large/intrusive change to the codebase before
you spend time writing code (in case there is consensus on an approach different than what
you thought best). It goes back to the community-first aspect. It's fine that you provided
code right away (I don't mean to be scolding); I just don't want to see you spend a week writing
code only for consensus to settle on a different direction than the one you implemented.

> Stabilize tablet assignment during transient failure
> ----------------------------------------------------
>
>                 Key: ACCUMULO-4353
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4353
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Shawn Walker
>            Assignee: Shawn Walker
>            Priority: Minor
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When a tablet server dies, Accumulo attempts to reassign the tablets it was hosting as
quickly as possible to maintain availability.  If multiple tablet servers die in quick succession,
such as from a rolling restart of the Accumulo cluster or a network partition, this behavior
can cause a storm of reassignment and rebalancing, placing significant load on the master.
> To avert such load, Accumulo should be capable of maintaining a steady tablet assignment
state in the face of transient tablet server loss.  Instead of reassigning tablets as quickly
as possible, Accumulo should await the return of a temporarily downed tablet server (for
some configurable duration) before assigning its tablets to other tablet servers.
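
The behavior proposed in the issue description can be sketched roughly as follows. This is only
an illustration of "wait a configurable duration before reassigning", not Accumulo's master code:
the class name SuspensionTracker, its methods, and the 30-second grace period used in the usage
note are all hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: tracks when each tablet server was first seen down and
// only allows reassignment of its tablets once a grace period has elapsed.
class SuspensionTracker {
    private final Duration gracePeriod;                    // assumed configurable
    private final Map<String, Instant> downSince = new HashMap<>();

    SuspensionTracker(Duration gracePeriod) {
        this.gracePeriod = gracePeriod;
    }

    // Record the first time a server was observed down; later reports of the
    // same outage do not reset the clock.
    void serverDown(String server, Instant now) {
        downSince.putIfAbsent(server, now);
    }

    // The server came back, so its tablets stay where they were.
    void serverReturned(String server) {
        downSince.remove(server);
    }

    // True only if the server is known to be down and has been down for at
    // least the grace period.
    boolean shouldReassign(String server, Instant now) {
        Instant since = downSince.get(server);
        if (since == null) {
            return false;
        }
        return !now.isBefore(since.plus(gracePeriod));
    }
}
```

With a 30-second grace period, a server that drops out and returns within 30 seconds never
triggers reassignment; one that stays down longer does. Passing the current time in explicitly
keeps the sketch easy to test.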



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
