accumulo-notifications mailing list archives

From "John Vines (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-2976) blacklist problematic tservers
Date Thu, 03 Jul 2014 19:30:34 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051842#comment-14051842 ]

John Vines commented on ACCUMULO-2976:
--------------------------------------

Wouldn't this make more sense as part of the watchdogs people implement? Perhaps if the tserver
gets killed x times within period y, the watchdog shouldn't restart it.
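
A minimal sketch of that watchdog rule, assuming a sliding time window: the watchdog records each tserver exit and stops relaunching once x exits fall within the last y seconds. The class name `RestartLimiter` and its parameters are hypothetical and are not part of Accumulo or of any particular watchdog.

```python
from collections import deque
import time


class RestartLimiter:
    """Decide whether a watchdog should relaunch a process after it dies.

    Hypothetical helper: restarts are refused once `max_failures` failures
    have been recorded within the last `window_seconds`.
    """

    def __init__(self, max_failures, window_seconds, clock=time.monotonic):
        self.max_failures = max_failures      # "x times"
        self.window_seconds = window_seconds  # "y period"
        self.clock = clock                    # injectable for testing
        self.failures = deque()               # timestamps of recent failures

    def record_failure(self):
        """Record one failure; return True if a restart is still allowed."""
        now = self.clock()
        self.failures.append(now)
        # Drop failures that have aged out of the window.
        while self.failures and now - self.failures[0] > self.window_seconds:
            self.failures.popleft()
        return len(self.failures) < self.max_failures
```

A watchdog loop would call `record_failure()` each time the tserver exits and skip the relaunch when it returns False, leaving the node down until an operator intervenes.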

> blacklist problematic tservers
> ------------------------------
>
>                 Key: ACCUMULO-2976
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2976
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: master
>            Reporter: Sean Busbey
>            Priority: Minor
>
> It would be nice if the master kept track of tservers that misbehave and eventually blacklisted
> them, similar to how HDFS handles datanodes and MapReduce/YARN handle trackers.
> Right now the closest we come is having the master kill the zoolock for tservers that
> are behaving poorly. This causes them to exit if they're not in a zombie state.
> On deployments with a watchdog that relaunches failed processes, this doesn't help much
> because the tserver comes back. In the case of, e.g., flaky network failures on the node, this
> just means repeating the process and impacting cluster performance while the master works
> out that it should kill the node again.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
