accumulo-notifications mailing list archives

From "William Slacum (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3963) Incremental backoff on inability to write to HDFS
Date Fri, 14 Aug 2015 21:27:45 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697784#comment-14697784 ]

William Slacum commented on ACCUMULO-3963:
------------------------------------------

Is ACCUMULO-2480 generally about killing tservers in light of HDFS failure, or about tservers
hanging when something tried to kill them? There's a subtle but important difference
there. What it seems you're talking about is resiliency in the face of some failure,
whereas the other is about explicitly stopping the service but being unable to.

> Incremental backoff on inability to write to HDFS
> -------------------------------------------------
>
>                 Key: ACCUMULO-3963
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3963
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.7.0
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Critical
>             Fix For: 1.7.1, 1.8.0
>
>
> ACCUMULO-2480 added some support to kill the tserver if HDFS is unavailable after a number
> of checks. ACCUMULO-3937 added some configuration values to loosen this.
> We still only sleep for a static 100ms after every failure. This makes the default of 15
> attempts over 10 seconds a bit misleading, as the tserver will kill itself after 1.5 seconds, not 10.
> I'm thinking that this should really be more like a 30-60s wait period out of the box.
> Anything less isn't really going to insulate operators from transient HDFS failures (due to
> services being restarted or network partitions).
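The arithmetic in the description can be illustrated with a small sketch comparing a fixed sleep against an incrementally growing one. It assumes a sleep that starts at 100 ms, doubles after each failure, and is capped at 5 s. The class and method names, the doubling factor, and the cap are hypothetical illustrations only. They are not Accumulo APIs, not Accumulo configuration properties, and not the fix adopted for this issue.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: compares the total time spent waiting when a
 * health check sleeps a fixed interval after each failure versus an
 * incrementally growing (doubling, capped) interval.
 * None of these names are Accumulo APIs or configuration properties.
 */
public class BackoffSketch {

    /** Total wait when sleeping a fixed sleepMillis after each of `attempts` failures. */
    static long staticTotalMillis(int attempts, long sleepMillis) {
        return attempts * sleepMillis;
    }

    /**
     * Sleep schedule that starts at initialMillis and doubles after each
     * failure, never exceeding maxMillis (hypothetical policy).
     */
    static List<Long> backoffSchedule(int attempts, long initialMillis, long maxMillis) {
        List<Long> schedule = new ArrayList<>();
        long sleep = initialMillis;
        for (int i = 0; i < attempts; i++) {
            schedule.add(sleep);
            // Double the next sleep, but clamp it to the cap.
            sleep = Math.min(sleep * 2, maxMillis);
        }
        return schedule;
    }

    /** Sums a list of sleep durations in milliseconds. */
    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Behaviour described in the issue: 15 attempts, static 100 ms sleep.
        System.out.println("static total ms: " + staticTotalMillis(15, 100));
        // Hypothetical incremental schedule: 100 ms doubling, capped at 5 s.
        List<Long> schedule = backoffSchedule(15, 100, 5_000);
        System.out.println("backoff schedule ms: " + schedule);
        System.out.println("backoff total ms: " + sum(schedule));
    }
}
```

With 15 attempts, the fixed 100 ms sleep waits 1,500 ms in total, while the hypothetical doubling schedule waits 51,300 ms (about 51 s), inside the 30-60s window suggested in the description. Capping the interval keeps any single sleep bounded while still stretching the overall window enough to ride out a restarted service.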



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
