accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <>
Subject [jira] [Created] (ACCUMULO-3963) Incremental backoff on inability to write to HDFS
Date Fri, 14 Aug 2015 20:26:45 GMT
Josh Elser created ACCUMULO-3963:

             Summary: Incremental backoff on inability to write to HDFS
                 Key: ACCUMULO-3963
             Project: Accumulo
          Issue Type: Bug
          Components: tserver
    Affects Versions: 1.7.0
            Reporter: Josh Elser
            Assignee: Josh Elser
            Priority: Critical
             Fix For: 1.7.1, 1.8.0

ACCUMULO-2480 added some support to kill the tserver if HDFS is unavailable after a number
of checks. ACCUMULO-3937 added some configuration values to loosen this.

We still only sleep for a static 100ms after every failure. This makes the default of 15 attempts
over 10 seconds a bit misleading: at 100ms per failed attempt, the tserver will kill itself after
about 1.5 seconds, not 10.
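
To make the arithmetic concrete, here is a minimal sketch of the calculation. The class and method names are hypothetical and are not part of Accumulo; only the 15 attempts and the 100ms sleep come from the description above.

```java
public final class FixedSleepBudget {
    /** Total time spent sleeping when every failed attempt is followed by the same fixed sleep. */
    static long totalSleepMillis(int attempts, long sleepMillis) {
        return (long) attempts * sleepMillis;
    }

    public static void main(String[] args) {
        // 15 attempts and a 100 ms sleep are the figures quoted in the issue.
        System.out.println(totalSleepMillis(15, 100) + " ms"); // prints "1500 ms"
    }
}
```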

I'm thinking that this should really be more like a 30-60s wait period out of the box. Anything
less isn't really going to insulate operators from transient HDFS failures (due to services
being restarted or network partitions).
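
One way the idea could look is a sleep that grows after each failure up to a cap. This is only a sketch under stated assumptions, not the change that was made for this issue: the class name, the linear growth, and the 100ms / 500ms / 5000ms values are all hypothetical and were picked so that 15 attempts land inside the 30-60s window mentioned above.

```java
public final class IncrementalBackoff {
    /** Sleep after the given failure (1-based): grows linearly, capped at maxMillis. */
    static long sleepMillis(int failure, long initialMillis, long incrementMillis, long maxMillis) {
        long sleep = initialMillis + (long) (failure - 1) * incrementMillis;
        return Math.min(sleep, maxMillis);
    }

    /** Total time spent sleeping if all attempts fail. */
    static long totalSleepMillis(int attempts, long initialMillis, long incrementMillis, long maxMillis) {
        long total = 0;
        for (int failure = 1; failure <= attempts; failure++) {
            total += sleepMillis(failure, initialMillis, incrementMillis, maxMillis);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical values: start at 100 ms, add 500 ms per failure, cap at 5 s.
        System.out.println(totalSleepMillis(15, 100, 500, 5000) + " ms"); // prints "48500 ms"
    }
}
```

With these assumed values the tserver would tolerate roughly 48.5 seconds of failed writes before giving up, compared with 1.5 seconds for a static 100ms sleep over the same 15 attempts.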

This message was sent by Atlassian JIRA
