accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3937) Hard-coded HDFS failure tolerance
Date Sat, 11 Jul 2015 00:38:04 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623132#comment-14623132 ]

Josh Elser commented on ACCUMULO-3937:
--------------------------------------

I also increased this from 5 to 15 because we only sleep for 100ms between attempts. We should
likely have an increasing backoff during failures to trigger this more reliably.
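
A minimal sketch of what an increasing backoff between attempts could look like, assuming a doubling delay with an upper bound. The class BackoffRetry, its method names, and the 100ms/5000ms values in the usage note are hypothetical illustrations, not the code in the tserver or in the patch:

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not Accumulo code: retries an action and sleeps for a
// capped, doubling interval between attempts instead of a fixed 100ms pause.
final class BackoffRetry {
  private BackoffRetry() {}

  // Delay to sleep after failed attempt number `attempt` (1-based):
  // initialMillis, 2x, 4x, ... never exceeding maxMillis.
  static long delayMillis(int attempt, long initialMillis, long maxMillis) {
    long delay = Math.min(initialMillis, maxMillis);
    for (int i = 1; i < attempt; i++) {
      if (delay > maxMillis / 2) {
        return maxMillis; // doubling again would pass the cap (and could overflow)
      }
      delay *= 2;
    }
    return delay;
  }

  // Runs the action up to maxAttempts times and rethrows the last failure.
  static <T> T call(Callable<T> action, int maxAttempts, long initialMillis, long maxMillis)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return action.call();
      } catch (Exception e) {
        last = e;
        if (attempt < maxAttempts) {
          Thread.sleep(delayMillis(attempt, initialMillis, maxMillis));
        }
      }
    }
    throw last;
  }
}
```

With an initial delay of 100ms and a cap of 5000ms, the waits grow as 100, 200, 400, ... up to 5000ms, so a fixed number of attempts covers a much longer outage than the same number of attempts with a constant 100ms sleep.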

> Hard-coded HDFS failure tolerance
> ---------------------------------
>
>                 Key: ACCUMULO-3937
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3937
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.7.0
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Blocker
>             Fix For: 1.7.1, 1.8.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> ACCUMULO-2480 added an error cache to the TabletServer which makes the tserver kill itself after 5 errors creating a new WAL file within 10 seconds.
> This is painful because it now causes Accumulo to kill itself if HDFS is restarted beneath Accumulo. Previously, I would have expected Accumulo to just keep on chugging if HDFS goes away. Now, I'll have to restart it when HDFS returns.
> This should be a configuration property instead of being hard-coded.
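
The tolerance described above (N errors within a time window) could be made configurable roughly as follows. This is only a sketch under stated assumptions: the class FailureWindow and its parameters are hypothetical, it is not the error cache added by ACCUMULO-2480, and the two constructor arguments stand in for values that would be read from configuration rather than hard-coded:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sliding-window failure counter, not the tserver's error cache.
// maxErrors and windowMillis are supplied by the caller (e.g. from
// configuration) instead of being fixed at 5 errors in 10 seconds.
final class FailureWindow {
  private final int maxErrors;
  private final long windowMillis;
  private final Deque<Long> timestamps = new ArrayDeque<>();

  FailureWindow(int maxErrors, long windowMillis) {
    if (maxErrors < 1 || windowMillis < 1) {
      throw new IllegalArgumentException("maxErrors and windowMillis must be positive");
    }
    this.maxErrors = maxErrors;
    this.windowMillis = windowMillis;
  }

  // Records one failure at nowMillis and returns true when at least maxErrors
  // failures (including this one) fall within the last windowMillis.
  synchronized boolean recordFailure(long nowMillis) {
    timestamps.addLast(nowMillis);
    // Drop failures that have aged out of the window.
    while (!timestamps.isEmpty() && nowMillis - timestamps.peekFirst() >= windowMillis) {
      timestamps.removeFirst();
    }
    return timestamps.size() >= maxErrors;
  }
}
```

A caller would invoke recordFailure(System.currentTimeMillis()) after each failed WAL creation and decide what to do when it returns true. Raising maxErrors or shortening the window makes the server more tolerant of a brief HDFS restart.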



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
