hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1054) Make sleep after failure in nextBlockOutputStream smarter and configurable
Date Sun, 21 Mar 2010 08:13:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847897#action_12847897 ]

Todd Lipcon commented on HDFS-1054:

FYI, I'm running into this in an HBase edits stress test where I kill a DN that's local
to one of the region servers. That region server immediately starts acting up because every
file create takes an extra 6 seconds (the client still thinks the local DN is up until the
NN marks it down). In this case it gets an immediate "Connection Refused" from the local DN
anyway, and the second attempt always works fine because the dead DN has made it into the
HDFS-630 excludedNodes list.

> Make sleep after failure in nextBlockOutputStream smarter and configurable
> --------------------------------------------------------------------------
>                 Key: HDFS-1054
>                 URL: https://issues.apache.org/jira/browse/HDFS-1054
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
> If DFSOutputStream fails to create a pipeline, it currently sleeps 6 seconds before retrying.
> I don't see a great reason to wait at all, much less 6 seconds (especially now that HDFS-630
> ensures that a retry won't go back to the bad node). We should at least make it configurable,
> and perhaps something like backoff makes some sense.
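
For illustration only, here is a minimal sketch of what a configurable sleep with backoff
could look like. It is not the DFSOutputStream code and not a proposed patch: the class
name PipelineRetryBackoff and both constructor parameters are hypothetical, and the only
number taken from this issue is the current 6 second sleep, used here as the cap. An
initial delay of 0 corresponds to not waiting at all.

```java
/**
 * Hypothetical helper, not part of the HDFS client: computes how long to
 * sleep before retrying pipeline creation, doubling the delay on each
 * failed attempt up to a configurable cap.
 */
final class PipelineRetryBackoff {
    private final long initialDelayMs; // assumed config value; 0 means "retry immediately"
    private final long maxDelayMs;     // assumed config value; cap on any single sleep

    PipelineRetryBackoff(long initialDelayMs, long maxDelayMs) {
        if (initialDelayMs < 0 || maxDelayMs < initialDelayMs) {
            throw new IllegalArgumentException(
                "need 0 <= initialDelayMs <= maxDelayMs");
        }
        this.initialDelayMs = initialDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    /**
     * Delay before retry number {@code attempt} (0-based):
     * initialDelayMs * 2^attempt, capped at maxDelayMs.
     */
    long delayMs(int attempt) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        long delay = initialDelayMs;
        for (int i = 0; i < attempt; i++) {
            // Doubling would exceed the cap (or overflow), so stop at the cap.
            if (delay > maxDelayMs / 2) {
                return maxDelayMs;
            }
            delay *= 2;
        }
        return delay;
    }
}
```

With new PipelineRetryBackoff(500, 6000), successive retries would wait 500, 1000, 2000,
4000 and then 6000 ms, so the first retry after a "Connection Refused" costs half a second
rather than 6 seconds, while repeated failures still back off to the old value.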

