hadoop-hdfs-dev mailing list archives

From "Todd Lipcon (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-2378) recoverBlock timeout in DFSClient should be longer
Date Wed, 28 Sep 2011 04:15:45 GMT
recoverBlock timeout in DFSClient should be longer
--------------------------------------------------

                 Key: HDFS-2378
                 URL: https://issues.apache.org/jira/browse/HDFS-2378
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs client
    Affects Versions: 0.20.206.0, 0.23.0
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon
            Priority: Critical
             Fix For: 0.20.206.0, 0.23.0


In a failure scenario where one of the datanodes in a pipeline has "frozen" (e.g., hard swapping
or disk controller issues), we sometimes see timeouts in the call to recoverBlock(). This is
because recoverBlock's implementation sends several RPCs internally (to the NN and to other
nodes in the pipeline), each with the same timeout as the client's own call. Since the timeouts
are equal and the "outer" call started first, the outer call times out first. The retry then
fails because recovery is either already in progress or already finished.

The best fix would be to make recoverBlock idempotent so the retry doesn't fail, but in the
absence of that we can likely fix this issue by increasing the DFSClient's recoverBlock timeout
to be equal to the sum of the timeouts of the underlying recovery calls.
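
As a rough illustration of the proposed workaround, the sketch below sums the timeouts of the
nested calls to get the outer timeout. It is not the actual DFSClient code: the class name,
the method name, and the 60-second per-RPC value are all hypothetical, chosen only to show
the arithmetic.

```java
// Hypothetical helper, not part of Hadoop: illustrates "outer timeout = sum of inner timeouts".
final class RecoverBlockTimeoutSketch {

    // Returns the sum of the nested call timeouts, in milliseconds.
    // Throws on negative inputs or on long overflow rather than silently wrapping.
    static long outerTimeoutMillis(long... nestedTimeoutsMillis) {
        long total = 0L;
        for (long t : nestedTimeoutsMillis) {
            if (t < 0) {
                throw new IllegalArgumentException("negative timeout: " + t);
            }
            total = Math.addExact(total, t);
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed value for illustration only; the real per-RPC timeout is configuration dependent.
        long rpcTimeoutMillis = 60_000L;

        // Assume two nested RPCs made during recovery: one to the NN, one to another datanode.
        long outer = outerTimeoutMillis(rpcTimeoutMillis, rpcTimeoutMillis);

        System.out.println("outer recoverBlock timeout (ms): " + outer);
    }
}
```

With the outer timeout at least as long as the nested calls combined, the client should not
give up while the datanode is still waiting on its own RPCs. This only makes the spurious
retry less likely; it does not make recoverBlock idempotent.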

