hadoop-hdfs-dev mailing list archives

From "Eli Collins (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-2639) A client may fail during block recovery even if its request to recover a block succeeds
Date Tue, 06 Dec 2011 23:14:40 GMT
A client may fail during block recovery even if its request to recover a block succeeds

                 Key: HDFS-2639
                 URL: https://issues.apache.org/jira/browse/HDFS-2639
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs client
    Affects Versions: 1.0.0
            Reporter: Eli Collins

The client gets stuck in the following loop if an RPC it issued to recover a block timed out:

1.  processDatanodeError
2.     DN#recoverBlock
3.        DN#syncBlock
4.           NN#nextGenerationStamp
5.  sleep 1s
6.  goto 1
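
The loop above can be reproduced with a small toy model. This is only a sketch, not HDFS code: ToyDatanode, RecoveryError, RpcTimeout and close_with_recovery are hypothetical names, the DN and NN are collapsed into one object, and the timed-out request is assumed to finish on the server side shortly after the first retry.

```python
class RecoveryError(Exception):
    """Stand-in for the IOE thrown by the recovery path (hypothetical)."""


class RpcTimeout(Exception):
    """Stand-in for a client-side RPC timeout (hypothetical)."""


class ToyDatanode:
    """Toy model of DN#recoverBlock -> DN#syncBlock -> NN#nextGenerationStamp."""

    def __init__(self, gen_stamp, slow_first_call=True):
        self.gen_stamp = gen_stamp
        self.slow_first_call = slow_first_call
        self.in_flight = 0  # retries during which the timed-out request still runs

    def recover_block(self, client_gs):
        if self.in_flight > 0:
            # The earlier, timed-out request is still recovering the block.
            self.in_flight -= 1
            if self.in_flight == 0:
                self.gen_stamp += 1  # assumption: it completes right after this retry
            raise RecoveryError("block is already being recovered")
        if client_gs != self.gen_stamp:
            # The earlier request already got a new GS and updated the block.
            raise RecoveryError("generation stamp out of date")
        if self.slow_first_call:
            # The server keeps working, but the client gives up waiting.
            self.slow_first_call = False
            self.in_flight = 1
            raise RpcTimeout("recoverBlock timed out")
        self.gen_stamp += 1
        return self.gen_stamp


def close_with_recovery(dn, client_gs, max_retries=5, sleep=lambda seconds: None):
    """Toy processDatanodeError loop: retry, sleep 1s, give up after max_retries."""
    errors = []
    for _ in range(max_retries):
        try:
            return dn.recover_block(client_gs), errors
        except (RpcTimeout, RecoveryError) as e:
            errors.append(str(e))
            sleep(1)  # no-op by default so the sketch runs instantly
    raise RecoveryError("all datanodes are bad: " + "; ".join(errors))
```

In this model the block ends up recovered under a new GS, yet the client still exhausts its retries because it keeps presenting the old GS.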

Once we've timed out once at step 2 and looped, step 2 throws an IOE because the block is already
being recovered, and step 4 throws an IOE because the block GS is now out of date (the previous,
timed-out request got a new GS and updated the block). Eventually the client reaches max
retries, considers all DNs bad, and close throws an IOE.

The client should be able to succeed if one of its own requests to recover the block succeeded.
It should still fail if another client (eg HBase via recoverLease or the NN via releaseLease)
successfully recovered the block. One way to handle this would be to not time out the request
to recover the block. Another would be to make a subsequent call to recoverBlock succeed,
eg by updating the block's sequence number to be the latest value that was updated by the
same client in the previous request (ie it can recover over itself but not another client).
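
A rough sketch of the second option, under stated assumptions: the block remembers which client last bumped its GS, and a stale GS is accepted only from that client. ToyBlock, last_recoverer and this recover_block signature are hypothetical and do not correspond to the actual HDFS API; "sequence number" is modeled here as the generation stamp.

```python
class RecoveryError(Exception):
    """Stand-in for the IOE thrown by the recovery path (hypothetical)."""


class ToyBlock:
    """Toy block state: current GS plus the client that last recovered it."""

    def __init__(self, gen_stamp):
        self.gen_stamp = gen_stamp
        self.last_recoverer = None


def recover_block(block, client_name, client_gs):
    """Allow a client to recover over its own earlier request, but not another's."""
    if client_gs != block.gen_stamp and client_name != block.last_recoverer:
        # A different client recovered the block since this client last saw it.
        raise RecoveryError("generation stamp out of date")
    # Either the GS is current, or it is stale only because this same client's
    # earlier (possibly timed-out) request bumped it. Proceed from the latest GS.
    block.gen_stamp += 1
    block.last_recoverer = client_name
    return block.gen_stamp
```

With this rule, a retry from client "A" carrying its old GS succeeds after A's own timed-out request, while the same stale GS from a client "B" is still rejected. Once B recovers the block, A's stale retries fail again, which matches the requirement that a client must not recover over another client.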


