hadoop-hdfs-dev mailing list archives

From "Thanh Do (JIRA)" <j...@apache.org>
Subject [jira] Created: (HDFS-1337) Unmatched file length makes append fail. Should we retry if a startBlockRecovery() fails?
Date Mon, 09 Aug 2010 21:27:16 GMT
Unmatched file length makes append fail. Should we retry if a startBlockRecovery() fails?

                 Key: HDFS-1337
                 URL: https://issues.apache.org/jira/browse/HDFS-1337
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: data-node
    Affects Versions: 0.20-append
            Reporter: Thanh Do

- Component: data node
- Version: 0.20-append
- Setup:
1) # disks / datanode = 3
2) # failures = 2
3) failure type = crash
4) When/where failure happens = (see below)
- Details:
The client writes to the pipeline dn1-dn2-dn3. The write succeeds, so we have blk_X_1001 on all datanodes.
Now the client tries to append. It first calls dn1.recoverBlock().
This recoverBlock succeeds, so we have blk_X_1002 on all datanodes.
Suppose the append pipeline is dn3-dn2-dn1. The client sends a packet to dn3.
dn3 forwards the packet to dn2 and writes it to its own disk.
Now *dn2 crashes*, so dn1 never receives this packet.
The client calls dn1.recoverBlock() again, this time with dn3-dn1 as the pipeline.
dn1 then calls dn3.startBlockRecovery(), which terminates the writer thread on dn3,
gets the *in-memory* metadata (a 512-byte length), and verifies it against
the actual file on disk (a 1024-byte length), hence the exception.
(In this case, the block on dn3 is not yet finalized and FSDataset.setVisibleLength
has not been called, so its visible in-memory length
is 512 bytes even though its on-disk length is 1024 bytes.)
Therefore, from dn1's view, dn3 has a problem.
dn1 then runs its own startBlockRecovery() successfully (because its on-disk
length and in-memory length match: both are 512 bytes). The replicas are now:
     + at dn1: blk_X_1003 (length 512)
     + at dn2: blk_X_1002 (length 512)     
     + at dn3: blk_X_1002 (length 1024)
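The length check that fails on dn3 above can be sketched as follows. This is a minimal illustration, not the actual 0.20-append datanode code; the class and method names are hypothetical.

```java
import java.io.IOException;

// Sketch of the consistency check startBlockRecovery() effectively performs:
// the in-memory (visible) length must match the on-disk file length,
// otherwise recovery of that replica fails with an exception.
public class LengthCheckSketch {

    // Hypothetical helper; the real check lives in the datanode recovery path.
    static void verifyLengths(long inMemoryLength, long onDiskLength) throws IOException {
        if (inMemoryLength != onDiskLength) {
            throw new IOException("in-memory length " + inMemoryLength
                    + " != on-disk length " + onDiskLength);
        }
    }

    public static void main(String[] args) {
        // dn3: visible length is 512 (setVisibleLength never called),
        // but the on-disk length is 1024 -> recovery fails.
        try {
            verifyLengths(512, 1024);
            System.out.println("dn3: recovery ok");
        } catch (IOException e) {
            System.out.println("dn3: recovery fails");
        }
        // dn1: both lengths are 512 -> its own startBlockRecovery() succeeds.
        try {
            verifyLengths(512, 512);
            System.out.println("dn1: recovery ok");
        } catch (IOException e) {
            System.out.println("dn1: recovery fails");
        }
    }
}
```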
dn1 also calls NN.commitBlockSynchronization(blk_X_1003, [dn1]), i.e. only dn1 has a good replica.
At this point:
- From the NN's point of view, dn1 is the candidate for lease recovery.
- From the client's view, dn1 is the only healthy node in the pipeline
(it knows this from the result returned by recoverBlock).
The client starts sending a packet to dn1; now *dn1 crashes*, hence the append fails.

Why? After all, dn1 and dn2 have crashed, and only dn3 contains the block, with GS 1002.
But the NN records blk_X_1003, because dn1 successfully called commitBlockSynchronization(blk_X_1003).
Hence, when a reader asks for the file, the NN returns blk_X_1003,
and no live datanode contains the block with GS 1003.
- RE-APPEND with a different client: FAILS
     + the file is under construction, and its lease holder is A1
- NN.leaseRecovery(): FAILS
     + there is no live target (i.e. dn1, not dn3)
     + hence, as long as dn1 is down and the lease is not recovered, the file cannot
be appended to
     + worse, even if dn3 sends a blockReport to the NN and becomes a target for lease
     recovery, lease recovery still fails because:
          1) dn3 has block blk_X_1002, whose GS is smaller
               than the one the NN asks for, and
          2) dn3 cannot contact dn1, which has crashed
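Reason (1) above comes down to a generation-stamp comparison. The sketch below illustrates it with hypothetical names; it is not actual NameNode code, just the stale-replica rule the scenario describes.

```java
// Sketch of why lease recovery rejects dn3: the only surviving replica
// carries generation stamp 1002, while the NN recorded 1003 when dn1
// called commitBlockSynchronization before crashing.
public class GenStampSketch {

    // Hypothetical predicate: a replica whose generation stamp is older
    // than the one the NN recorded is stale and unusable for recovery.
    static boolean usableForRecovery(long nnGenStamp, long replicaGenStamp) {
        return replicaGenStamp >= nnGenStamp;
    }

    public static void main(String[] args) {
        long nnGenStamp  = 1003; // committed for blk_X_1003 by dn1
        long dn3GenStamp = 1002; // the only replica on a live datanode
        System.out.println("dn3 usable: " + usableForRecovery(nnGenStamp, dn3GenStamp));
    }
}
```

Since 1002 < 1003, the only live replica is rejected, which is why the file stays stuck until dn1 comes back.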

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
