Mailing-List: contact hdfs-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-dev@hadoop.apache.org
Date: Thu, 3 Apr 2014 02:40:16 +0000 (UTC)
From: "Tsz Wo Nicholas Sze (JIRA)" <jira@apache.org>
To: hdfs-dev@hadoop.apache.org
Message-ID: <JIRA.12471144.1281388176083.55565.1396492816520@arcas>
In-Reply-To: <JIRA.12471144.1281388176083@arcas>
References: <JIRA.12471144.1281388176083@arcas>
Subject: [jira] [Resolved] (HDFS-1336) TruncateBlock does not update
 in-memory information correctly
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/HDFS-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo Nicholas Sze resolved HDFS-1336.
---------------------------------------

    Resolution: Not a Problem

I guess that this is not a problem anymore. Please feel free to reopen this if I am wrong. Resolving ...

> TruncateBlock does not update in-memory information correctly
> -------------------------------------------------------------
>
>                 Key: HDFS-1336
>                 URL: https://issues.apache.org/jira/browse/HDFS-1336
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 0.20-append
>            Reporter: Thanh Do
>
> - Component: data node
>  
> - Version: 0.20-append
>  
> - Summary: when a block is truncated during updateBlock, the length recorded in
> ongoingCreates is not updated, which causes a later append to fail.
>  
> - Setup:
> # disks / datanode = 3
> # failures = 2
> failure type = crash
> When/where failure happens = (see below)
>  
> - Details:
> 1. The client writes to dn1-dn2-dn3. The write succeeds.
> 2. The client now tries to append. It first calls dn1.recoverBlock(), which succeeds.
> 3. Suppose the pipeline is dn3-dn2-dn1. The client sends a packet to dn3.
> dn3 forwards the packet to dn2 and writes it to its own disk (i.e., dn3's disk).
> Now *dn2 crashes*, so dn1 has not received this packet.
> 4. The client calls dn1.recoverBlock() again, this time with dn3-dn1 in the pipeline.
> dn1 then calls dn3.startBlockRecovery() to terminate the writer thread in dn3,
> get the *in-memory* metadata of the block, and verify that metadata against
> the real file on disk.
> dn3 maintains an in-memory data structure called *ongoingCreates* that records
> information about blocks currently being created. Once a block is finalized,
> its info is removed from *ongoingCreates*.
>  
> Now suppose that when dn3 receives the startBlockRecovery() request from dn1,
> it has:
>      + finished writing data to disk (hence, the block length on disk is 1024)
>      + set the visible in-memory length (hence, the in-memory length is also 1024)
> but it *has not* finalized the block, so the block info is still in *ongoingCreates*.
> (Note: because the writer thread is interrupted, the finalization never happens.)
>  
> As a result, dn3 reports the block to dn1 with length 1024.
>  
> 5. Now dn1 calls its own startBlockRecovery() successfully (because its on-disk
> file length and in-memory file length match; both are 512 bytes).
>  
> 6. Now dn1 has a sync list (block_X_GS1 at dn1 with length 512, block_X_GS1 at dn3 with length 1024).
> It needs to make sure all datanodes in the pipeline agree on the new generation stamp (GS) and length.
> dn1 calls NN.nextGS() to get the new GS2. It forms the new block_X_GS2 with length 512 and
> calls updateBlock on dn3 and on itself.
>  
> 7. dn3, on receiving the updateBlock request from dn1:
>      + renames the block from block_X_GS1 ==> block_X_GS2
>      + truncates the block file from 1024 to 512 bytes
>      But, and this is the key point, it *does not update the length of the block kept in ongoingCreates*
>      + returns to dn1 successfully
>  
> 8. Now dn1 calls its own updateBlock and *crashes*.
>  
> 9. From the client's point of view, dn1.recoverBlock() fails.
> It retries dn1.recoverBlock() six times and then declares dn1 bad.
>  
> 10. Client now calls dn3.recoverBlock()
>  
> 11. dn3 in turn calls its own startBlockRecovery() to
>      + interrupt block writer threads, if any
>      + getBlockMetadataInfo (as part of forming the syncList, and for updateBlock later)
>           > it first looks in ongoingCreates to see whether the block info is there,
>           and finds it (because the block was never finalized).
>           Hence, the in-memory length is 1024 (even though truncateBlock was called earlier)
>           > it then compares the in-memory length (1024) with the on-disk length (512)
>           Hence, the *un-matched file length exception*
>  
> 12. From the client's point of view, recoverBlock() fails because *All data nodes are bad*.
> The client retries dn3.recoverBlock() five more times and gets the same exception each time.
> Hence, the append fails.
>  
> Note:
> - To fix it, I think that when truncating the file we need to update ongoingCreates too
> (but I am not sure whether such a fix would affect any other workload).
> - Interestingly, NN.leaseRecovery fails because of the same exception at dn3.
> - Until a dead node restarts and NN.leaseRecovery is triggered again, the NN is still the lease holder of the file.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (thanhdo@cs.wisc.edu) and
> Haryadi Gunawi (haryadi@eecs.berkeley.edu)
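
Step 6 of the quoted report can be sketched as follows. This is a toy model in Python, not Hadoop code: ReplicaInfo and plan_update are hypothetical names, and choosing the minimum length across the sync list is an assumption. The report only states that 512 was chosen from the lengths 512 and 1024.

```python
from typing import List, NamedTuple


class ReplicaInfo(NamedTuple):
    """Hypothetical sync-list entry: one replica of the block on one datanode."""
    datanode: str
    generation_stamp: int
    length: int


def plan_update(sync_list: List[ReplicaInfo], new_generation_stamp: int) -> List[ReplicaInfo]:
    """Return the state every replica should be updated to.

    Assumption: the agreed length is the smallest length in the sync list,
    so that no replica is asked to hold bytes it never received.
    """
    agreed_length = min(replica.length for replica in sync_list)
    return [
        ReplicaInfo(replica.datanode, new_generation_stamp, agreed_length)
        for replica in sync_list
    ]


# The sync list from step 6: dn1 holds 512 bytes, dn3 holds 1024 bytes, both at GS1.
sync_list = [ReplicaInfo("dn1", 1, 512), ReplicaInfo("dn3", 1, 1024)]
for target in plan_update(sync_list, new_generation_stamp=2):
    print(target)
```

Under that assumption both replicas are moved to GS2 with length 512, which is why dn3 has to truncate its block file in step 7.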

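The mismatch described in steps 7 and 11, and the fix suggested in the note, can be illustrated with another toy model. Again this is a sketch and not Hadoop code: ToyDataNode, LengthMismatch, and the fix flag are hypothetical, and the rename from GS1 to GS2 is left out because it does not affect the length check.

```python
class LengthMismatch(Exception):
    """Raised when the in-memory and on-disk block lengths disagree."""


class ToyDataNode:
    """Hypothetical stand-in for dn3; only block lengths are tracked."""

    def __init__(self):
        self.on_disk = {}          # block id -> length of the block file
        self.ongoing_creates = {}  # block id -> length recorded for unfinalized blocks

    def write(self, block_id, length):
        # The writer flushed the data and set the visible length,
        # but was interrupted before finalizing the block.
        self.on_disk[block_id] = length
        self.ongoing_creates[block_id] = length

    def update_block(self, block_id, new_length, fix=False):
        # Truncate the block file on disk.
        self.on_disk[block_id] = new_length
        # Suggested fix: keep the ongoingCreates entry in step with the file.
        if fix and block_id in self.ongoing_creates:
            self.ongoing_creates[block_id] = new_length

    def start_block_recovery(self, block_id):
        # Prefer the ongoingCreates entry when the block is not finalized.
        on_disk = self.on_disk[block_id]
        in_memory = self.ongoing_creates.get(block_id, on_disk)
        if in_memory != on_disk:
            raise LengthMismatch(f"in-memory {in_memory} != on-disk {on_disk}")
        return on_disk


def recover_after_truncate(fix):
    """Replay steps 4, 7 and 11 on dn3 and return the recovered length."""
    dn3 = ToyDataNode()
    dn3.write("block_X", 1024)                 # step 4: 1024 bytes, not finalized
    dn3.update_block("block_X", 512, fix=fix)  # step 7: truncate to 512
    return dn3.start_block_recovery("block_X") # step 11: length check
```

Calling recover_after_truncate(fix=False) raises LengthMismatch, mirroring the exception in step 11, while recover_after_truncate(fix=True) returns 512. Whether updating ongoingCreates this way is safe for other workloads is the open question raised in the note, and the sketch does not answer it.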

--
This message was sent by Atlassian JIRA
(v6.2#6252)