hadoop-hdfs-issues mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1224) Stale connection makes node miss append
Date Thu, 09 Sep 2010 05:25:34 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907522#action_12907522
] 

dhruba borthakur commented on HDFS-1224:
----------------------------------------

It appears to me that this bug report states that although DN2 received all the data from an append, it is still thrown out of the write pipeline. The write continues to DN1 and DN3, and a subsequent
reader will never read data from DN2.

This is not a bug, but the situation could be improved by somehow avoiding the use of
stale RPC proxies.
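
One way to avoid a stale RPC proxy, sketched below under stated assumptions: evict the cached proxy on a failed call and retry once on a fresh connection, instead of treating the failure as a dead datanode. This is a hypothetical illustration, not Hadoop's actual `org.apache.hadoop.ipc` code; `RetryingProxyPool`, `Proxy`, and `ProxyFactory` are invented names.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch (not Hadoop's actual RPC classes): a proxy pool
// that evicts a cached connection and retries once on a fresh one when
// a call fails, so a single stale connection does not knock a live
// datanode out of the append pipeline.
public class RetryingProxyPool {
    interface Proxy { String call(String op) throws IOException; }
    interface ProxyFactory { Proxy create(String node); }

    private final Map<String, Proxy> pool = new ConcurrentHashMap<>();
    private final ProxyFactory factory;

    public RetryingProxyPool(ProxyFactory factory) { this.factory = factory; }

    public String call(String node, String op) throws IOException {
        // Look in the pool first; create and cache a proxy only if absent.
        Proxy p = pool.computeIfAbsent(node, factory::create);
        try {
            return p.call(op);
        } catch (IOException maybeStale) {
            // The cached connection may be stale (e.g. the datanode
            // restarted): evict exactly this proxy and retry once on
            // a freshly created one before giving up on the node.
            pool.remove(node, p);
            Proxy fresh = pool.computeIfAbsent(node, factory::create);
            return fresh.call(op);
        }
    }
}
```

A real fix would also need to distinguish a stale connection from a genuinely dead datanode (e.g. by bounding retries), but the eviction-and-retry step is the part that addresses this report.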

> Stale connection makes node miss append
> ---------------------------------------
>
>                 Key: HDFS-1224
>                 URL: https://issues.apache.org/jira/browse/HDFS-1224
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.20-append
>            Reporter: Thanh Do
>
> - Summary: if a datanode crashes and restarts, it may miss an append.
>  
> - Setup:
> + # available datanodes = 3
> + # replica = 3 
> + # disks / datanode = 1
> + # failures = 1
> + failure type = crash
> + When/where failure happens = after the first append succeeds
>  
> - Details:
> Since each datanode maintains a pool of IPC connections, whenever it wants
> to make an IPC call, it first looks into the pool. If the connection is not there,
> it is created and put into the pool; otherwise the existing connection is used.
> Suppose the append pipeline contains dn1, dn2, and dn3, with dn1 as the primary.
> After the client appends to block X successfully, dn2 crashes and restarts.
> Now the client writes a new block Y to dn1, dn2, and dn3. The write is successful.
> The client then starts appending to block Y. It first calls dn1.recoverBlock().
> Dn1 will first create a proxy for each of the datanodes in the pipeline
> (in order to make RPC calls like getMetadataInfo() or updateBlock()). However, because
> dn2 has just crashed and restarted, its connection in dn1's pool has become stale. Dn1 uses
> this stale connection to make a call to dn2, which raises an exception. Therefore, the
> append is made only to dn1 and dn3, even though dn2 is alive and the write of block Y
> to dn2 was successful.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (thanhdo@cs.wisc.edu) and 
> Haryadi Gunawi (haryadi@eecs.berkeley.edu)
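
The get-or-create pool lookup described in the report can be sketched as follows. This is a hypothetical illustration with invented names (`NaiveProxyPool`, `Proxy`), not Hadoop's actual code: once a proxy is cached it is reused forever, so a connection cached before a datanode restart stays stale and every later call through it fails, even though the node is back up.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the pool behavior described in the report:
// look in the pool first, create and cache only if absent, never evict.
public class NaiveProxyPool {
    interface Proxy { String call(String op) throws IOException; }

    private final Map<String, Proxy> pool = new HashMap<>();

    // Returns the cached proxy if present; the factory is consulted
    // only on a pool miss, so a stale cached connection wins over a
    // fresh one that could now be established.
    public Proxy get(String node, Function<String, Proxy> factory) {
        return pool.computeIfAbsent(node, factory);
    }
}
```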

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

