hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "sam rash (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1262) Failed pipeline creation during append leaves lease hanging on NN
Date Wed, 30 Jun 2010 09:22:50 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883855#action_12883855
] 

sam rash commented on HDFS-1262:
--------------------------------

that's probably better.  this was dependent on it as i was killing the datanodes to simulate
the pipeline failure.  i ended up tuning the test case to use mockito to throw exceptions
at the end of a NN rpc call for both append() and create(), so I think that dependency is
gone. 

can we mark this as dependent on that if it turns out to be needed?


> Failed pipeline creation during append leaves lease hanging on NN
> -----------------------------------------------------------------
>
>                 Key: HDFS-1262
>                 URL: https://issues.apache.org/jira/browse/HDFS-1262
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client, name-node
>    Affects Versions: 0.20-append
>            Reporter: Todd Lipcon
>            Assignee: sam rash
>            Priority: Critical
>             Fix For: 0.20-append
>
>         Attachments: hdfs-1262-1.txt
>
>
> Ryan Rawson came upon this nasty bug in HBase cluster testing. What happened was the
following:
> 1) File's original writer died
> 2) Recovery client tried to open file for append - looped for a minute or so until soft
lease expired, then append call initiated recovery
> 3) Recovery completed successfully
> 4) Recovery client calls append again, which succeeds on the NN
> 5) For some reason, the block recovery that happens at the start of append pipeline creation
failed on all datanodes 6 times, causing the append() call to throw an exception back to HBase
master. HBase assumed the file wasn't open and put it back on a queue to try later
> 6) Some time later, it tried append again, but the lease was still assigned to the same
DFS client, so it wasn't able to recover.
> The recovery failure in step 5 is a separate issue, but the problem for this JIRA is
that the NN can think it failed to open a file for append when the NN thinks the writer holds
a lease. Since the writer keeps renewing its lease, recovery never happens, and no one can
open or recover the file until the DFS client shuts down.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message