hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1218) 20 append: Blocks recovered on startup should be treated with lower priority during block synchronization
Date Tue, 22 Jun 2010 01:02:56 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880994#action_12880994 ]

Todd Lipcon commented on HDFS-1218:
-----------------------------------

bq. 1. this assumes a DN goes down with the client (either in tandem, or on the same box) and that the NN initiates lease recovery later, correct?

Really, this applies any time recovery is initiated after the node has come back to life.
The most likely case is a hard lease expiry, as you suggest above, since that gives the DN
a full hour to restart, but it could also be a manually triggered recovery.
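
For reference, those limits are compile-time constants in the 0.20 tree. A rough sketch with the usual default values (the interface name here is made up; the real constants live in FSConstants and may be named or located slightly differently):

{code:java}
// Illustrative sketch only - the actual constant names/locations in the
// 0.20 source may differ slightly.
public interface LeaseLimits {
  long LEASE_SOFTLIMIT_PERIOD = 60 * 1000L;       // 1 min: another client may
                                                  // trigger lease recovery
  long LEASE_HARDLIMIT_PERIOD = 60 * 60 * 1000L;  // 1 hour: the NN forces
                                                  // lease recovery itself
}
{code}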

bq. 2. the idea here is that RBW should have lengths longer than RWR, but both will have the same genstamp?

Yep, s/should/could/, though (in many cases, RWR will have the right length).
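
To make the priority concrete, here's a rough sketch of the selection (ReplicaState / ReplicaRecord are made-up names for illustration - the actual 0.20-append patch tracks a "recovered on startup" flag on the block metadata rather than a state enum):

{code:java}
import java.util.ArrayList;
import java.util.List;

enum ReplicaState { RBW, RWR }  // being written vs. waiting to be recovered

class ReplicaRecord {
  final ReplicaState state;
  final long length;
  ReplicaRecord(ReplicaState state, long length) {
    this.state = state;
    this.length = length;
  }
}

class SyncSelection {
  // If any replica is still RBW (its DN never restarted), drop the RWR
  // replicas: journal replay may have silently truncated them.
  static List<ReplicaRecord> keepBestState(List<ReplicaRecord> all) {
    boolean anyRbw = false;
    for (ReplicaRecord r : all) {
      if (r.state == ReplicaState.RBW) {
        anyRbw = true;
        break;
      }
    }
    List<ReplicaRecord> kept = new ArrayList<ReplicaRecord>();
    for (ReplicaRecord r : all) {
      if (!anyRbw || r.state == ReplicaState.RBW) {
        kept.add(r);
      }
    }
    return kept;
  }
}
{code}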

bq. If so, why aren't we just taking the replica with the longest length? Is there a reason to

In a normal pipeline failure, it's likely that the earlier DNs in the pipeline will have a
longer length than the later ones, right? So if we always just took the longest length, we'd
usually recover to a pipeline of length 1 even when other replicas are available that satisfy
the correct semantics. At least, I assume this is the reasoning - in this patch I was just
trying to maintain the existing semantics.
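
In sketch form (continuing the made-up ReplicaRecord type above), the recovery target is the minimum length across the replicas we kept, not the maximum:

{code:java}
// Every kept replica gets truncated to the shortest kept length, so all of
// them can stay in the recovered pipeline; taking the max would typically
// leave a pipeline of just the first DN.
static long recoveryLength(List<ReplicaRecord> kept) {
  long min = Long.MAX_VALUE;
  for (ReplicaRecord r : kept) {
    min = Math.min(min, r.length);
  }
  return min;
}
{code}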

bq. 3. if sync() did not complete, there is no violation. do I follow? i agree we can try to recover more data if it's there, but i just want to make sure i'm on the same page

The issue here is that sync() could complete, but the post-power-failure replica could still
be truncated. Recall that sync() (hflush() in later versions) doesn't actually fsync to disk,
so after an actual power failure of the local node, the DN will usually come back with a
truncated replica after ext3 journal replay.
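
For illustration, using the newer API names (0.20-append only has sync(), which behaves like hflush(); hflush()/hsync() on FSDataOutputStream arrived in later releases):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FlushVsSync {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/tmp/flush-vs-sync"));
    out.writeBytes("a record\n");
    out.hflush();  // pushed to the DNs' buffers: visible to readers, but a
                   // power failure on a DN can still lose or truncate it
    out.hsync();   // asks the DNs to fsync to disk: survives a power failure
    out.close();
  }
}
{code}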

> 20 append: Blocks recovered on startup should be treated with lower priority during block synchronization
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1218
>                 URL: https://issues.apache.org/jira/browse/HDFS-1218
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.20-append
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.20-append
>
>         Attachments: hdfs-1281.txt
>
>
> When a datanode experiences power loss, it can come back up with truncated replicas (due to
> local FS journal replay). Those replicas should not be allowed to truncate the block during
> block synchronization if there are other replicas from DNs that have _not_ restarted.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

