hadoop-hdfs-issues mailing list archives

From "VinayaKumar B (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3161) 20 Append: Excluded DN replica from recovery should be removed from DN.
Date Thu, 05 Apr 2012 09:11:29 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247102#comment-13247102 ]

VinayaKumar B commented on HDFS-3161:

Hi Uma,
One more scenario where the same block can be present at a DN with different genstamps.

Scenario 2: Both blocks in current
1) File is written to DN1 -> DN2 -> DN3, and the file is closed with generation stamp 1.
2) Now DN3's network goes down.
3) Append is called on the same file, and append recovery succeeds on DN1 and DN2 with genstamp 2.
4) Some more data is written and the append stream is closed with genstamp 2.
5) Now DN3's network comes back.
6) Before DN3 sends its block report to the NN, the NN asks DN3 to replicate the block with genstamp 2.
7) After replication, DN3 will have the same block with 2 genstamps.
8) On the next block report, the NN will invalidate the block with genstamp 1.

Here, if both blocks are in the same subdir, then invalidation of the old block will delete the actual block file, resulting in a scan failure.
On the next block report, the NN will remove this datanode from the blocksMap and replication will happen again. But if another DN is chosen for that replication, the old block's entries will still be present in DN memory.
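The collision described above can be sketched with a minimal model of the on-disk naming (illustrative only, not Hadoop code: in HDFS 1.x a replica is stored as a data file `blk_<id>` plus a meta file `blk_<id>_<genstamp>.meta`; the block ID below is made up):

```python
def replica_files(block_id, genstamp):
    """On-disk file names for one replica of a block at a given genstamp."""
    return {f"blk_{block_id}", f"blk_{block_id}_{genstamp}.meta"}

old = replica_files(1234, 1)   # stale replica left on DN3 from before the append
new = replica_files(1234, 2)   # replica re-created on DN3 by replication

# The data file name carries no genstamp, so when both replicas land in the
# same subdir they collide on the data file:
shared = old & new
print(shared)  # {'blk_1234'}

# When the NN invalidates the genstamp-1 replica, its data file is removed --
# but that is the very file the genstamp-2 replica needs. Only the new .meta
# file survives, and the block scanner fails on the missing data file.
subdir = old | new
remaining = subdir - old
print(sorted(remaining))  # ['blk_1234_2.meta']
```

The model shows why placement in different subdirs hides the bug: the data files only collide when both replicas share a directory.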
> 20 Append: Excluded DN replica from recovery should be removed from DN.
> -----------------------------------------------------------------------
>                 Key: HDFS-3161
>                 URL: https://issues.apache.org/jira/browse/HDFS-3161
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: suja s
>            Priority: Critical
>             Fix For: 1.0.3
> 1) DN1->DN2->DN3 are in pipeline.
> 2) Client killed abruptly
> 3) One DN has restarted, say DN3.
> 4) In DN3, info.wasRecoveredOnStartup() will be true.
> 5) NN recovery is triggered; DN3 is skipped from recovery due to the above check.
> 6) Now DN1 and DN2 have blocks with generation stamp 2, and DN3 has the older generation stamp, say 1, and DN3 still has this block entry in ongoingCreates.
> 7) As part of recovery the file is closed and has only two live replicas (from DN1 and DN2).
> 8) So the NN issued the command for replication. Now DN3 also has the replica with the newer generation stamp.
> 9) Now DN3 contains 2 replicas on disk, and one entry in ongoingCreates referring to the blocksBeingWritten directory.
> When we call append/leaseRecovery, it may again skip this node for that recovery, as the blockId entry is still present in ongoingCreates with startup recovery true.
> It may keep repeating this dance for every recovery.
> And this stale replica will not be cleaned until we restart the cluster. The actual replica will be transferred to this node only through the replication process.
> Also, those replicated blocks will unnecessarily get invalidated after the next recoveries....
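The perpetual skip described in the quoted steps can be modelled with a small sketch (names like `ongoing_creates` and `recovered_on_startup` mirror the report's wording; they are hypothetical, not real Hadoop APIs):

```python
def pick_recovery_targets(block_id, datanodes):
    """NN-side target selection: skip any DN whose replica entry for this
    block is still marked as recovered-on-startup."""
    targets = []
    for dn, state in datanodes.items():
        entry = state["ongoing_creates"].get(block_id)
        if entry and entry["recovered_on_startup"]:
            continue  # DN3 is excluded here on every recovery attempt
        targets.append(dn)
    return targets

datanodes = {
    "DN1": {"ongoing_creates": {}},
    "DN2": {"ongoing_creates": {}},
    # Stale entry from the startup recovery -- never cleaned up:
    "DN3": {"ongoing_creates": {1234: {"genstamp": 1,
                                       "recovered_on_startup": True}}},
}

# Because nothing ever clears DN3's stale entry, repeated recoveries keep
# excluding it -- the "dance" continues until the cluster is restarted:
for attempt in range(3):
    print(pick_recovery_targets(1234, datanodes))  # ['DN1', 'DN2'] every time
```

The fix the issue title asks for is exactly the missing step: remove the excluded replica entry (and the stale on-disk replica) from the DN so the next recovery can consider it again.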

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

