hadoop-common-dev mailing list archives

From "eric baldeschwieler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2012) Periodic verification at the Datanode
Date Thu, 25 Oct 2007 06:51:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537517 ]

eric baldeschwieler commented on HADOOP-2012:

A couple of comments:

1) The idea of not keeping scan times strikes me as "really bad".  Randomizing the scan
order also weirds me out.  I do think Raghu's point about modeling the meta files may be valid.
 Why not simply always scan blocks in numeric order starting from zero and log actions?  On
restart we can tail this log to find the last block validated and start from there.  Inserted
blocks are recent by definition, so it is OK if we don't get around to them until the next scan.
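A minimal sketch of this scan-order-plus-log idea in Java (class and method names are hypothetical, not actual Hadoop APIs; the log format here is simply one block id per line):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.SortedSet;

// Sketch: scan blocks in ascending numeric order, append each verified
// block id to a log, and on restart resume from the last logged id.
public class ScanCursor {
    private final Path logFile;

    public ScanCursor(Path logFile) {
        this.logFile = logFile;
    }

    // Read the last line of the scan log to find where to resume;
    // returns -1 when no block has been verified yet.
    public long lastVerifiedBlock() throws IOException {
        if (!Files.exists(logFile)) return -1;
        List<String> lines = Files.readAllLines(logFile);
        if (lines.isEmpty()) return -1;
        return Long.parseLong(lines.get(lines.size() - 1).trim());
    }

    // Record a verified block id; append-only, one id per line.
    public void record(long blockId) throws IOException {
        Files.write(logFile, (blockId + "\n").getBytes(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Verify all blocks past the resume point, in numeric order.
    // Blocks inserted below the cursor are picked up on the next pass.
    public void scan(SortedSet<Long> blockIds) throws IOException {
        long resume = lastVerifiedBlock();
        for (long id : blockIds.tailSet(resume + 1)) {
            // verifyBlock(id) would go here; omitted in this sketch.
            record(id);
        }
    }
}
```

Tailing the log on restart means the scan position survives a datanode restart without any extra persistent state.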

2) It seems to me it might be better to try to repair the block if possible, rather than just
delete it.  This avoids bad corner cases, though it adds complexity.  Thoughts?  A simple
variant is just to copy a new version locally.
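The repair-before-delete variant could look roughly like the following sketch (the ReplicaFetcher interface and the in-memory block map are illustrative assumptions, not real Hadoop interfaces):

```java
import java.util.Map;

// Sketch: on a checksum mismatch, try to fetch a fresh copy of the block
// from another replica before falling back to deleting the corrupt copy.
public class CorruptBlockHandler {
    // Hypothetical stand-in for "ask a peer datanode for a good replica".
    public interface ReplicaFetcher {
        // Returns the block bytes from a healthy replica, or null if none.
        byte[] fetchFromPeer(long blockId);
    }

    public enum Outcome { REPAIRED, DELETED }

    private final ReplicaFetcher fetcher;
    private final Map<Long, byte[]> localBlocks;  // toy local block store

    public CorruptBlockHandler(ReplicaFetcher fetcher,
                               Map<Long, byte[]> localBlocks) {
        this.fetcher = fetcher;
        this.localBlocks = localBlocks;
    }

    public Outcome handleCorrupt(long blockId) {
        byte[] fresh = fetcher.fetchFromPeer(blockId);
        if (fresh != null) {
            localBlocks.put(blockId, fresh);   // overwrite the bad copy
            return Outcome.REPAIRED;
        }
        localBlocks.remove(blockId);           // no good replica: delete
        return Outcome.DELETED;
    }
}
```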

3) Throttling might simply entail spacing and scheduling when you scan the next block, so that
the pass completes within roughly two weeks.  This implies we'd want to persist when the current
scan started.  If we do that, the penalty of scanning quickly might be fairly ignorable,
considering the other variations in workload a DN is exposed to anyway.  You'd want some rule
like "always wait 10x the time it took to validate a block before the next one" to avoid weird
corner cases where the node gets a huge number of blocks added near the end of the time period.
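The pacing rule above can be sketched as a small pure function (names and the exact policy are illustrative assumptions):

```java
// Sketch of the throttling rule: space verifications evenly over the
// time left in the target period (~two weeks), with a floor of 10x the
// last verification's cost, so a burst of newly added blocks near the
// end of the period can't saturate the disk.
public class ScanThrottle {
    public static long nextDelayMillis(long blocksRemaining,
                                       long periodLeftMillis,
                                       long lastVerifyMillis) {
        // Even spacing over what's left of the period.
        long evenSpacing = blocksRemaining > 0
                ? periodLeftMillis / blocksRemaining
                : 0;
        // Floor: 10x the cost of the last block verified.
        long floor = 10 * lastVerifyMillis;
        return Math.max(evenSpacing, floor);
    }
}
```

With 1000 blocks spread over a full two-week period the even spacing dominates; when many blocks arrive late, the 10x floor takes over and the pass simply runs past the nominal deadline instead of hammering the disk.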

> Periodic verification at the Datanode
> -------------------------------------
>                 Key: HADOOP-2012
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2012
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.16.0
>         Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch
> Currently, on-disk corruption of data blocks is detected only when a block is read by the
> client or by another datanode.  These errors can be detected much earlier if the datanode
> periodically verifies the data checksums for its local blocks.
> Some of the issues to consider:
> - How often should we check the blocks (no more often than once every couple of weeks?)
> - How do we keep track of when a block was last verified (there is a .meta file associated
> with each block).
> - What action to take once corruption is detected.
> - Scanning should be done at very low priority, with the rest of the datanode disk traffic
> in mind.
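The checksum verification at the heart of the quoted issue can be sketched as follows, using java.util.zip.CRC32 per fixed-size chunk, loosely modeled on the .meta files; the chunk size and in-memory layout here are illustrative assumptions:

```java
import java.util.zip.CRC32;

// Sketch: verify a block's data against per-chunk CRC32 checksums,
// as a datanode's .meta file stores one checksum per data chunk.
public class BlockVerifier {
    static final int CHUNK_SIZE = 512;  // assumed chunk size

    // Compute one CRC32 per CHUNK_SIZE bytes of block data.
    public static long[] computeChecksums(byte[] data) {
        int chunks = (data.length + CHUNK_SIZE - 1) / CHUNK_SIZE;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            CRC32 crc = new CRC32();
            int off = i * CHUNK_SIZE;
            crc.update(data, off, Math.min(CHUNK_SIZE, data.length - off));
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // True iff every chunk of the block matches its stored checksum.
    public static boolean verify(byte[] data, long[] stored) {
        long[] actual = computeChecksums(data);
        if (actual.length != stored.length) return false;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != stored[i]) return false;
        }
        return true;
    }
}
```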

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
