hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2012) Periodic verification at the Datanode
Date Wed, 10 Oct 2007 17:32:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533809 ]

dhruba borthakur commented on HADOOP-2012:

Thinking more about this one, I agree with Rob that it is important to have an algorithm that
eventually verifies all blocks even in the face of frequent datanode restarts. In fact, if
we want the datanode to scale to a hundred thousand blocks, then this algorithm is essential.

Instead of storing the last modification time of each block, can we have some other algorithm
where each block's metadata need not be updated every time a block is verified? How about if
we start verifying blocks in increasing blockid order and record the current blockid that
was verified? Maybe we need to persist this information only once every 100 blocks or so.
If we reach the largest known blockid then we cycle back to the lowest blockid and start verifying
from there. For a datanode that has 100K blocks, it will take only about 1MB of memory to
keep a lazily-sorted list of blockids.
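A minimal sketch of the scheme described above, in Java: blockids are kept in a sorted set, a cursor records the last verified id, the cursor is persisted only once every 100 blocks, and the scan wraps back to the lowest id after the largest. All names here (CyclicBlockScanner, nextBlockToVerify, persistCursor) are illustrative, not actual Datanode code.

```java
import java.util.TreeSet;

public class CyclicBlockScanner {
    static final int PERSIST_INTERVAL = 100;  // persist cursor every 100 blocks

    private final TreeSet<Long> blockIds;     // lazily-sorted set of known blockids
    private long cursor;                      // last verified blockid (checkpointed)
    private int sincePersist = 0;

    public CyclicBlockScanner(TreeSet<Long> blockIds, long persistedCursor) {
        this.blockIds = blockIds;
        this.cursor = persistedCursor;
    }

    /** Returns the next blockid to verify, cycling past the largest known id. */
    public long nextBlockToVerify() {
        Long next = blockIds.higher(cursor);  // smallest id strictly above cursor
        if (next == null) {
            next = blockIds.first();          // reached the largest id; cycle back
        }
        cursor = next;
        if (++sincePersist >= PERSIST_INTERVAL) {
            persistCursor();                  // metadata write amortized over 100 blocks
            sincePersist = 0;
        }
        return next;
    }

    private void persistCursor() {
        // In a real datanode this would write `cursor` to a small on-disk file,
        // so that a restart resumes scanning near where it left off. At worst a
        // restart re-verifies the last (up to) 100 blocks.
    }
}
```

Since each entry is a single long, 100K blocks cost on the order of 1MB of memory, matching the estimate above.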

> Periodic verification at the Datanode
> -------------------------------------
>                 Key: HADOOP-2012
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2012
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
> Currently, on-disk corruption of data blocks is detected only when a block is read by
the client or by another datanode.  These errors could be detected much earlier if the datanode
periodically verified the data checksums of its local blocks.
> Some of the issues to consider :
> - How should we check the blocks ( no more often than once every couple of weeks ?)
> - How do we keep track of when a block was last verified ( there is a .meta file associated
with each block ).
> - What action to take once a corruption is detected
> - Scanning should be done at a very low priority, with the rest of the datanode disk traffic
in mind.
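The last point, verifying checksums at low priority, could be sketched as follows: checksum the block data in small chunks and pause between chunks so the scan yields disk bandwidth to regular traffic. The chunk size, pause interval, and class name are illustrative assumptions; a real implementation would compare against the per-chunk checksums in the block's .meta file rather than a single CRC.

```java
import java.util.zip.CRC32;

public class ThrottledVerifier {
    static final int CHUNK_SIZE = 64 * 1024;  // read 64KB at a time (assumed knob)
    static final long PAUSE_MS = 10;          // back off between chunks (assumed knob)

    /** Returns true if the block data matches the expected CRC32 checksum. */
    public static boolean verify(byte[] blockData, long expectedCrc)
            throws InterruptedException {
        CRC32 crc = new CRC32();
        for (int off = 0; off < blockData.length; off += CHUNK_SIZE) {
            int len = Math.min(CHUNK_SIZE, blockData.length - off);
            crc.update(blockData, off, len);
            Thread.sleep(PAUSE_MS);  // low priority: yield to other disk traffic
        }
        return crc.getValue() == expectedCrc;
    }
}
```

A mismatch here would feed into whatever corruption-handling action the issue settles on, e.g. reporting the block as corrupt to the namenode.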

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
