hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2012) Periodic verification at the Datanode
Date Mon, 29 Oct 2007 18:28:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538553

dhruba borthakur commented on HADOOP-2012:

The metadata about the entire Datanode is stored in the VERSION file. We could store
the last verified block id in this file instead of adding a new file.
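To make the suggestion concrete, here is a minimal sketch of persisting a last-verified block id in a Properties-style file. This assumes the VERSION file can be read and written as a `java.util.Properties` file; the key name `lastVerifiedBlockId` and the class are hypothetical, not existing Hadoop code.

```java
import java.io.*;
import java.util.Properties;

// Sketch only: "lastVerifiedBlockId" is an assumed new key in a
// Properties-style VERSION file, not an existing Hadoop field.
public class VersionFileSketch {
    public static long readLastVerified(File versionFile) throws IOException {
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream(versionFile)) {
            props.load(in);
        }
        // -1 means no block has been verified yet
        return Long.parseLong(props.getProperty("lastVerifiedBlockId", "-1"));
    }

    public static void writeLastVerified(File versionFile, long blockId) throws IOException {
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream(versionFile)) {
            props.load(in);  // preserve any existing keys
        }
        props.setProperty("lastVerifiedBlockId", Long.toString(blockId));
        try (FileOutputStream out = new FileOutputStream(versionFile)) {
            props.store(out, "datanode VERSION");
        }
    }
}
```

On restart the Datanode could resume the scan from the saved id instead of starting over.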

Do we really need a scan period? Your proposal that the Datanode spend a certain percentage
of the disk bandwidth on verifying blocks sounds effective by itself. If a Datanode has 100K
blocks of 128MB each and is configured to use 5MB/sec of disk bandwidth for verification,
it would take the Datanode about 30 days to verify each and every block it has in the system.
The next iteration could start immediately. If a Datanode has few blocks, each iteration would
finish quickly and the next iteration would start immediately. Is there a disadvantage in
starting iterations back-to-back? We could get away without another configuration parameter.
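The back-of-the-envelope arithmetic above can be checked in a few lines. The inputs (100K blocks, 128MB each, 5MB/sec) come from the comment; the helper itself is hypothetical:

```java
// Sketch of the scan-time arithmetic: total bytes / bandwidth, in days.
public class ScanTimeEstimate {
    static double scanDays(long numBlocks, long blockMb, long mbPerSec) {
        double seconds = (double) numBlocks * blockMb / mbPerSec;
        return seconds / 86400.0; // 86400 seconds per day
    }

    public static void main(String[] args) {
        // 100K blocks * 128 MB / 5 MB/s ~= 2.56M seconds ~= 29.6 days
        System.out.printf("%.1f days%n", scanDays(100_000, 128, 5));
    }
}
```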

> Periodic verification at the Datanode
> -------------------------------------
>                 Key: HADOOP-2012
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2012
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.16.0
>         Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch
> Currently, on-disk corruption of data blocks is detected only when a block is read by
the client or by another datanode.  These errors could be detected much earlier if the datanode
periodically verified the data checksums for its local blocks.
> Some of the issues to consider :
> - How often should we check the blocks (no more often than once every couple of weeks?)
> - How do we keep track of when a block was last verified (there is a .meta file associated
with each block).
> - What action to take once a corruption is detected
> - Scanning should be done at a very low priority, keeping the rest of the datanode disk traffic
in mind.
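The verification step itself amounts to re-reading a block in checksummed chunks and comparing each chunk's checksum against the stored values. The sketch below is illustrative only: the real .meta file format and the datanode's checksum implementation differ, and the 512-byte chunk size and flat `long[]` checksum layout are assumptions.

```java
import java.io.*;
import java.util.zip.CRC32;

// Illustrative sketch: verify a block file chunk by chunk against stored
// CRC32 values. The actual .meta layout in HDFS is different.
public class BlockVerifier {
    static final int CHUNK = 512; // bytes per checksummed chunk (assumed)

    static boolean verify(File block, long[] storedCrcs) throws IOException {
        byte[] buf = new byte[CHUNK];
        try (FileInputStream in = new FileInputStream(block)) {
            for (long expected : storedCrcs) {
                int n = in.read(buf);
                if (n <= 0) return false;       // block shorter than expected
                CRC32 crc = new CRC32();
                crc.update(buf, 0, n);
                if (crc.getValue() != expected) {
                    return false;               // corruption detected
                }
            }
        }
        return true;
    }
}
```

A scan thread would call something like this for each block, sleeping between chunks or blocks to stay within its bandwidth budget.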

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
