hadoop-common-dev mailing list archives

From "Sameer Paranjpye (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2012) Periodic verification at the Datanode
Date Wed, 31 Oct 2007 20:50:51 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539179 ]

Sameer Paranjpye commented on HADOOP-2012:
------------------------------------------

Why not have a scan period only?

The scan period defines a window in which every block that exists at the beginning of the
window will be examined (barring blocks that are deleted). A Datanode would construct a schedule
for examining blocks in a scan period, with the least recently examined blocks going first. New
blocks would be scheduled in the next window. The schedule could be constructed by dividing
a window into n intervals of length _scanperiod/n_, one interval per block, where n is the
number of blocks. A Datanode would determine how much bandwidth it needs to scan a block
based on when the next block is scheduled.
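
A rough sketch of what that scheduling could look like (all the names below are made up for
illustration, not actual Datanode code):

{code:java}
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: BlockInfo and these methods are hypothetical,
// not part of the existing Datanode.
class BlockScanSchedule {

  static class BlockInfo {
    final long blockId;
    final long sizeBytes;
    final long lastScanTimeMs; // 0 if never scanned

    BlockInfo(long blockId, long sizeBytes, long lastScanTimeMs) {
      this.blockId = blockId;
      this.sizeBytes = sizeBytes;
      this.lastScanTimeMs = lastScanTimeMs;
    }
  }

  /**
   * Sort blocks least-recently-scanned first and give each one an interval
   * of scanPeriodMs / n within the window starting at windowStartMs.
   * Returns the scheduled start time of each block, in scan order.
   */
  static long[] schedule(List<BlockInfo> blocks, long windowStartMs, long scanPeriodMs) {
    blocks.sort(Comparator.comparingLong(b -> b.lastScanTimeMs));
    long interval = scanPeriodMs / Math.max(1, blocks.size());
    long[] startTimes = new long[blocks.size()];
    for (int i = 0; i < blocks.size(); i++) {
      startTimes[i] = windowStartMs + i * interval;
    }
    return startTimes;
  }

  /**
   * Bandwidth (bytes/sec) needed to finish scanning this block before the
   * next block's scheduled start time.
   */
  static long requiredBandwidth(BlockInfo block, long nowMs, long nextBlockStartMs) {
    long msLeft = Math.max(1, nextBlockStartMs - nowMs);
    return block.sizeBytes * 1000L / msLeft;
  }
}
{code}

For a sense of scale: a 3-week period with 10,000 blocks gives each block an interval of
roughly 3 minutes, and scanning a 128MB block within such an interval needs well under 1 MB/s.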

This would guarantee that every block that exists at the beginning of a scan period is examined
once in that period. It would also guarantee an upper bound of 2*scan period between two
scans of a given block. The same bound applies to the time that elapses before a new block
is first scanned. In both cases, the elapsed time will, in the average case, be close to
the scan period, approaching 2*scan period only if a large number of blocks are added in a
window. These seem like reasonable guarantees.
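
To see where the 2x bound comes from (numbers purely illustrative): with a 3-week scan period,
a block scanned in the first minute of one window could end up scheduled last in the next
window, waiting just under 6 weeks between scans. Likewise, a block created just after a
window begins waits out the rest of that window plus, at worst, all of the next one.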

It would make sense to have a reasonable upper bound on the amount of bandwidth used for scanning,
and to emit a warning if that is not enough to examine all blocks within a scan period. That
way, if someone sets a scan period of 1 minute or something equally silly, the Datanode doesn't
spend all its time scanning.
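
A sketch of that cap-and-warn check (the bandwidth cap parameter here is an assumption for
illustration, not an existing Hadoop configuration property):

{code:java}
import java.util.logging.Logger;

// Hypothetical sketch; the "max scan bandwidth" value is an assumed
// configuration setting, not an existing Hadoop property.
class ScanBandwidthCheck {
  private static final Logger LOG = Logger.getLogger("ScanBandwidthCheck");

  /**
   * Warn if verifying totalBytes within scanPeriodMs would need more
   * bandwidth than the configured cap. The scanner would then run at the
   * cap and overrun the period instead of saturating the disks.
   */
  static void checkScanFeasible(long totalBytes, long scanPeriodMs, long maxBytesPerSec) {
    long requiredBytesPerSec = totalBytes * 1000L / Math.max(1, scanPeriodMs);
    if (requiredBytesPerSec > maxBytesPerSec) {
      LOG.warning("Scan period too short: verifying " + totalBytes
          + " bytes in " + scanPeriodMs + " ms needs " + requiredBytesPerSec
          + " bytes/sec, but the cap is " + maxBytesPerSec + " bytes/sec.");
    }
  }
}
{code}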




> Periodic verification at the Datanode
> -------------------------------------
>
>                 Key: HADOOP-2012
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2012
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch
>
>
> Currently, on-disk data corruption on data blocks is detected only when a block is read by
> the client or by another datanode. These errors would be detected much earlier if the datanode
> could periodically verify the data checksums for its local blocks.
> Some of the issues to consider:
> - How often should we check the blocks (no more often than once every couple of weeks?)
> - How do we keep track of when a block was last verified (there is a .meta file associated
> with each block).
> - What action to take once a corruption is detected.
> - Scanning should be done at a very low priority, with the rest of the datanode disk traffic
> in mind.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

