Message-ID: <33064432.1192131170802.JavaMail.jira@brutus>
In-Reply-To: <10538159.1191891590670.JavaMail.jira@brutus>
Date: Thu, 11 Oct 2007 12:32:50 -0700 (PDT)
From: "Sameer Paranjpye (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-2012) Periodic verification at the Datanode

    [ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534137 ]

Sameer Paranjpye commented on HADOOP-2012:
------------------------------------------

> Many disk errors occur during the write path. If we modify the .meta file that contains the .crc every few days, we might actually increase the error rate.

Would writing more to the .meta file increase the chance of the .meta file being corrupted? I'm not sure I see that.

> Also, writing to the meta file will be at least one random seek/write for each verification.

This matters only if verification is done very frequently. The scheme proposed here suggests slow background checking. Do we really need to check more than a few blocks an hour?

> Periodic verification at the Datanode
> -------------------------------------
>
>                 Key: HADOOP-2012
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2012
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>
> Currently, on-disk corruption of data blocks is detected only when a block is read by the client or by another datanode. These errors can be detected much earlier if the datanode periodically verifies the data checksums for its local blocks.
> Some of the issues to consider:
> - How often should we check the blocks (no more often than once every couple of weeks?)
> - How do we keep track of when a block was last verified (there is a .meta file associated with each block)?
> - What action should be taken once a corruption is detected?
> - Scanning should be done at very low priority, with the rest of the datanode disk traffic in mind.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
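
To make the throttling argument concrete, here is a minimal sketch of the kind of slow background scan discussed above. All names (BlockScannerSketch, verifyBlock), the scan interval, and the one-CRC32-value-per-chunk .meta layout are assumptions for illustration, not the actual datanode code or on-disk format:

import java.io.*;
import java.util.zip.CRC32;

/*
 * Hypothetical sketch of slow background block verification.
 * Assumed layout: each block file blk_N has a blk_N.meta file
 * holding one CRC32 value (stored as a long) per fixed-size chunk.
 */
public class BlockScannerSketch {
    static final int BYTES_PER_CHECKSUM = 512;   // chunk size covered by one CRC
    static final long SCAN_INTERVAL_MS = 60_000; // throttle: roughly one block per minute

    /* Recompute chunk checksums for a block and compare against the .meta file. */
    static boolean verifyBlock(File blockFile, File metaFile) throws IOException {
        try (DataInputStream block = new DataInputStream(
                 new BufferedInputStream(new FileInputStream(blockFile)));
             DataInputStream meta = new DataInputStream(
                 new BufferedInputStream(new FileInputStream(metaFile)))) {
            byte[] chunk = new byte[BYTES_PER_CHECKSUM];
            CRC32 crc = new CRC32();
            int n;
            while ((n = block.read(chunk)) > 0) {
                crc.reset();
                crc.update(chunk, 0, n);
                long expected = meta.readLong(); // stored checksum for this chunk
                if (crc.getValue() != expected) {
                    return false;                // corruption detected
                }
            }
            return true;
        }
    }

    /* Scan every block in a directory, sleeping between blocks so the
     * scan stays far below normal datanode disk traffic. */
    public static void main(String[] args) throws Exception {
        File dir = new File(args[0]);
        File[] blocks = dir.listFiles((d, name) ->
                name.startsWith("blk_") && !name.endsWith(".meta"));
        if (blocks == null) return;
        for (File blockFile : blocks) {
            File metaFile = new File(dir, blockFile.getName() + ".meta");
            if (!verifyBlock(blockFile, metaFile)) {
                System.err.println("Corrupt block: " + blockFile);
                // a real datanode would report this and trigger re-replication
            }
            Thread.sleep(SCAN_INTERVAL_MS);      // a few blocks per hour, not a tight loop
        }
    }
}

Note that the loop only reads block data sequentially; nothing is written back to the .meta file, which sidesteps the random seek/write cost raised in the quoted comment. Tracking the last-verified time per block would add exactly one small write per verification, which is the trade-off being debated.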