hadoop-common-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: To solve the checksum errors on the non-ecc mem machines.
Date Tue, 14 Aug 2007 16:30:32 GMT
Daeseong Kim wrote:
> To solve the checksum errors on non-ECC memory machines, I modified
> some code in DFSClient.java and DataNode.java.
> 
> The idea is very simple.
> The original CHUNK structure is
> {chunk size}{chunk data}{chunk size}{chunk data}...
> 
> The modified CHUNK structure is
> {chunk size}{chunk data}{chunk crc}{chunk size}{chunk data}{chunk crc}...
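
For illustration, a minimal sketch of that interleaved framing might
look like the following.  This is not the actual DFSClient/DataNode
patch; java.util.zip.CRC32 and 4-byte length/CRC fields are assumptions
here, standing in for Hadoop's configured checksum parameters.

   import java.io.DataInputStream;
   import java.io.DataOutputStream;
   import java.io.IOException;
   import java.util.zip.CRC32;

   // Illustrative sketch of {chunk size}{chunk data}{chunk crc} framing.
   public class InterleavedChunks {

     // Writes one {chunk size}{chunk data}{chunk crc} record.
     static void writeChunk(DataOutputStream out, byte[] data)
         throws IOException {
       CRC32 crc = new CRC32();
       crc.update(data, 0, data.length);
       out.writeInt(data.length);           // {chunk size}
       out.write(data);                     // {chunk data}
       out.writeInt((int) crc.getValue());  // {chunk crc}: low 32 CRC bits
     }

     // Reads one record back and verifies it, so corruption introduced
     // in transit or by faulty memory is caught as the chunk arrives.
     static byte[] readChunk(DataInputStream in) throws IOException {
       int size = in.readInt();
       byte[] data = new byte[size];
       in.readFully(data);
       int stored = in.readInt();
       CRC32 crc = new CRC32();
       crc.update(data, 0, data.length);
       if ((int) crc.getValue() != stored) {
         throw new IOException("chunk checksum mismatch");
       }
       return data;
     }
   }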

This is very similar to the approach taken in HADOOP-1134:

   https://issues.apache.org/jira/browse/HADOOP-1134

This will be included in the upcoming 0.14 release.  HDFS checksums are
no longer stored in parallel HDFS files, but alongside each block on
the datanode's local filesystem.  I do not know whether this will make
Hadoop usable on non-ECC hosts, but it might help.
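
As a rough illustration of block-level verification: a block's data can
be checked against its stored per-chunk CRCs before it is ever served.
The chunk size below stands in for io.bytes.per.checksum (512 bytes by
default); the exact layout of the per-block checksum metadata in 0.14
is not reproduced here.

   import java.io.DataInputStream;
   import java.io.FileInputStream;
   import java.io.IOException;
   import java.util.zip.CRC32;

   // Illustrative sketch: verify a block file, chunk by chunk, against
   // a stream of stored CRCs (one 4-byte CRC per chunk).
   public class BlockChecksumVerifier {

     static boolean verify(String blockFile, DataInputStream storedCrcs,
                           int bytesPerChecksum) throws IOException {
       FileInputStream in = new FileInputStream(blockFile);
       try {
         byte[] buf = new byte[bytesPerChecksum];
         int n;
         while ((n = in.read(buf)) > 0) {
           CRC32 crc = new CRC32();
           crc.update(buf, 0, n);
           if ((int) crc.getValue() != storedCrcs.readInt()) {
             return false;  // corruption found before the data is served
           }
         }
       } finally {
         in.close();
       }
       return true;
     }
   }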

It primarily improves things in the following ways:

1. Removing checksum files from HDFS frees a lot of memory in the
namenode, since it no longer has to track a separate checksum file for
every data file.

2. Data corruption can be detected before data is read.  This means
that, instead of a job failing because its input is corrupt, the tasks
of the prior job that generated the now-corrupt data can be failed and
retried, since corruption is detected at write time.  However, if a
task repeatedly fails due to corruption, the job will still fail, so
this may not remedy things entirely for non-ECC hosts (see the retry
sketch below).
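
To make point 2 concrete, here is the retry behaviour reduced to a
loop.  This is a sketch, not MapReduce's actual scheduling code: "Task"
is a hypothetical stand-in for re-running the task that generated the
data, and maxAttempts mirrors the spirit of mapred.map.max.attempts.

   import java.io.IOException;

   // Illustrative sketch of retry-on-corruption at write time.
   public class RetryOnCorruption {

     interface Task {
       // Throws IOException on a write-time checksum mismatch.
       void run() throws IOException;
     }

     static void runWithRetries(Task task, int maxAttempts)
         throws IOException {
       IOException last = null;
       for (int attempt = 1; attempt <= maxAttempts; attempt++) {
         try {
           task.run();
           return;  // clean write: later jobs never see corrupt input
         } catch (IOException e) {
           last = e;  // corrupt write detected; retry the task
         }
       }
       throw last;  // repeated corruption: the job still fails
     }
   }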

Doug
