hadoop-common-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Many Checksum Errors
Date Wed, 16 May 2007 18:25:42 GMT
[ Moving discussion to hadoop-dev.  -drc ]

Raghu Angadi wrote:
> This is good validation of how important ECC memory is.  Currently the 
> HDFS client deletes a block when it notices a checksum error.  After we 
> move to block-level CRCs soon, we should make the Datanode re-validate 
> the block before deciding to delete it.

It also emphasizes how important end-to-end checksums are.  Data should 
be checksummed as soon as possible after it is generated, before it has 
a chance to be corrupted.
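
To make the end-to-end idea concrete, here is a minimal sketch using plain 
java.util.zip.CRC32 (not Hadoop's actual checksum code): checksum the bytes 
the moment they are produced, carry the checksum alongside the data, and 
verify it again at the final consumer, so corruption anywhere in between is 
detected.

import java.util.zip.CRC32;

// Sketch only -- illustrative, not actual Hadoop code.
public class EndToEndChecksum {

  // Checksum the data right after it is generated.
  static long checksumAtSource(byte[] data) {
    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);
    return crc.getValue();
  }

  // Re-verify at the final consumer; a mismatch means the data was
  // corrupted somewhere in between (buffers, memory, network, disk).
  static void verifyAtDestination(byte[] data, long expected) {
    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);
    if (crc.getValue() != expected) {
      throw new RuntimeException("checksum mismatch: data corrupted");
    }
  }

  public static void main(String[] args) {
    byte[] record = "application data".getBytes();
    long crc = checksumAtSource(record);   // checksum as early as possible
    // ... record and crc pass through buffers, the network and disks ...
    verifyAtDestination(record, crc);      // verify as late as possible
    System.out.println("verified end-to-end");
  }
}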

Ideally, the initial buffer that stores the data should be small, and 
data should be checksummed as this initial buffer is flushed.  In the 
current implementation, however, the small checksum buffer is the second 
buffer; the initial buffer is the larger, io.buffer.size buffer.  To 
provide maximum protection against memory errors, this order should be 
reversed.

This is discussed in https://issues.apache.org/jira/browse/HADOOP-928. 
Perhaps a new issue should be filed to reverse the order of these 
buffers, so that data is checksummed before entering the larger, 
longer-lived buffer?

Doug
