hadoop-common-user mailing list archives

From Kai Voigt <...@123.org>
Subject Re: file checksum
Date Mon, 25 Jun 2012 11:33:47 GMT
HDFS has block checksums. Whenever a block is written to the datanodes, a checksum is calculated
and written with the block to the datanodes' disks.
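
As an illustration of how you could check this yourself after a copy (my own sketch, not part of the original reply): the FileSystem API exposes an end-to-end file checksum (an MD5-of-MD5s over the per-chunk CRCs), which is only comparable when both files were written with the same block size and checksum chunk size. The paths below are made up.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileChecksum;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ChecksumCompare {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);

      // Hypothetical source and destination paths; substitute your own.
      FileChecksum src = fs.getFileChecksum(new Path("/data/in/part-00000"));
      FileChecksum dst = fs.getFileChecksum(new Path("/backup/part-00000"));

      // Checksums only match if block size and checksum settings are identical.
      System.out.println("checksums match: " + src.equals(dst));
    }
  }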

Whenever a block is requested, the block's checksum is verified against the stored checksum.
If they don't match, that replica is corrupt. But since there are
additional replicas of the block, chances are high that one of them matches the checksum. Corrupt
blocks will be scheduled to be re-replicated.
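
From the client side that looks roughly like the sketch below (my addition, assuming the standard FileSystem read path): reads go through a checksum-verifying stream, a mismatch surfaces as a ChecksumException, and verification can be switched off per FileSystem instance if you really want the raw bytes. The path is hypothetical.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.ChecksumException;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IOUtils;

  public class ReadWithVerify {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      Path p = new Path("/data/in/part-00000");  // hypothetical path

      // Verification is on by default; setVerifyChecksum(false) would skip it.
      fs.setVerifyChecksum(true);

      try (FSDataInputStream in = fs.open(p)) {
        IOUtils.copyBytes(in, System.out, 4096, false);
      } catch (ChecksumException e) {
        // Normally the client reports the bad replica and retries another
        // DataNode first; this only surfaces if no good replica is found.
        System.err.println("corrupt replica detected: " + e.getMessage());
      }
    }
  }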

Also, to prevent bit rot, blocks are checked periodically in the background (weekly by default, I believe; you
can configure that period).
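
For reference (my addition; check the hdfs-default.xml that ships with your version, as the default interval differs between releases): the DataNode block scanner period is controlled by a property along the lines of

  dfs.datanode.scan.period.hours = 168    (set in hdfs-site.xml on each DataNode; 168 hours = weekly)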


On 25.06.2012, at 13:29, Rita wrote:

> Does Hadoop, HDFS in particular, do any sanity checks of the file before
> and after balancing/copying/reading the files? We have 20TB of data and I
> want to make sure after these operations are completed the data is still in
> good shape. Where can I read about this?
> tia
> -- 
> --- Get your facts first, then you can distort them as you please.--

Kai Voigt
