hadoop-hdfs-dev mailing list archives

From Brian Bockelman <bbock...@cse.unl.edu>
Subject Re: DataBlockScanner scan period
Date Wed, 24 Nov 2010 01:52:59 GMT

On Nov 23, 2010, at 7:41 PM, Thanh Do wrote:

> sorry for digging up this old thread.
> Brian, is this the reason you want to add a "data-level" scan
> to HDFS, as in HDFS-221.
> It seems to me that a very rarely read block could
> be silently corrupted, because the DataBlockScanner
> never finishes its scanning job within 3 weeks...

Why?  What if you restarted your datanode once every 2 weeks?  Last I checked, HDFS randomly
assigned blocks to be verified throughout the scan interval.  Because HDFS also rate-limits
the scanner, if you have too many blocks for the configured interval, you can easily end up
in a situation where some blocks never get verified.
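That situation is easy to sanity-check with back-of-envelope arithmetic: if the total bytes on a datanode exceed the throttle rate multiplied by the scan period, some blocks cannot be verified in time. The sketch below illustrates this with assumed, illustrative numbers (64 MB blocks, a 1 MB/s throttle, a 3-week period); the class and method names are hypothetical, not part of HDFS.

```java
// Back-of-envelope check: can a throttled scanner cover all blocks
// within the scan period?  All numbers here are illustrative
// assumptions, not HDFS defaults.
public class ScanCapacity {

    // True if totalBytes can be read within periodSeconds at the
    // given throttle rate (bytes per second).
    static boolean canFinish(long totalBytes, long bytesPerSecond, long periodSeconds) {
        return totalBytes <= bytesPerSecond * periodSeconds;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;   // 64 MB per block (assumed)
        long rate      = 1L * 1024 * 1024;    // 1 MB/s scan throttle (assumed)
        long period    = 3L * 7 * 24 * 3600;  // 3-week scan period in seconds

        // ~1.8 TB is scannable in 3 weeks at 1 MB/s, so 400,000 blocks
        // (~25.6 TB) cannot all be verified in one period.
        long manyBlocks = 400_000L * blockSize;
        System.out.println(canFinish(manyBlocks, rate, period));  // prints "false"

        // A modest 10,000 blocks (~640 GB) fits comfortably.
        long fewBlocks = 10_000L * blockSize;
        System.out.println(canFinish(fewBlocks, rate, period));   // prints "true"
    }
}
```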

The reason one wants a data-level scan is that an admin may want to manually verify that all copies
of a file are good (well, "good" compared to the checksum... maybe the user corrupted it before
uploading it :).  It'd be a great debugging tool to put site admins' minds at ease.
