hadoop-hdfs-dev mailing list archives

From Brian Bockelman <bbock...@cse.unl.edu>
Subject Tracking Replication errors
Date Thu, 10 Sep 2009 02:27:45 GMT
Hey everyone,

We're going through a review of our usage of HDFS (it's a good thing!  
- we're trying to get "official").  One reviewer asked a good question  
that I don't know the answer to - could you help?  To quote,

"What steps do you take to ensure the block rebalancing produces  
non-corrupted files?  Do you have to wait 2 weeks before you discover this?"

I believe the correct answer is:

When a block is replicated from one node to another, only the  
resulting block size is checked.  The checksums on the source and  
destination are not compared.  Therefore, if any corruption occurs  
during the transfer, it would take until the next scheduled block  
verification to detect it.
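
To make the distinction concrete, here is a minimal sketch (not HDFS code; the class and buffer contents are made up for illustration) of why a size-only check passes a silently corrupted copy that a checksum comparison would catch:

```java
import java.util.zip.CRC32;

public class ChecksumCheck {
    // Compute a CRC32 over a byte buffer, standing in for a block's checksum.
    static long crc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data);
        return c.getValue();
    }

    public static void main(String[] args) {
        byte[] source = "block-data-0123456789".getBytes();
        byte[] dest = source.clone();
        dest[5] ^= 0x01; // simulate a single bit flip during transfer

        // A size-only check passes despite the corruption...
        System.out.println("sizes equal:     " + (source.length == dest.length));
        // ...while comparing checksums detects the mismatch immediately.
        System.out.println("checksums equal: " + (crc(source) == crc(dest)));
    }
}
```

Running this prints "sizes equal: true" and "checksums equal: false", which is exactly the gap: the corrupted replica sits undetected until the periodic verification scan recomputes its checksum.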

If you look at TCP error rates and random memory corruptions, it  
wouldn't be surprising to see silent errors in copying between nodes,  
especially on multi-hundred-TB or PB scale installs.

Any comments?
