hadoop-hdfs-dev mailing list archives

From Dhruba Borthakur <dhr...@gmail.com>
Subject Re: Tracking Replication errors
Date Thu, 10 Sep 2009 03:25:14 GMT
When a block is being received by a datanode (either because of a
replication request or from a client write), the datanode verifies the CRC.
Also, there is a thread in the datanode that periodically verifies the CRCs
of existing blocks.
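[Editorial sketch: the per-chunk CRC verification dhruba describes can be illustrated as follows. This is not Hadoop code; the function names and the inline verification flow are hypothetical, though HDFS does checksum data in 512-byte chunks by default (`io.bytes.per.checksum`) using CRC32.]

```python
import zlib

CHUNK_SIZE = 512  # HDFS checksums data in 512-byte chunks by default

def checksum_chunks(block: bytes):
    """Compute a CRC32 for each 512-byte chunk of a block."""
    return [zlib.crc32(block[i:i + CHUNK_SIZE])
            for i in range(0, len(block), CHUNK_SIZE)]

def verify_on_receive(block: bytes, expected_crcs):
    """Recompute each chunk's CRC and compare against the sender's.

    A mismatch means the chunk was corrupted in transit (or the
    checksum itself was), and the receive is rejected.
    """
    for idx, crc in enumerate(checksum_chunks(block)):
        if crc != expected_crcs[idx]:
            raise IOError("checksum error in chunk %d" % idx)

# Sender computes checksums alongside the data; receiver re-verifies.
data = b"some block contents" * 100
crcs = checksum_chunks(data)
verify_on_receive(data, crcs)       # clean copy: no exception
corrupted = b"X" + data[1:]
try:
    verify_on_receive(corrupted, crcs)
except IOError as e:
    print(e)                        # chunk 0 mismatch detected
```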

dhruba


On Wed, Sep 9, 2009 at 7:27 PM, Brian Bockelman <bbockelm@cse.unl.edu> wrote:

> Hey everyone,
>
> We're going through a review of our usage of HDFS (it's a good thing! -
> we're trying to get "official").  One reviewer asked a good question that I
> don't know the answer to - could you help?  To quote,
>
> "What steps do you take to ensure the block rebalancing produces
> non-corrupted files?  Do you have to wait 2 weeks before you discover this?"
>
> I believe the correct answer is:
>
> """
> When a block is replicated from one node to another, only the resulting
> block size is checked.  The checksums on the source and destination are not
> compared.  Therefore, if there's any corruption that occurs, it would take
> until the next block verification to detect it.
> """
>
> If you look at TCP error rates and random memory corruptions, it wouldn't
> be surprising to see silent errors in copying between nodes, especially on
> multi-hundred-TB or PB scale installs.
>
> Any comments?
>
> Brian
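
[Editorial sketch: the gap Brian's quoted answer describes - a size-only comparison after a replica copy versus an actual checksum comparison - can be illustrated like this. Not Hadoop code; both helper functions are hypothetical.]

```python
import zlib

def size_check_passes(src: bytes, dst: bytes) -> bool:
    """The weaker check: source and destination replica lengths match."""
    return len(src) == len(dst)

def crc_check_passes(src: bytes, dst: bytes) -> bool:
    """The stronger check: source and destination checksums match."""
    return zlib.crc32(src) == zlib.crc32(dst)

original = b"\x00" * 1024
# A single bit flipped during the transfer leaves the length unchanged:
corrupted = b"\x01" + original[1:]

print(size_check_passes(original, corrupted))  # True  - corruption missed
print(crc_check_passes(original, corrupted))   # False - corruption caught
```

A size-only check passes on the corrupted replica, so silent corruption would go unnoticed until the next periodic block verification, which is exactly the window the reviewer asked about.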
