hadoop-mapreduce-user mailing list archives

From reena upadhyay <reena2...@outlook.com>
Subject How check sum are generated for blocks in data node
Date Fri, 28 Mar 2014 10:28:51 GMT
I was going through this link: http://stackoverflow.com/questions/9406477/data-integrity-in-hdfs-which-data-nodes-verifies-the-checksum
It says that in recent versions of Hadoop only the last data node verifies the checksum,
because the write happens in a pipeline fashion.
Now I have a question:
Assume my cluster has two data nodes, A and B, and I have a file. Half of the file's content
is written to data node A and the remaining half is written to data
node B to take advantage of parallelism. My question is: will data node A not store
the checksums for the blocks stored on it?

Going by the line "only the last data node verifies the checksum", it looks like only the last
data node, which in my case is data node B, will generate the checksum. But if only data
node B generates checksums, then it will generate them only for the blocks stored
on data node B. What about the checksums for the data blocks on data node A?
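
To make the scenario concrete, here is a minimal sketch of what I would expect, written in
plain Python rather than taken from the Hadoop code. It assumes one CRC32 is computed per
fixed-size chunk of a block; the 512-byte chunk size, the function names and the two-node
split are all assumptions for illustration only.

```python
import zlib

# Assumed chunk size for illustration; not read from any Hadoop configuration.
BYTES_PER_CHECKSUM = 512


def chunk_checksums(block, chunk_size=BYTES_PER_CHECKSUM):
    """Return one CRC32 value per chunk_size-byte chunk of the block."""
    return [
        zlib.crc32(block[i:i + chunk_size])
        for i in range(0, len(block), chunk_size)
    ]


def verify(block, checksums, chunk_size=BYTES_PER_CHECKSUM):
    """Recompute the chunk checksums and compare them with the stored ones."""
    return chunk_checksums(block, chunk_size) == checksums


# Hypothetical 2048-byte file, split in half across nodes A and B.
file_data = bytes(range(256)) * 8
half = len(file_data) // 2
block_on_a = file_data[:half]
block_on_b = file_data[half:]

# What I would expect: each node keeps checksums for the block it stores.
checksums_a = chunk_checksums(block_on_a)
checksums_b = chunk_checksums(block_on_b)

# Corrupt the first byte of A's block to see whether the check catches it.
corrupted_a = bytes([block_on_a[0] ^ 0xFF]) + block_on_a[1:]
```

In this sketch, verify(block_on_a, checksums_a) succeeds and verify(corrupted_a, checksums_a)
fails, which is only possible if checksums for A's block exist somewhere. That is the part
I am unsure about.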