hadoop-hdfs-user mailing list archives

From "Zhanwei Wang" <had...@wangzw.org>
Subject Re: Generation Stamp
Date Wed, 30 Nov 2011 11:04:14 GMT
Hi everyone,


Following the discussion, I would like to know what happens when a DataNode
reports an outdated block to the NameNode. According to Uma, the NameNode can
reject it. What will the DataNode do then? Will it ask another DataNode to
copy a new replica to it and delete the old one? Or will the NameNode arrange
that work if the number of replicas is below the specified value? Where can I
find this code?




Zhanwei Wang



From: hdfs-user-return-1831-hadoop=wangzw.org@hadoop.apache.org
[mailto:hdfs-user-return-1831-hadoop=wangzw.org@hadoop.apache.org] On Behalf Of
kartheek muthyala
Sent: November 30, 2011 12:07
To: hdfs-user@hadoop.apache.org
Subject: Re: Generation Stamp


Thanks Uma..:)

On Tue, Nov 29, 2011 at 10:48 PM, Uma Maheswara Rao G <maheswara@huawei.com>

Yes. :-)


From: kartheek muthyala [kartheek0274@gmail.com]
Sent: Tuesday, November 29, 2011 10:20 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: Generation Stamp

Uma, first of all, thanks for the detailed explanation and example.

So, to confirm: the primary use of this generation timestamp is to ensure
consistency of the block? When the pipeline fails at DN3 and the client
invokes recovery, the NN will choose DN1 to complete the pipeline. DN1 first
updates its meta file with the new timestamp and then passes this
information to the other replica at DN2. Later, the NN sees that this
particular block is under-replicated, assigns some other DataNode, DNa, and
asks either DN1 or DN2 to replicate the block to DNa.


On Tue, Nov 29, 2011 at 8:10 PM, Uma Maheswara Rao G <maheswara@huawei.com>

The generation stamp is basically used to keep track of replica states.

Consider one scenario where the generation stamp is used:

Create a file that has one block. The client starts writing that block to
DN1, DN2, DN3 (the pipeline).

After some data is written, DN3 fails, and the client gets an exception about
the pipeline failure. The client handles that exception (you can see this in
processDataNodeError in the DataStreamer thread). It removes DN3 and calls
recovery for that block with a new generation timestamp. The NN then chooses
one primary DN and assigns it the block synchronization work. The primary DN
ensures that all the remaining replicas have the same length (if required, it
truncates them to a consistent length) and invokes
commitBlockSynchronization. Then the remaining data transfer resumes.
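
The recovery step described above can be sketched as a toy model. This is only an illustration, not HDFS code: the Replica class, the recover_block function, and the lengths and generation stamps used below are all hypothetical, and the real recovery path handles many more cases.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical record of one replica of a block on one DataNode."""
    datanode: str
    length: int
    gen_stamp: int


def recover_block(replicas, failed_datanode, new_gen_stamp):
    """Toy version of the recovery described above.

    Drops the failed DataNode, truncates the surviving replicas to a
    common (minimum) length, and tags them with the new generation stamp.
    """
    survivors = [r for r in replicas if r.datanode != failed_datanode]
    if not survivors:
        raise RuntimeError("no surviving replicas to recover from")
    common_length = min(r.length for r in survivors)
    return [Replica(r.datanode, common_length, new_gen_stamp) for r in survivors]


# Made-up state of the pipeline at the moment DN3 fails.
pipeline = [
    Replica("DN1", 4096, 1233),
    Replica("DN2", 3584, 1233),
    Replica("DN3", 2048, 1233),
]
recovered = recover_block(pipeline, failed_datanode="DN3", new_gen_stamp=1234)
for replica in recovered:
    print(replica)
```

In this sketch, DN1 and DN2 both end up with the shorter of their two lengths and the new generation stamp, while DN3 keeps whatever it had on disk under the old stamp.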


Now the block will have a new generation timestamp. You can observe this in
the metadata file for that block on the DN.


The block files will now look like blk_12345634444 and
blk_12345634444_1234.meta.

Here, 1234 is the generation timestamp.
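
As a small illustration of that naming scheme, the block ID and generation stamp can be pulled out of a metadata file name. The helper parse_meta_name below is hypothetical, and the pattern assumes the blk_<id>_<genstamp>.meta layout shown in this message (with an optional minus sign on the ID, which is an assumption).

```python
import re

# Assumed layout: blk_<block id>_<generation stamp>.meta
META_NAME = re.compile(r"^blk_(-?\d+)_(\d+)\.meta$")


def parse_meta_name(name):
    """Return (block_id, generation_stamp) from a metadata file name."""
    match = META_NAME.match(name)
    if match is None:
        raise ValueError(f"not a block metadata file name: {name!r}")
    return int(match.group(1)), int(match.group(2))


block_id, gen_stamp = parse_meta_name("blk_12345634444_1234.meta")
print(block_id, gen_stamp)
```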

Assume that, after the write resumes, DN2 fails. Recovery starts again and
the block gets a new generation timestamp again. Now only DN1 is in the
pipeline, and the block files are blk_12345634444 and
blk_12345634444_1235.meta. The client resumes writing the remaining data and
completes the last packet. With the last packet, the block should be
finalized. DN1 finalizes the block successfully and sends the block-received
command, and the block info is updated in the blocks map. Now assume DN2
comes back and sends that old block in its reports to the NN. The NN can see
that the generation timestamp of that block is lower than the generation
stamp of the block reported by DN1, so it can make a decision: it can reject
the block with the lower generation timestamp.


You can see this code in FSNamesystem#addStoredBlock. Of course, there are
many other conditions, such as length mismatch, etc.
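
The generation stamp comparison described above can be sketched roughly as follows. This is a simplified toy, not the logic in addStoredBlock: ToyBlocksMap and its methods are hypothetical names, it ignores the other conditions mentioned (length mismatch and so on), and rejecting reports for unknown blocks is an assumption made only to keep the example small.

```python
class ToyBlocksMap:
    """Hypothetical stand-in for the NameNode's blocks map."""

    def __init__(self):
        # block id -> latest generation stamp the NameNode knows about
        self._latest = {}

    def record_finalized(self, block_id, gen_stamp):
        """Remember the newest generation stamp seen for a finalized block."""
        current = self._latest.get(block_id)
        if current is None or gen_stamp > current:
            self._latest[block_id] = gen_stamp

    def accepts(self, block_id, reported_gen_stamp):
        """Return False for a replica whose generation stamp is lower."""
        latest = self._latest.get(block_id)
        if latest is None:
            return False  # assumption: unknown blocks are not accepted
        return reported_gen_stamp >= latest


blocks = ToyBlocksMap()
blocks.record_finalized(12345634444, 1235)  # DN1 finalized with stamp 1235
print(blocks.accepts(12345634444, 1235))    # DN1's replica
print(blocks.accepts(12345634444, 1234))    # DN2's old replica
```

In this sketch, the replica DN2 reports with stamp 1234 is the one that gets rejected, matching the scenario in the message.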


Hope this helps.







From: kartheek muthyala [kartheek0274@gmail.com]
Sent: Tuesday, November 29, 2011 7:44 PM
To: hdfs-user
Subject: Generation Stamp

Hi all,
Why is there a concept of a generation stamp that gets tagged to the
metadata of a block? How is it useful? I have seen that in the HDFS
current directory, the meta files are tagged with this generation stamp. Does
it keep track of versioning?


