hadoop-hdfs-user mailing list archives

From kartheek muthyala <kartheek0...@gmail.com>
Subject Re: Generation Stamp
Date Wed, 30 Nov 2011 04:07:05 GMT
Thanks Uma..:)

On Tue, Nov 29, 2011 at 10:48 PM, Uma Maheswara Rao G
<maheswara@huawei.com>wrote:

>  Yes. :-)
>  ------------------------------
> *From:* kartheek muthyala [kartheek0274@gmail.com]
> *Sent:* Tuesday, November 29, 2011 10:20 PM
> *To:* hdfs-user@hadoop.apache.org
> *Subject:* Re: Generation Stamp
>
>  Uma, first of all thanks for the detailed exemplified explanation.
>
> So to confirm, the primary use of this generation stamp is to ensure
> consistency of the block? So, when the pipeline fails at DN3 and the client
> invokes recovery, the NN will choose DN1 to complete the pipeline. DN1
> first updates its meta file with the new generation stamp, and then passes
> this information to the other replica at DN2. Later, if the NN sees that
> this particular block is under-replicated, it assigns some other DN (say
> DNa) and asks either DN1 or DN2 to replicate the block to DNa.
>
>
> Thanks,
> Kartheek.
>
>
> On Tue, Nov 29, 2011 at 8:10 PM, Uma Maheswara Rao G <maheswara@huawei.com
> > wrote:
>
>>  The generation stamp is basically used to keep track of replica states.
>>
>>  Consider one scenario where the generation stamp is used:
>>
>>   Create a file which has one block. The client starts writing that block
>> to DN1, DN2, DN3 (the pipeline).
>>
>> After writing some data, DN3 fails, so the client gets an exception about
>> the pipeline failure. The client handles that exception (you can see it in
>> processDatanodeError in the DataStreamer thread). It removes DN3 and calls
>> recovery for that block with a new generation stamp; the NN then chooses
>> one primary DN and assigns it the block synchronization work. The primary
>> DN ensures that all the remaining replica lengths are the same (truncating
>> them to a consistent length if required) and invokes
>> commitBlockSynchronization. Then the remaining data transfer resumes.
>>
>>
>>
>>  Now the block will have a new generation stamp. You can observe this in
>> the metadata file for that block on the DN.
>>
>>
>>
>> Now the files will look like blk_12345634444 and blk_12345634444_1234.meta,
>>
>> where 1234 is the generation stamp.
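For illustration, a tiny standalone sketch (not HDFS code; the class and method names are invented) of pulling the generation stamp out of a meta file name of the form blk_&lt;blockId&gt;_&lt;genStamp&gt;.meta:

```java
// Hypothetical helper: parse the generation stamp from a DN meta file name.
public class GenStampParser {
    public static long genStampOf(String metaFileName) {
        // drop the ".meta" suffix, then take the digits after the last '_'
        String base = metaFileName.substring(
                0, metaFileName.length() - ".meta".length());
        int idx = base.lastIndexOf('_');
        return Long.parseLong(base.substring(idx + 1));
    }

    public static void main(String[] args) {
        System.out.println(genStampOf("blk_12345634444_1234.meta")); // prints 1234
    }
}
```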
>>
>> Now suppose that after the write resumes, DN2 fails as well. Recovery
>> starts again and a new generation stamp is assigned. Only DN1 remains in
>> the pipeline, and the files are blk_12345634444 and
>> blk_12345634444_1235.meta. The remaining data writes resume and the last
>> packet completes. With the last packet the block is finalized. DN1
>> finalizes the block successfully and sends a blockReceived report, so the
>> block info is updated in the blocks map. Now assume DN2 comes back and
>> reports its old replica to the NN. The NN can see that the generation
>> stamp of that replica is lower than the genstamp of the block DN1
>> reported, so it can make the decision to reject the replica with the lower
>> generation stamp.
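The genstamp comparison described above can be sketched as follows (a simplified illustration, not the real FSNamesystem logic, which checks many more conditions):

```java
// Hypothetical sketch of the staleness decision the NN makes when a DN
// reports a replica: a replica whose generation stamp is lower than the
// one the NameNode currently records for the block is stale and rejected.
public class ReplicaCheck {
    static boolean isStale(long storedGenStamp, long reportedGenStamp) {
        return reportedGenStamp < storedGenStamp;
    }

    public static void main(String[] args) {
        long nnGenStamp = 1235;   // genstamp after the second recovery
        long dn2Reported = 1234;  // DN2 comes back reporting its old replica
        System.out.println(isStale(nnGenStamp, dn2Reported)); // prints true
    }
}
```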
>>
>>
>>
>> You can see this code in FSNamesystem#addStoredBlock. Of course there are
>> many other conditions checked there, like length mismatch, etc.
>>
>>
>>
>> Hope this helps.
>>
>>
>>
>> Regards,
>>
>> Uma
>>
>>
>>
>>
>>  ------------------------------
>> *From:* kartheek muthyala [kartheek0274@gmail.com]
>> *Sent:* Tuesday, November 29, 2011 7:44 PM
>> *To:* hdfs-user
>> *Subject:* Generation Stamp
>>
>>   Hi all,
>> Why is there a Generation Stamp tagged to the metadata of a block? How is
>> it useful? I have seen that in the HDFS current directory, the meta files
>> are tagged with this generation stamp. Does it keep track of versioning?
>> ~Kartheek.
>>
>
>
