From: kartheek muthyala <kartheek0274@gmail.com>
Date: Wed, 30 Nov 2011 09:37:05 +0530
Subject: Re: Generation Stamp
To: hdfs-user@hadoop.apache.org

Thanks Uma..:)

On Tue, Nov 29, 2011 at 10:48 PM, Uma Maheswara Rao G wrote:
> Yes. :-)
> ------------------------------
> *From:* kartheek muthyala [kartheek0274@gmail.com]
> *Sent:* Tuesday, November 29, 2011 10:20 PM
> *To:* hdfs-user@hadoop.apache.org
> *Subject:* Re: Generation Stamp
>
> Uma, first of all, thanks for the detailed, exemplified explanation.
>
> So to confirm: the primary use of this generation stamp is to ensure
> consistency of the block? When the pipeline fails at DN3 and the client
> invokes recovery, the NN will choose DN1 to complete the pipeline. DN1
> first updates its meta file with the new stamp, and then passes this
> information on to the other replica at DN2. Later, when the NN sees that
> this block is under-replicated, it picks some other DNa and asks either
> DN1 or DN2 to replicate the block to DNa.
>
> Thanks,
> Kartheek.
>
> On Tue, Nov 29, 2011 at 8:10 PM, Uma Maheswara Rao G wrote:
>
>> The generation stamp is basically there to keep track of replica states.
>>
>> Consider one scenario where the generation stamp is used:
>>
>> Create a file that has one block. The client starts writing that block
>> to DN1, DN2, DN3 (the pipeline).
>>
>> After writing some data, DN3 fails, so the client gets an exception
>> about the pipeline failure. The client handles that exception (you can
>> see it in processDatanodeError in the DataStreamer thread): it removes
>> DN3 and calls recovery for that block with a new generation stamp. The
>> NN then chooses one primary DN and assigns it the block synchronization
>> work. The primary DN ensures that all the remaining replicas have the
>> same length (truncating to a consistent length if required) and invokes
>> commitBlockSynchronization. Then the remaining data transfer resumes.
>>
>> The block now has a new generation stamp. You can observe this in the
>> metadata file for that block on the DN.
>>
>> The block files now look like blk_12345634444 and
>> blk_12345634444_1234.meta, where 1234 is the generation stamp.
>>
>> Now assume that, after the write resumes, DN2 also fails. Recovery
>> starts again and gets a new generation stamp again. Only DN1 is left in
>> the pipeline, and the files are blk_12345634444 and
>> blk_12345634444_1235.meta. The remaining data writes resume and the
>> last packet completes; with the last packet the block should be
>> finalized. DN1 finalizes the block successfully, sends the blocks
>> received report, and the block info is updated in the blocks map. Now
>> assume DN2 comes back and reports that old block to the NN. The NN can
>> see that the generation stamp of that block is lower than the genstamp
>> of the block DN1 reported, so it can take the decision now: it rejects
>> the block with the lower generation stamp.
>>
>> You can see this code in FSNamesystem#addStoredBlock. Of course there
>> are many more conditions there, like length mismatches, etc.
>>
>> Hope it will help you....
>>
>> Regards,
>> Uma
>>
>> ------------------------------
>> *From:* kartheek muthyala [kartheek0274@gmail.com]
>> *Sent:* Tuesday, November 29, 2011 7:44 PM
>> *To:* hdfs-user
>> *Subject:* Generation Stamp
>>
>> Hi all,
>> Why is there a concept of a Generation Stamp that gets tagged onto the
>> metadata of a block? How is it useful? I have seen that in the HDFS
>> current directory the meta files are tagged with this generation stamp.
>> Does this keep track of versioning?
>> ~Kartheek.
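The staleness check Uma describes (the NN rejecting DN2's old replica because its genstamp is lower) can be sketched as a toy in Java. This is not the actual HDFS code; the class and method names below are made up for illustration, and the filename parsing simply follows the blk_12345634444_1234.meta naming convention from the example above.

```java
// Hypothetical sketch of a NameNode-style staleness check based on
// generation stamps. Not real HDFS code; names are illustrative only.
public class GenStampCheck {

    /** Extract the generation stamp from a meta file name
     *  like "blk_12345634444_1235.meta". */
    static long genStampOf(String metaFileName) {
        String noExt = metaFileName.substring(
                0, metaFileName.length() - ".meta".length());
        int lastUnderscore = noExt.lastIndexOf('_');
        return Long.parseLong(noExt.substring(lastUnderscore + 1));
    }

    /** A reported replica is stale if its generation stamp is older
     *  than the one the NN currently has for the block. */
    static boolean isStale(long reportedGenStamp, long recordedGenStamp) {
        return reportedGenStamp < recordedGenStamp;
    }

    public static void main(String[] args) {
        // After the second recovery, the NN knows genstamp 1235 (from DN1).
        long recorded = genStampOf("blk_12345634444_1235.meta");
        // DN2 comes back and reports its old copy, stamped 1234.
        long reported = genStampOf("blk_12345634444_1234.meta");
        System.out.println(isStale(reported, recorded)); // true: old replica rejected
    }
}
```

In real HDFS the equivalent comparison happens (alongside length checks and other conditions) in FSNamesystem#addStoredBlock when a block report arrives.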
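The recovery sequence in the thread (DN3 fails, recovery bumps the genstamp to 1234; DN2 fails, recovery bumps it again to 1235) can also be modeled end to end. Again a simplified assumption-laden sketch, not HDFS internals: real recovery involves the NN choosing a primary DN, length reconciliation, and commitBlockSynchronization, all of which are collapsed here into a single step.

```java
// Toy model of pipeline recovery: each recovery drops the failed DN and
// stamps the block with the next generation stamp. Simplified; not the
// real HDFS recovery protocol.
import java.util.ArrayList;
import java.util.List;

public class PipelineRecoveryDemo {
    // Stamp the block carried before the first failure in the example.
    static long genStamp = 1233;
    static List<String> pipeline =
            new ArrayList<>(List.of("DN1", "DN2", "DN3"));

    /** On a pipeline failure: drop the dead node and get a new genstamp. */
    static long recover(String failedNode) {
        pipeline.remove(failedNode);
        return ++genStamp; // NN issues the next generation stamp
    }

    public static void main(String[] args) {
        recover("DN3");            // first failure  -> blk_..._1234.meta
        long gs = recover("DN2");  // second failure -> blk_..._1235.meta
        System.out.println(pipeline + " at genstamp " + gs);
    }
}
```

Running this leaves only DN1 in the pipeline at genstamp 1235, matching the point in the thread where DN2's later report at 1234 is recognizably stale.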