hadoop-common-user mailing list archives

From Uma Maheswara Rao G 72686 <mahesw...@huawei.com>
Subject Re: Packets->Block
Date Thu, 03 Nov 2011 11:01:29 GMT
Hello Kartheek,
Please see my replies inline.
----- Original Message -----
From: kartheek muthyala <kartheek0274@gmail.com>
Date: Thursday, November 3, 2011 4:02 pm
Subject: Re: Packets->Block
To: common-user@hadoop.apache.org

> Thanks Uma for the prompt reply.
> I have one more doubt, as i can see block class contains only metadata
> information like Timestamp, length. But the actual data is in the
> streams. What I cannot understand is that where is the data getting
> written from
> streams to blockfile. (which function is taking care of this?)
Yes, a Block contains metadata such as the block ID, generation stamp, and number of
bytes.
 Block is Writable, so it can be transferred over the network (for example, the DN sends block
reports).
 The actual data is on disk in a file named blk_<block id>.
 So the block ID directly identifies the block file name.
 When the block is created on the DN side, the volumes map maintains a ReplicaBeingWritten object
with this block ID.
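
To make the naming and bookkeeping above concrete, here is a minimal sketch. It is not the actual DN code: VolumeMapSketch, State, createReplica, finalizeReplica and stateOf are hypothetical names, and the map is a simplified stand-in for the volumes map. Only the blk_<block id> naming comes from the explanation above.

```java
import java.util.HashMap;
import java.util.Map;

public class VolumeMapSketch {
    /** Hypothetical replica states; the real DN tracks richer replica objects. */
    public enum State { BEING_WRITTEN, FINALIZED }

    // Simplified stand-in for the volumes map: block ID -> replica state.
    private final Map<Long, State> replicas = new HashMap<>();

    /** On-disk data file name for a block ID, i.e. blk_<block id>. */
    public static String blockFileName(long blockId) {
        return "blk_" + blockId;
    }

    /** Register a replica when the block is created on the DN side. */
    public void createReplica(long blockId) {
        replicas.put(blockId, State.BEING_WRITTEN);
    }

    /** Mark the replica finalized once the whole block has been written. */
    public void finalizeReplica(long blockId) {
        if (!replicas.containsKey(blockId)) {
            throw new IllegalArgumentException("unknown block " + blockId);
        }
        replicas.put(blockId, State.FINALIZED);
    }

    /** Current state of the replica, or null if the block is unknown. */
    public State stateOf(long blockId) {
        return replicas.get(blockId);
    }

    public static void main(String[] args) {
        VolumeMapSketch map = new VolumeMapSketch();
        map.createReplica(1234L);
        System.out.println(blockFileName(1234L) + " " + map.stateOf(1234L));
        map.finalizeReplica(1234L);
        System.out.println(blockFileName(1234L) + " " + map.stateOf(1234L));
    }
}
```

The point is only that the block ID is the single key: it gives both the file name and the map entry.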

You can see the code in the BlockReceiver constructor: once it gets the ReplicaInfo, it
calls createStreams on that ReplicaInfo, which creates the FileOutputStreams.
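
As a rough illustration of that flow (open a stream on the block file, append packets, close on the last packet), here is a simplified sketch. It is an assumption-laden model, not the real BlockReceiver: BlockReceiverSketch, Packet, lastPacketInBlock and the method names are hypothetical, and checksums, acks and the pipeline are left out.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class BlockReceiverSketch {
    /** Hypothetical packet: payload bytes plus a last-packet-in-block flag. */
    public static final class Packet {
        final byte[] data;
        final boolean lastPacketInBlock;

        public Packet(byte[] data, boolean lastPacketInBlock) {
            this.data = data;
            this.lastPacketInBlock = lastPacketInBlock;
        }
    }

    private final File blockFile;
    private final OutputStream out;
    private boolean finalized = false;

    /** Opens the output stream on blk_<block id>, loosely mirroring createStreams. */
    public BlockReceiverSketch(File dir, long blockId) {
        this.blockFile = new File(dir, "blk_" + blockId);
        try {
            this.out = new FileOutputStream(blockFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Appends one packet; the last packet closes the file and finalizes the block. */
    public void receivePacket(Packet p) {
        if (finalized) {
            throw new IllegalStateException("block already finalized");
        }
        try {
            out.write(p.data);
            if (p.lastPacketInBlock) {
                out.close();
                finalized = true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public boolean isFinalized() { return finalized; }

    public File getBlockFile() { return blockFile; }

    public static void main(String[] args) {
        File dir = new File(System.getProperty("java.io.tmpdir"));
        BlockReceiverSketch r = new BlockReceiverSketch(dir, 42L);
        r.receivePacket(new Packet(new byte[] {1, 2, 3}, false));
        r.receivePacket(new Packet(new byte[] {4, 5}, true));
        System.out.println(r.getBlockFile().getName() + " finalized=" + r.isFinalized());
    }
}
```

In this sketch the receiver does not need to know the block length up front; it just keeps writing until a packet carries the last-packet flag.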

Regards,
Uma
> ~Kartheek.
> 
> On Thu, Nov 3, 2011 at 12:55 PM, Uma Maheswara Rao G 72686 <
> maheswara@huawei.com> wrote:
> 
> > ----- Original Message -----
> > From: kartheek muthyala <kartheek0274@gmail.com>
> > Date: Thursday, November 3, 2011 11:23 am
> > Subject: Packets->Block
> > To: common-user@hadoop.apache.org
> >
> > > Hi all,
> > > I need some info related to the code section which handles the
> > > following operations.
> > >
> > > Basically DataXceiver.c on the client side transmits the block in
> > > packets and
> > Actually, DataXceiver runs only on the DN. Whenever you create a file, a
> > DataStreamer thread starts in DFSClient. As the application writes
> > bytes, they are enqueued as packets into dataQueue. The streamer thread
> > picks packets from dataQueue and writes them to the pipeline sockets.
> > It also writes the opcode to tell the DN which kind of operation this is.
> > > on the data node side we have DataXceiver.c and
> > > BlockReceiver.c files
> > > which take care of writing these packets in order to a block file
> > > until the
> > > last packet for the block is received. I want some info around
> > > this area
> > DataXceiverServer runs and listens for requests. For every request
> > it receives, it creates a DataXceiver thread and passes the info to it.
> > Based on the opcode, DataXceiver creates a BlockReceiver or BlockSender
> > object and hands control to it.
> > > where in BlockReceiver.c, i have seen a PacketResponder class and a
> > > BlockReceiver class where in two places you are finalizing the
> > > block (What
> > > i understood by finalizing is that when the last packet for the
> > > block is
> > > received, you are closing the block file). In PacketResponder
> > > class in two
> > > places you are using finalizeBlock() function, one in
> > > lastDataNodeRun() function and the other in run() method and in
> > > BlockReceiver.c you are using
> > > finalizeBlock() in receiveBlock() function. I understood from the
> > > comments that the finalizeBlock() call from run() method is done
> > > for the datanode
> > > with which client directly interacts and finalizeBlock() call from
> > > receiveBlock() function is done for all the datanodes where the
> > > block is
> > > sent for replication.
> > During replication, the block has already been received by a DN, so the
> > block length is known in advance. The receivePacket() invocations in the
> > while loop can therefore read the complete block. After reading, the DN
> > needs to finalize the block so that it is added to volumesMap.
> > > But i didn't understand why there is a
> > > finalizeBlock() call from lastDataNodeRun() function.
> > This call is for ongoing writes from a client/DN, where the DN does not
> > know the actual size until the client says that a packet is the last one
> > in the current block.
> > finalizeBlock is called if the packet is the last packet for that block.
> > finalizeBlock adds the replica into volumesMap. Also, if the packet is
> > the last one, the DN needs to close all the block files that were
> > opened for writes.
> > > Can someone explain me about this? I may be wrong at most of the
> > > places of
> > > my understanding of the workflow. Correct me if i am wrong.
> > >
> > > Thanks,
> > > Kartheek
> > >
> >
> > Regards,
> > Uma
> >
> 
