hadoop-common-user mailing list archives

From kartheek muthyala <kartheek0...@gmail.com>
Subject Re: Packets->Block
Date Thu, 03 Nov 2011 10:30:23 GMT
Thanks, Uma, for the prompt reply.
I have one more doubt. As far as I can see, the Block class contains only
metadata information such as the timestamp and length, while the actual data
is in the streams. What I cannot understand is where the data gets written
from the streams to the block file. (Which function takes care of this?)

On Thu, Nov 3, 2011 at 12:55 PM, Uma Maheswara Rao G 72686 <
maheswara@huawei.com> wrote:

> ----- Original Message -----
> From: kartheek muthyala <kartheek0274@gmail.com>
> Date: Thursday, November 3, 2011 11:23 am
> Subject: Packets->Block
> To: common-user@hadoop.apache.org
> > Hi all,
> > I need some info related to the code section which handles the
> > following operations.
> >
> > Basically, DataXceiver.java on the client side transmits the block in
> > packets and
> Actually, DataXceiver runs only in the DN. Whenever you create a file, a
> DataStreamer thread starts in the DFSClient. Whenever the application writes
> bytes, they are enqueued into the dataQueue. The streamer thread picks the
> packets from the dataQueue and writes them onto the pipeline sockets. It also
> writes the opcodes that tell the DN what kind of operation is requested.
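The client-side flow described above (application bytes enqueued into a dataQueue, a streamer thread draining them onto the pipeline socket after an opcode) is a producer/consumer pattern. A minimal Java sketch, assuming a hypothetical `Packet` layout (sequence number, last-packet flag, length-prefixed payload), a hypothetical `ToyStreamer` class and an assumed opcode value; it is not the DFSClient/DataStreamer API, and a `ByteArrayOutputStream` stands in for the pipeline socket:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical packet: a slice of user bytes plus a flag marking the block's last packet.
final class Packet {
    final long seqno;
    final byte[] data;
    final boolean lastPacketInBlock;

    Packet(long seqno, byte[] data, boolean lastPacketInBlock) {
        this.seqno = seqno;
        this.data = data;
        this.lastPacketInBlock = lastPacketInBlock;
    }
}

// Toy streamer thread: drains dataQueue onto the "pipeline". Not Hadoop's DataStreamer.
final class ToyStreamer implements Runnable {
    static final byte OP_WRITE_BLOCK = 80; // assumed opcode value, for illustration only

    private final BlockingQueue<Packet> dataQueue;
    private final DataOutputStream pipelineOut;

    ToyStreamer(BlockingQueue<Packet> dataQueue, DataOutputStream pipelineOut) {
        this.dataQueue = dataQueue;
        this.pipelineOut = pipelineOut;
    }

    @Override
    public void run() {
        try {
            pipelineOut.writeByte(OP_WRITE_BLOCK); // tell the DN which operation follows
            boolean last = false;
            while (!last) {
                Packet p = dataQueue.take(); // waits until the writer enqueues a packet
                pipelineOut.writeLong(p.seqno);
                pipelineOut.writeBoolean(p.lastPacketInBlock);
                pipelineOut.writeInt(p.data.length);
                pipelineOut.write(p.data);
                last = p.lastPacketInBlock;
            }
            pipelineOut.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt status and stop
        }
    }

    // Application side: cut the bytes into packets and enqueue them for the streamer.
    static byte[] streamBytes(byte[] userBytes, int packetSize) {
        if (packetSize <= 0) {
            throw new IllegalArgumentException("packetSize must be positive");
        }
        BlockingQueue<Packet> dataQueue = new LinkedBlockingQueue<>();
        ByteArrayOutputStream socket = new ByteArrayOutputStream(); // stands in for the socket
        Thread streamer = new Thread(new ToyStreamer(dataQueue, new DataOutputStream(socket)));
        streamer.start();
        try {
            long seqno = 0;
            int off = 0;
            do { // do/while: even an empty write sends one (empty) last packet
                int end = Math.min(off + packetSize, userBytes.length);
                boolean last = end == userBytes.length;
                dataQueue.put(new Packet(seqno++, Arrays.copyOfRange(userBytes, off, end), last));
                off = end;
            } while (off < userBytes.length);
            streamer.join();
        } catch (InterruptedException e) {
            streamer.interrupt();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while streaming", e);
        }
        return socket.toByteArray();
    }
}
```

Because `take()` blocks, the application thread and the streamer run independently: the writer can keep enqueuing packets while earlier ones are still being written out.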
> > on the DataNode side we have DataXceiver.java and BlockReceiver.java,
> > which take care of writing these packets, in order, to a block file until
> > the last packet for the block is received. I want some info around this
> > area.
> DataXceiverServer runs and listens for requests. For every request it
> receives, it creates a DataXceiver thread and passes the info to it. Based
> on the opcode, the DataXceiver creates a BlockReceiver or BlockSender object
> and gives control to it.
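The DataXceiverServer/DataXceiver split described above is a thread-per-connection server: one loop accepts sockets, and each accepted request gets its own thread, which reads the opcode first and hands control to a receiver or a sender. A minimal sketch; `ToyXceiverServer`, `ToyXceiver`, the `Handler` enum and the opcode values 80/81 are assumptions for illustration, not the HDFS classes or constants:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Which object the per-request thread hands control to (hypothetical).
enum Handler { BLOCK_RECEIVER, BLOCK_SENDER }

// Toy per-request worker, playing the role described for DataXceiver.
final class ToyXceiver implements Runnable {
    static final byte OP_WRITE_BLOCK = 80; // assumed opcode values, for illustration only
    static final byte OP_READ_BLOCK = 81;

    private final Socket socket;

    ToyXceiver(Socket socket) {
        this.socket = socket;
    }

    // The opcode arrives first on the connection and decides who handles the rest.
    static Handler dispatch(DataInputStream in) throws IOException {
        byte op = in.readByte();
        switch (op) {
            case OP_WRITE_BLOCK:
                return Handler.BLOCK_RECEIVER; // a client or upstream DN is writing a block
            case OP_READ_BLOCK:
                return Handler.BLOCK_SENDER;   // a client wants to read a block
            default:
                throw new IOException("unknown opcode " + op);
        }
    }

    @Override
    public void run() {
        try (Socket s = socket;
             DataInputStream in = new DataInputStream(s.getInputStream())) {
            Handler handler = dispatch(in);
            // Here a real DataNode would build the receiver/sender and give it control.
            System.out.println("request handled by " + handler);
        } catch (IOException e) {
            System.err.println("request failed: " + e.getMessage());
        }
    }
}

// Toy listener: accept connections and start one worker thread per request.
final class ToyXceiverServer implements Runnable {
    private final ServerSocket serverSocket;

    ToyXceiverServer(ServerSocket serverSocket) {
        this.serverSocket = serverSocket;
    }

    @Override
    public void run() {
        while (!serverSocket.isClosed()) {
            try {
                Socket s = serverSocket.accept();
                new Thread(new ToyXceiver(s)).start();
            } catch (IOException e) {
                return; // stop on any accept failure, including the one thrown after close()
            }
        }
    }
}
```

In use, something like `new Thread(new ToyXceiverServer(new ServerSocket(0))).start()` would listen on an ephemeral port, and each connecting client would get its own `ToyXceiver` thread.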
> > In BlockReceiver.java, I have seen a PacketResponder class and a
> > BlockReceiver class, and the block is finalized in both of them. (What
> > I understood by finalizing is that when the last packet for the block is
> > received, you close the block file.) In the PacketResponder class you
> > call finalizeBlock() in two places, once in the lastDataNodeRun()
> > function and once in the run() method, and in BlockReceiver.java you
> > call finalizeBlock() in the receiveBlock() function. I understood from
> > the comments that the finalizeBlock() call from the run() method is made
> > for the datanode with which the client directly interacts, and the
> > finalizeBlock() call from the receiveBlock() function is made for all the
> > datanodes where the block is sent for replication.
> As part of replication, the DN receives one whole block, and the block
> length is already known beforehand. So the receivePacket() invocation in
> the while loop can read the complete block by itself. After reading, the DN
> needs to finalize the block to add it into volumesMap.
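For the replication case described above, the receiver knows the block length up front, so a plain loop over receivePacket() can read the whole block and then finalize it. A minimal sketch, assuming a hypothetical length-prefixed packet format and modelling volumesMap as a plain map from block ID to the finalized file; the method names echo the ones in this thread, but the signatures and bodies are illustrative, not HDFS's BlockReceiver:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ToyReplicationReceiver {
    // Hypothetical stand-in for the DN's volumesMap: blockId -> finalized block file.
    static final Map<Long, Path> volumesMap = new ConcurrentHashMap<>();

    // Read one length-prefixed packet and append its payload to the block file.
    static int receivePacket(DataInputStream in, OutputStream blockOut) throws IOException {
        int len = in.readInt();
        if (len < 0) {
            throw new IOException("negative packet length " + len);
        }
        byte[] payload = new byte[len];
        in.readFully(payload); // EOFException if the sender stops early
        blockOut.write(payload);
        return len;
    }

    // Replication: blockLength is known before the first packet, so loop until it is reached.
    static Path receiveBlock(long blockId, long blockLength, DataInputStream in, Path dir)
            throws IOException {
        Path blockFile = dir.resolve("blk_" + blockId); // file name chosen for illustration
        long received = 0;
        try (OutputStream out = Files.newOutputStream(blockFile)) {
            while (received < blockLength) {
                received += receivePacket(in, out);
            }
        }
        if (received != blockLength) {
            throw new IOException("expected " + blockLength + " bytes, got " + received);
        }
        finalizeBlock(blockId, blockFile);
        return blockFile;
    }

    // Finalizing, in this sketch, means making the completed replica visible in volumesMap.
    static void finalizeBlock(long blockId, Path blockFile) {
        volumesMap.put(blockId, blockFile);
    }
}
```

If the sender stops early, `readInt()` or `readFully()` throws `EOFException` before `finalizeBlock` runs, so a partial replica never reaches the map.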
> > But I didn't understand why there is a finalizeBlock() call from the
> > lastDataNodeRun() function.
> This call is for current writes from a client/DN: the DN will not know the
> actual size until the client says that this is the last packet in the
> current block. finalizeBlock will be called if the packet is the lastPacket
> for that block, and finalizeBlock will add the replica into volumesMap.
> Also, if the packet is the last one, the DN needs to close all the block
> files that were opened for writes.
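In the pipelined-write case just described, the DN cannot stop on a byte count; it keeps reading until a packet carries the last-packet flag, and only then closes the file it opened and finalizes. A minimal sketch of that termination rule, assuming a hypothetical packet header of a boolean lastPacketInBlock flag followed by a length-prefixed payload, and modelling volumesMap as a map from block ID to final length; it is not the actual PacketResponder or lastDataNodeRun() logic:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ToyPipelineReceiver {
    // Hypothetical stand-in for volumesMap: blockId -> finalized length in bytes.
    static final Map<Long, Long> volumesMap = new ConcurrentHashMap<>();

    // The size is unknown up front, so only the client's last-packet flag ends the loop.
    static long receiveUntilLastPacket(long blockId, DataInputStream in, OutputStream blockOut)
            throws IOException {
        long bytes = 0;
        boolean lastPacketInBlock = false;
        while (!lastPacketInBlock) {
            lastPacketInBlock = in.readBoolean();
            int len = in.readInt();
            if (len < 0) {
                throw new IOException("negative packet length " + len);
            }
            byte[] payload = new byte[len];
            in.readFully(payload);
            blockOut.write(payload);
            bytes += len;
        }
        // Last packet seen: close the block file that was opened for this write...
        blockOut.close();
        // ...then finalize, which in this sketch just records the replica and its final size.
        volumesMap.put(blockId, bytes);
        return bytes;
    }
}
```

Because the loop ends only on the flag, a client that disconnects before sending its last packet makes one of the reads throw `EOFException`, and in this sketch the block is never finalized.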
> > Can someone explain this to me? I may be wrong in many places in my
> > understanding of the workflow. Correct me if I am wrong.
> >
> > Thanks,
> > Kartheek
> >
> Regards,
> Uma
