hadoop-common-user mailing list archives

From Uma Maheswara Rao G 72686 <mahesw...@huawei.com>
Subject Re: Packets->Block
Date Thu, 03 Nov 2011 07:25:24 GMT
----- Original Message -----
From: kartheek muthyala <kartheek0274@gmail.com>
Date: Thursday, November 3, 2011 11:23 am
Subject: Packets->Block
To: common-user@hadoop.apache.org

> Hi all,
> I need some info related to the code section which handles the following
> operations.
> Basically DataXceiver.c on the client side transmits the block in packets
> and
Actually, DataXceiver runs only in the DN. Whenever you create a file, a DataStreamer thread
starts in the DFSClient. As the application writes bytes, they are enqueued into the dataQueue
as packets. The streamer thread picks packets from the dataQueue and writes them to the
pipeline sockets. It also writes the opcode that tells the DN what kind of operation this is.
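A rough sketch of that hand-off, in case it helps. This is not the DFSClient code: SketchPacket, StreamerSketch and their methods are names made up for illustration, and a plain list stands in for the pipeline socket. It only shows the shape of the idea, where the writer enqueues packets and a separate streamer thread drains the queue until it sees the last packet of the block.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for a packet; not the real HDFS class.
final class SketchPacket {
    final long seqno;
    final byte[] data;
    final boolean lastPacketInBlock;

    SketchPacket(long seqno, byte[] data, boolean lastPacketInBlock) {
        this.seqno = seqno;
        this.data = data;
        this.lastPacketInBlock = lastPacketInBlock;
    }
}

// Hypothetical sketch of the writer/streamer hand-off through a queue.
final class StreamerSketch {
    private final BlockingQueue<SketchPacket> dataQueue = new LinkedBlockingQueue<>();
    // Stands in for the pipeline socket: records the order packets were "sent".
    private final List<Long> sentSeqnos = new ArrayList<>();

    // Called by the application (writer) thread.
    void enqueue(SketchPacket p) throws InterruptedException {
        dataQueue.put(p);
    }

    // Body of the streamer thread: drain packets until the last one of the block.
    void streamUntilLastPacket() throws InterruptedException {
        while (true) {
            SketchPacket p = dataQueue.take();
            sentSeqnos.add(p.seqno); // a real client would write to a socket here
            if (p.lastPacketInBlock) {
                return;
            }
        }
    }

    // Writes `packets` one-byte packets from the calling thread while a
    // separate streamer thread sends them; returns the seqnos in send order.
    static List<Long> demo(int packets) {
        if (packets < 1) {
            throw new IllegalArgumentException("need at least one packet");
        }
        StreamerSketch s = new StreamerSketch();
        Thread streamer = new Thread(() -> {
            try {
                s.streamUntilLastPacket();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        streamer.start();
        try {
            for (int i = 0; i < packets; i++) {
                s.enqueue(new SketchPacket(i, new byte[] {(byte) i}, i == packets - 1));
            }
            streamer.join(); // join makes the streamer's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return s.sentSeqnos;
    }
}
```

Calling StreamerSketch.demo(3) returns the sequence numbers in the order the streamer thread sent them, which matches the order the writer enqueued them because the queue is FIFO.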
> on the data node side we have DataXceiver.c and BlockReceiver.c files
> which take care of writing these packets in order to a block file until the
> last packet for the block is received. I want some info around this area
DataXceiverServer runs in the DN and listens for requests. For every request it receives, it
creates a DataXceiver thread and passes the request info to it. Based on the opcode, the
DataXceiver creates a BlockReceiver or a BlockSender object and hands control to it.
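Roughly, the dispatch looks like the sketch below. The names SketchOp, XceiverSketch and XceiverServerSketch are invented for illustration, the enum stands in for the real opcode bytes, and strings stand in for the BlockReceiver and BlockSender objects. It assumes a write request goes to BlockReceiver and a read request goes to BlockSender.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical opcodes; the real protocol uses byte constants.
enum SketchOp { WRITE_BLOCK, READ_BLOCK }

// Hypothetical per-request worker, one per incoming request.
final class XceiverSketch implements Runnable {
    private final SketchOp op;
    private final List<String> log;

    XceiverSketch(SketchOp op, List<String> log) {
        this.op = op;
        this.log = log;
    }

    // Picks the handler for an opcode; the returned name stands in for the object.
    static String handlerFor(SketchOp op) {
        switch (op) {
            case WRITE_BLOCK:
                return "BlockReceiver";
            case READ_BLOCK:
                return "BlockSender";
            default:
                throw new IllegalArgumentException("unknown op: " + op);
        }
    }

    @Override
    public void run() {
        synchronized (log) {
            log.add(handlerFor(op));
        }
    }
}

// Hypothetical server loop: a list of ops stands in for accepted connections.
final class XceiverServerSketch {
    static List<String> serve(List<SketchOp> requests) {
        List<String> log = new ArrayList<>();
        List<Thread> workers = new ArrayList<>();
        for (SketchOp op : requests) {
            Thread t = new Thread(new XceiverSketch(op, log)); // one thread per request
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return log; // entry order depends on thread scheduling
    }
}
```

The worker threads run concurrently, so the order of entries in the returned log is not guaranteed; only the mapping from opcode to handler is.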
> where in BlockReceiver.c , i have seen a PacketResponder class and a
> BlockReceiver class where in two places you are finalizing the 
> block (What
> i understood by finalizing is that when the last packet for the 
> block is
> received, you are closing the block file). In PacketResponder 
> class in two
> places you are using finalizeBlock() function, one in
> lastDataNodeRun() function and the other in run() method and in
> BlockReceiver.c you are using
> finalizeBlock() in receiveBlock() function. I understood from the
> comments that the finalizeBlock() call from run() method is done
> for the datanode
> with which client directly interacts and finalizeBlock() call from
> receiveBlock() function is done for all the datanodes where the 
> block is
> sent for replication.
In replication, the block has already been fully received by a DN, so the block length is known
in advance. The receivePacket() calls in the while loop can therefore read the complete block.
After reading, the DN needs to finalize the block to add it to the volumesMap.
> But i didn't understand why there is a
> finalizeBlock() call from lastDataNodeRun() function.
This call is for ongoing writes from a client/DN, where the receiving DN does not know the
actual block size until the client says that a packet is the last one in the current block.
finalizeBlock is called only if the packet is the last packet for that block, and it adds the
replica to the volumesMap. On the last packet, the DN also needs to close all the block files
that it opened for the write.
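To make the last-packet rule concrete, here is a small sketch. BlockReceiverSketch and its methods are hypothetical, and two plain maps stand in for the DN's bookkeeping; the real BlockReceiver and PacketResponder do much more (acks, checksums, file handles). It only shows that the replica becomes visible in the map when the packet flagged as last arrives, not before.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: finalize a block only when its last packet arrives.
final class BlockReceiverSketch {
    // Stands in for the volumesMap: blockId -> finalized length in bytes.
    private final Map<Long, Long> volumeMap = new HashMap<>();
    // Bytes received so far for blocks still being written.
    private final Map<Long, Long> inProgress = new HashMap<>();

    // Accounts for one packet; finalizes the block if it is the last one.
    void receivePacket(long blockId, int len, boolean lastPacketInBlock) {
        long total = inProgress.merge(blockId, (long) len, Long::sum);
        if (lastPacketInBlock) {
            finalizeBlock(blockId, total);
        }
    }

    // Moves the block from "being written" to "finalized".
    private void finalizeBlock(long blockId, long length) {
        inProgress.remove(blockId); // a real DN would close the block files here
        volumeMap.put(blockId, length);
    }

    boolean isFinalized(long blockId) {
        return volumeMap.containsKey(blockId);
    }

    long finalizedLength(long blockId) {
        Long len = volumeMap.get(blockId);
        if (len == null) {
            throw new IllegalStateException("block not finalized: " + blockId);
        }
        return len;
    }
}
```

For example, after two 512-byte packets with the flag unset, isFinalized(1L) is still false; once a third packet arrives with the flag set, the block is finalized with the total length of all three packets.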
> Can someone explain me about this? I may be wrong at most of the 
> places of
> my understanding of the workflow. Correct me if i am wrong.
> Thanks,
> Kartheek