hadoop-common-dev mailing list archives

From Ajit Ratnaparkhi <ajit.ratnapar...@gmail.com>
Subject Re: HDFS Client side buffering
Date Mon, 23 Feb 2009 14:55:13 GMT
Hi,

I have the same doubt.

From a scan of the code, it looks like whenever the client writes data, one
packet (64 KB in size) is buffered and then sent directly to the
corresponding datanodes. When the end of a block is reached and the first
packet of the next block is ready, the namenode is contacted to create a new
block entry and assign datanodes to it; the new packets are then sent to one
of these newly allocated datanodes.

So it seems that the client does not cache an entire block locally before
contacting the namenode, contrary to what the design doc states.

Can somebody please clarify this?




On Mon, Feb 23, 2009 at 11:05 AM, Sangmin Lee <sangmin.dev@gmail.com> wrote:

> Hi folks,
>
> I have a question regarding HDFS' client side buffering.
> From the documents
>
> http://hadoop.apache.org/core/docs/r0.19.0/hdfs_design.html#Staging
>
> It states that an HDFS client caches one block's worth of data before it
> contacts a namenode for a new block.
> Is this true?
> I can't find the part of the source code that performs this operation.


The source code for the behaviour described above can be found in
DFSClient.DFSOutputStream.
A short explanation is given below:

1. Whenever the user writes data by calling FSDataOutputStream.write(...),
DFSClient.DFSOutputStream.writeChunk(...) is called internally. It creates a
'Packet' in its buffer and enqueues it in the 'DataQueue' maintained by the
DFSOutputStream object. (Packet size is 64 KB.)

2. A continuously running thread, 'DataStreamer'
(DFSClient.DFSOutputStream.DataStreamer), is started when the
DFSOutputStream object is created.

3. The DataStreamer continuously watches the 'DataQueue'. As soon as it finds
a packet in the queue, it dequeues that packet and sends it on the stream
connected to the datanode. When the end of a block is reached, it contacts the
namenode (namenode.addBlock) and gets a new datanode address.
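
The three steps above can be sketched as a small producer/consumer program.
This is only an illustration of the queue-plus-streamer-thread pattern, not
the actual DFSClient code: the class StreamerSketch, the Packet and
FakeNameNode types, the EOF sentinel and the tiny PACKETS_PER_BLOCK value are
all hypothetical names made up for this example, and the real client does
considerably more (acknowledgements, pipelines, error recovery).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class StreamerSketch {
    static final int PACKET_SIZE = 64 * 1024;   // 64 KB, as described in the message
    static final int PACKETS_PER_BLOCK = 4;     // hypothetical, kept tiny for the demo

    /** One buffered unit of data; lastInBlock marks the end of a block. */
    static final class Packet {
        final byte[] data;
        final boolean lastInBlock;
        Packet(byte[] data, boolean lastInBlock) {
            this.data = data;
            this.lastInBlock = lastInBlock;
        }
    }

    /** Sentinel telling the streamer thread that the writer is done. */
    static final Packet EOF = new Packet(new byte[0], false);

    /** Hypothetical stand-in for the namenode; it only hands out fake targets. */
    static final class FakeNameNode {
        int blocksAllocated = 0;
        String addBlock() {
            blocksAllocated++;
            return "datanode-" + blocksAllocated;
        }
    }

    /** Writes packetCount packets and returns a log of what the streamer did. */
    static List<String> run(int packetCount) throws InterruptedException {
        BlockingQueue<Packet> dataQueue = new LinkedBlockingQueue<>();
        FakeNameNode namenode = new FakeNameNode();
        List<String> log = new ArrayList<>();

        // Steps 2 and 3: a background thread drains the queue packet by packet.
        Thread streamer = new Thread(() -> {
            String target = null;   // null means "no block currently open"
            try {
                while (true) {
                    Packet p = dataQueue.take();
                    if (p == EOF) {
                        break;
                    }
                    if (target == null) {
                        // A new block starts: ask the (fake) namenode for a target.
                        target = namenode.addBlock();
                        log.add("addBlock -> " + target);
                    }
                    log.add("send " + p.data.length + " bytes to " + target);
                    if (p.lastInBlock) {
                        target = null;   // end of block: next packet needs a new block
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        streamer.start();

        // Step 1: the writer side only builds packets and enqueues them.
        for (int i = 1; i <= packetCount; i++) {
            boolean last = (i % PACKETS_PER_BLOCK == 0);
            dataQueue.put(new Packet(new byte[PACKET_SIZE], last));
        }
        dataQueue.put(EOF);
        streamer.join();   // after join, the log is safe to read from this thread
        return log;
    }

    public static void main(String[] args) throws InterruptedException {
        run(10).forEach(System.out::println);
    }
}
```

With 10 packets and 4 packets per block, the sketch asks the fake namenode
for a block three times, and at no point does the writer hold a whole block
in memory: only individual packets sit in the queue.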



>
> Can anyone shed some light on this for me?
>
> I appreciate your help.
>
> -sangmin
>


thanks,
- ajit.
