hadoop-common-dev mailing list archives

From Dhruba Borthakur <dhr...@gmail.com>
Subject Re: HDFS Client side buffering
Date Mon, 23 Feb 2009 18:34:26 GMT
Hi Ajit,

Your assumption is absolutely right. The HDFS client buffers small packets
of data in memory and then sends them to the pipeline of datanode(s). Each
in-memory packet is typically 64 KB, and the client uses a sliding-window
protocol with a maximum window size of 80 packets. The client does *not*
cache the contents of an entire block on local disk.
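
A rough sketch of that buffering, for illustration only. It assumes the two
figures above (64 KB packets, window of 80) and models the window as a bounded
queue between a writer and a streamer thread. The class and method names are
hypothetical and are not the DFSClient code; acknowledgements from the
datanodes are not modelled, so this only shows the back-pressure on the writer.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration; not the actual DFSClient implementation.
class PacketWindowSketch {
    static final int PACKET_SIZE = 64 * 1024; // 64 KB per packet (figure from this mail)
    static final int MAX_WINDOW = 80;         // max packets in the window (figure from this mail)

    // Upper bound on packet payload held in client memory at any time.
    static long maxBufferedBytes() {
        return (long) PACKET_SIZE * MAX_WINDOW;
    }

    // Pushes totalBytes through a bounded queue of packets and returns how
    // many packets the streamer thread consumed.
    static int stream(long totalBytes) {
        final BlockingQueue<byte[]> window = new ArrayBlockingQueue<>(MAX_WINDOW);
        final byte[] end = new byte[0]; // sentinel marking end of stream
        final int[] sent = {0};

        Thread streamer = new Thread(() -> {
            try {
                while (true) {
                    byte[] packet = window.take();
                    if (packet == end) {
                        break;
                    }
                    sent[0]++; // a real client would send this to the datanode pipeline
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        streamer.start();

        try {
            long remaining = totalBytes;
            while (remaining > 0) {
                int n = (int) Math.min(PACKET_SIZE, remaining);
                window.put(new byte[n]); // blocks while MAX_WINDOW packets are queued
                remaining -= n;
            }
            window.put(end);
            streamer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sent[0];
    }

    public static void main(String[] args) {
        System.out.println(maxBufferedBytes()); // 80 * 64 KB = 5242880 bytes (5 MB)
        System.out.println(stream(1_000_000L)); // 16 packets (15 full, 1 partial)
    }
}
```

With these numbers the client holds at most about 5 MB of packet data in
memory, far less than a full block.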

I will update the design document... the description over there is stale.

thanks,
dhruba


On Mon, Feb 23, 2009 at 6:55 AM, Ajit Ratnaparkhi <
ajit.ratnaparkhi@gmail.com> wrote:

> Hi,
>
> I have the same doubt.
>
> From a scan of the code, it looks like whenever the client writes data, one
> packet (of size 64 KB) is buffered and sent directly to the corresponding
> datanodes. When the end of a block is reached and the first packet of a new
> block is ready, the namenode is contacted to create a new block entry and
> assign datanodes to it; the new packets are then sent to one of these newly
> allocated datanodes.
>
> So it seems that the client does not cache an entire block locally before
> contacting the namenode, contrary to what the design doc states.
>
> Can somebody please clarify this?
>
>
>
>
> On Mon, Feb 23, 2009 at 11:05 AM, Sangmin Lee <sangmin.dev@gmail.com>
> wrote:
>
> > Hi folks,
> >
> > I have a question regarding HDFS' client side buffering.
> > The document at
> >
> > http://hadoop.apache.org/core/docs/r0.19.0/hdfs_design.html#Staging
> >
> > states that an HDFS client caches one block's worth of data before it
> > contacts the namenode for a new block.
> > Is this true?
> > I can't find the part of the source code that does this.
>
>
> The source code for the behavior I described above is in
> DFSClient.DFSOutputStream.
> A short explanation is given below.
>
> 1. Whenever the user writes data by calling FSDataOutputStream.write(...),
> DFSClient.DFSOutputStream.writeChunk(...) is called internally. It creates
> a 'Packet' in its buffer and enqueues it in the 'DataQueue' maintained by
> the DFSOutputStream object. (Packet size is 64 KB.)
>
> 2. A continuously running thread, 'DataStreamer'
> (DFSClient.DFSOutputStream.DataStreamer), is started when the
> DFSOutputStream object is created.
>
> 3. The DataStreamer continuously watches the 'DataQueue'. As soon as it
> finds a packet in the queue, it dequeues that packet and sends it on the
> stream connected to the datanode. If the end of a block is reached, it
> contacts the namenode (namenode.addblock) and gets a new datanode address.
>
>
>
> >
> > Can anyone shed some light on this for me?
> >
> > I appreciate your help.
> >
> > -sangmin
> >
>
>
> thanks,
> - ajit.
>
