hadoop-hdfs-user mailing list archives

From elton sky <eltonsky9...@gmail.com>
Subject Re: HDFS: buffer before contacts Namenode?
Date Wed, 11 Aug 2010 05:48:48 GMT
Thanks, Hairong.

Then I wonder about the reason for this design change.
Buffering a block's worth of data on the client side before starting the
transfer would increase network throughput, but it also uses more memory. I
think a MapReduce job, in most cases, is more memory-bound (for example, in
the shuffle phase) than network-bound. Is this the reason?
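
To put rough numbers on the memory side, here is a small back-of-the-envelope sketch. The 64 MB block size, the 64 KB packet size, and the count of 10 concurrent writers are assumptions for illustration only, not values read from any cluster configuration, and the class name is made up.

```java
public class WriteBufferFootprint {
    // Assumed sizes for illustration; check the actual cluster configuration.
    static final long BLOCK_SIZE_BYTES = 64L * 1024 * 1024; // assumed 64 MB block
    static final long PACKET_SIZE_BYTES = 64L * 1024;       // assumed 64 KB packet

    /** Bytes held by one output stream that buffers `unitsBuffered` units of `unitSize` bytes. */
    static long perStreamBytes(long unitSize, int unitsBuffered) {
        return unitSize * unitsBuffered;
    }

    /** Total bytes held across `openStreams` concurrent output streams. */
    static long totalBytes(long perStream, int openStreams) {
        return perStream * openStreams;
    }

    public static void main(String[] args) {
        int openStreams = 10; // hypothetical number of concurrent writers in one JVM

        long blockBuffered = totalBytes(perStreamBytes(BLOCK_SIZE_BYTES, 1), openStreams);
        long packetBuffered = totalBytes(perStreamBytes(PACKET_SIZE_BYTES, 1), openStreams);

        System.out.println("buffer one block per stream:  " + blockBuffered / 1024 + " KB");
        System.out.println("buffer one packet per stream: " + packetBuffered / 1024 + " KB");
    }
}
```

Under these assumptions the two strategies differ by a factor of 1024 in client memory per open stream, which is the trade-off I am asking about.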

On Wed, Aug 11, 2010 at 2:55 AM, Hairong Kuang <kuang.hairong@gmail.com> wrote:

>  DataNode only buffers a packet before it contacts NameNode for allocating
> DataNodes to place the block. The doc you read might be too old.
> Hairong
> On 8/9/10 7:14 PM, "elton sky" <eltonsky9404@gmail.com> wrote:
> Hello folks,
> The HDFS design doc says that the client will buffer a block size worth
> of data before contacting the namenode for data node info. This is optimal
> for network throughput.
> However, I could not find this buffering procedure in the source code.
> In DFSClient.DataStreamer, it waits for dataqueue to be non-empty and then
> contacts the namenode and builds a pipeline. The number of packets in
> the dataqueue is always 1 when this happens!
> I am confused here. Can anyone correct me if I am wrong?
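
The behaviour described in the quoted message (the streamer wakes up as soon as one packet is queued, and only then asks for DataNodes) can be modelled roughly as below. This is a simplified sketch, not the DFSClient code: the class, the method names, the event strings, and the END sentinel are all hypothetical, and the NameNode call is replaced by a log entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class StreamerSketch {
    // Hypothetical sentinel object that marks the end of the stream.
    static final byte[] END = new byte[0];

    /** Drains the queue and returns a log of what the streamer did, in order. */
    static List<String> runStreamer(BlockingQueue<byte[]> dataQueue) throws InterruptedException {
        List<String> events = new ArrayList<>();
        boolean pipelineOpen = false;
        int packetsSeen = 0;
        while (true) {
            byte[] packet = dataQueue.take(); // blocks until at least one packet is queued
            if (packet == END) {
                break; // reference comparison against the sentinel
            }
            packetsSeen++;
            if (!pipelineOpen) {
                // Stand-in for asking the NameNode for target DataNodes
                // and building the write pipeline.
                events.add("allocate block after " + packetsSeen + " packet(s)");
                pipelineOpen = true;
            }
            events.add("send " + packet.length + " bytes");
        }
        return events;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<byte[]> dataQueue = new LinkedBlockingQueue<>();
        dataQueue.put(new byte[4]);
        dataQueue.put(new byte[8]);
        dataQueue.put(END);
        for (String event : runStreamer(dataQueue)) {
            System.out.println(event);
        }
    }
}
```

In this model the allocation step is triggered by the first packet taken from the queue, so the client never needs to hold a full block before contacting the namenode.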
