hadoop-hdfs-dev mailing list archives

From Brian Bockelman <bbock...@cse.unl.edu>
Subject Re: questions regarding fuse_dfs_read
Date Wed, 07 Sep 2011 12:15:47 GMT
Hi Aastha,

A read-ahead buffer is a common technique that spends extra bandwidth to get lower latency
for a number of common read patterns.  Your OS does something similar (with a much more
advanced technique).  By reading ahead, HDFS is betting that your reads follow a pattern.
I think the 10MB default is a touch excessive (it made more sense in previous releases).  I use 32KB.

The buffer is not used if you have very large reads, as it doesn't provide any benefit.
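
To make the two paths concrete, here is a minimal sketch in C of the general read-ahead idea.
It is not the fuse_dfs code: backing_file, backend_pread, ra_buffer and ra_read are hypothetical
names made up for illustration, and an in-memory array stands in for the remote file so the
number of backend round trips can be counted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a remote file: an in-memory byte array. */
typedef struct {
    const char *data;
    size_t size;
    int backend_calls;            /* counts simulated round trips */
} backing_file;

/* Hypothetical stand-in for a positional read (the role hdfsPread plays). */
static size_t backend_pread(backing_file *f, size_t off, char *dst, size_t len) {
    f->backend_calls++;
    if (off >= f->size) return 0;                 /* read at or past EOF */
    if (len > f->size - off) len = f->size - off; /* clamp to EOF */
    memcpy(dst, f->data + off, len);
    return len;
}

/* Illustrative read-ahead state kept per open file. */
typedef struct {
    char *buf;      /* read-ahead storage */
    size_t cap;     /* capacity; plays the role of rdbuffer_size */
    size_t start;   /* file offset of buf[0] */
    size_t valid;   /* number of valid bytes currently in buf */
} ra_buffer;

/* Read up to len bytes at offset off into dst; returns bytes copied. */
static size_t ra_read(backing_file *f, ra_buffer *ra, size_t off,
                      char *dst, size_t len) {
    assert(ra->buf != NULL && ra->cap > 0);

    /* Path 1: the request is larger than the buffer, so buffering cannot
     * help. Read straight into the caller's buffer. */
    if (len > ra->cap)
        return backend_pread(f, off, dst, len);

    /* Path 2: refill the buffer unless it already covers the whole
     * requested range. A refill fetches cap bytes, i.e. it reads ahead. */
    if (ra->valid == 0 || off < ra->start ||
        off + len > ra->start + ra->valid) {
        ra->valid = backend_pread(f, off, ra->buf, ra->cap);
        ra->start = off;
    }

    /* Serve the request from the buffer (short near EOF). */
    size_t pos = off - ra->start;
    size_t avail = ra->valid > pos ? ra->valid - pos : 0;
    size_t n = len < avail ? len : avail;
    memcpy(dst, ra->buf + pos, n);
    return n;
}
```

With a 16-byte buffer, two consecutive 4-byte reads at offsets 0 and 4 cost one backend call in
this sketch, because the first call already fetched 16 bytes. A read larger than 16 bytes skips
the buffer entirely and leaves its contents untouched.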


On Sep 7, 2011, at 12:45 AM, Aastha Mehta wrote:

> Hello,
> I am using FUSE-DFS with HDFS for a project. I have to modify the read and
> write functions of fuse_dfs. I have a few questions regarding the
> fuse_dfs_read code. There is an rdbuffer_size variable associated with the
> dfs_context, which is by default initialized to 10M. What is this
> rdbuffer_size and what is it used for?
> Secondly, in the fuse_dfs_read function, there are two places where
> hdfsPread() is called in a loop. First, there is a check for whether the
> requested read size is greater than the value of rdbuffer_size. Only if it
> is, is the hdfsPread executed. In this case, the data is read into the
> buffer passed from the caller.
> In the second case, hdfsPread is executed if a valid buffer is
> associated with the dfs file handle fh and the size and offset of read
> request lie within the range of the fh->buf. In this case, the data is read
> into fh->buf.
> Could someone explain what is happening here?
> Thanks,
> Aastha.
> -- 
> Aastha Mehta
> B.E. (Hons.) Computer Science
> BITS Pilani
> E-mail: aasthakm@gmail.com
