hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: why does not hdfs read ahead ?
Date Wed, 25 Nov 2009 10:35:25 GMT
Michael Thomas wrote:
> Hey guys,
> During the SC09 exercise, our data transfer tool was using the FUSE 
> interface to HDFS.  As Brian said, we were also reading 16 files in 
> parallel.  This seemed to be the optimal number, beyond which the 
> aggregate read rate did not improve.
> We have work scheduled to modify our data transfer tool to use the 
> native hadoop java APIs, as well as running some additional tests 
> offline to see if the HDFS-FUSE interface is the bottleneck as we suspect.
> Regards,
> --Mike

Was this all local data?

In Russ Perry's little paper "High Speed Raster Image Streaming For 
Digital Presses Using the Hadoop File System", he got 4Gb/s over the LAN 
by having the client app decide which datanode to pull each block from, 
rather than having the NN tell it which node to ask for which block.

"Measured stream rates approaching 4Gb/s were achieved which is close to 
the required rate for streaming pages containing rich designs to a 
digital press. This required only a minor extension to the Hadoop client 
to allow file blocks to be read in parallel from the Hadoop data nodes."
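
To make the "read blocks in parallel" idea concrete, here is a rough sketch of the general pattern only. It is not Russ Perry's extension and it does not touch the Hadoop client API: it splits a local file into block-sized byte ranges and fetches them concurrently with positional reads, then reassembles them in order. The names `block_ranges` and `read_parallel` are made up for illustration, the 64 MB default mirrors the usual HDFS block size, and the worker count of 16 is arbitrary. It assumes a POSIX system, since it relies on `os.pread`.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def block_ranges(file_size, block_size):
    """Return (offset, length) pairs covering the file in block-sized slices."""
    return [(off, min(block_size, file_size - off))
            for off in range(0, file_size, block_size)]


def read_parallel(path, block_size=64 * 1024 * 1024, workers=16):
    """Read a local file by fetching its block ranges concurrently.

    Stand-in for a client pulling HDFS blocks from several datanodes at
    once; here every "block" comes from the same local file descriptor.
    """
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        def read_block(rng):
            offset, length = rng
            chunks = []
            # pread may return fewer bytes than asked for, so loop
            # until the whole range has been read or EOF is hit.
            while length > 0:
                data = os.pread(fd, length, offset)
                if not data:
                    break
                chunks.append(data)
                offset += len(data)
                length -= len(data)
            return b"".join(chunks)

        with ThreadPoolExecutor(max_workers=workers) as pool:
            # map() yields results in submission order, so the blocks
            # are reassembled in file order even if they finish out of order.
            return b"".join(pool.map(read_block, block_ranges(size, block_size)))
    finally:
        os.close(fd)
```

In a real HDFS client the per-range read would go to whichever datanode holds that block, which is where the choice of node the paper describes would come in; the sketch leaves that part out entirely.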

