hadoop-hdfs-dev mailing list archives

From Colin McCabe <cmcc...@alumni.cmu.edu>
Subject Re: Why do reads take as long as replicated writes?
Date Mon, 10 Nov 2014 18:52:51 GMT
I strongly suggest benchmarking a modern version of Hadoop rather than
Hadoop 1.x.  The native CRC stuff from HDFS-3528 greatly reduces CPU
consumption on the read path.  I wrote about some other read path
optimizations in Hadoop 2.x here:
http://www.club.cc.cmu.edu/~cmccabe/d/2014.04_ApacheCon_HDFS_read_path_optimization_presentation.pdf

I agree with Andrew that TeraGen and TeraValidate are probably a
better choice for you.  Look for the bottleneck in your system.

best,
Colin

On Wed, Nov 5, 2014 at 4:10 PM, Eitan Rosenfeld <eitan27@gmail.com> wrote:
> Daemeon - Indeed, I neglected to mention that I am clearing the caches
> throughout my cluster before running the read benchmark. I expected
> results roughly proportional to disk I/O, given that replicated
> writes perform twice the disk I/O of reads. I've verified the I/O
> with iostat. However, as I mentioned earlier, read and write times
> converge as the number of files in the workload increases, despite
> the constant ratio of write I/O to read I/O.
>
> Andrew - I've verified that the network is not the bottleneck (all of
> the links are 10Gb). As you'll see, I suspect that the lack of data
> locality causes the slowdown because a given node can be responsible
> for serving multiple remote block reads all at once.
>
> I hope my understanding of writes and reads can be confirmed:
>
> Write pipelining allows a node to write, replicate, and receive replicated
> data in parallel. If node A is writing its own data while receiving
> replicated data from node B, node B does not wait for node A to finish
> writing B's replicated data to disk. Rather, node B can begin writing its
> next local block immediately.  Thus, pipelining helps replicated writes
> have good performance.
>
> In contrast, let's assume node A is currently reading a block. If node A
> receives an additional read request from node B, A will take longer to
> serve the block to B because of A's pre-existing read. Because node B
> waits longer for the block to be served from A, there is a delay on node B
> before it attempts to read the next block in the file. Multiple read
> requests from different nodes are a consequence of having no built-in
> data locality with TestDFSIO. Finally, as the number of concurrent tasks
> throughout the cluster increases, the wait time for reads increases.
>
> Is my understanding of these read and write mechanisms correct?
>
> Thank you,
> Eitan
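
The read-contention argument quoted above can be sketched with a toy
model. This is only an illustration under stated assumptions, not how
HDFS or TestDFSIO actually schedules reads: every reader picks its
serving node uniformly at random (no data locality), and a node's
service time is assumed to grow with the number of reads it serves at
once. The function names and parameters are hypothetical.

```python
import random
from collections import Counter


def hottest_node_load(num_nodes, readers_per_node, rng):
    """Simulate one round of reads with no data locality.

    Each of the num_nodes * readers_per_node readers picks a serving
    node uniformly at random (an assumption of this toy model).
    Returns the number of concurrent reads on the busiest node.
    """
    total_readers = num_nodes * readers_per_node
    load = Counter(rng.randrange(num_nodes) for _ in range(total_readers))
    return max(load.values())


def mean_hottest_node_load(num_nodes, readers_per_node, rounds=1000, seed=0):
    """Average the busiest node's load over many simulated rounds."""
    rng = random.Random(seed)
    total = sum(
        hottest_node_load(num_nodes, readers_per_node, rng)
        for _ in range(rounds)
    )
    return total / rounds


if __name__ == "__main__":
    # With perfect locality every node would serve exactly
    # readers_per_node reads; random placement makes the busiest
    # node serve at least that many, and usually more.
    for readers in (1, 2, 4, 8):
        print(readers, mean_hottest_node_load(10, readers))
```

In this model the busiest node never serves fewer reads than it would
with perfect locality, and its load rises as more concurrent readers
are added, which is consistent with the suspicion that remote reads
piling onto one node delay the readers waiting on it. It says nothing
about CPU cost on the read path, which has to be measured separately.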
