hadoop-common-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: IO pipeline optimizations
Date Tue, 19 Jul 2011 20:31:05 GMT
Hi Shrinivas,

There has been some work going on recently around optimizing checksums. See
HDFS-2080 for example. This will help both the write and read code, though
we've focused more on read.

There have also been a lot of improvements around random read access - for
example, HDFS-941, which improves random read performance by more than 2x.

I'm planning on writing a blog post in the next couple of weeks about some
of this work.


On Tue, Jul 19, 2011 at 1:26 PM, Shrinivas Joshi <jshrinivas@gmail.com> wrote:

> This blog post on YDN website
> http://developer.yahoo.com/blogs/hadoop/posts/2009/08/the_anatomy_of_hadoop_io_pipel/
> has detailed discussion on different steps involved in Hadoop IO operations
> and opportunities for optimizations. Could someone please comment on
> current state of these potential optimizations? Are some of these expected
> to be addressed in "next gen MR" release?
> Thanks,
> -Shrinivas

Todd Lipcon
Software Engineer, Cloudera
