hadoop-common-user mailing list archives

From 顾荣 <gurongwal...@gmail.com>
Subject Re: io.sort.mb based on HDFS block size
Date Thu, 14 Apr 2011 15:35:08 GMT
Hi Shrinivas,
I used to think about this as well. However, the data in the buffer is
spilled to disk each time the buffer reaches its threshold; the entire map
output is not copied out in one piece. Also, the data on disk may not be the
same size as it was in the buffer: if a combiner is running, the data can
shrink by some amount that we cannot know in advance.
In short, the final size of the data is uncertain, so using it to tie
io.sort.mb to the HDFS block size is kind of meaningless.
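
As a rough illustration of the point above, here is a minimal Python sketch
of how the spill count and the spilled size depend on more than the block
size. It is a simplified model, not Hadoop's actual accounting: the function
name estimate_spills and the combiner_ratio parameter are hypothetical, the
defaults assume io.sort.mb = 100 and io.sort.spill.percent = 0.80, and the
space the buffer reserves for record metadata is ignored.

```python
import math


def estimate_spills(map_output_mb, io_sort_mb=100, spill_percent=0.80,
                    combiner_ratio=1.0):
    """Roughly estimate spill count and spilled size for one map task.

    combiner_ratio is a hypothetical knob: the fraction of the map output
    that survives the combiner (1.0 means no combiner or no reduction).
    """
    # A spill is triggered once the buffer holds this much data.
    threshold_mb = io_sort_mb * spill_percent
    # Assumption: at least one final flush happens even for small outputs.
    spills = max(1, math.ceil(map_output_mb / threshold_mb))
    # The size written out depends on how much the combiner shrinks the data.
    spilled_mb = map_output_mb * combiner_ratio
    return spills, spilled_mb


if __name__ == "__main__":
    # Same io.sort.mb, different map output sizes and combiner effects.
    for output_mb, ratio in [(64, 1.0), (128, 1.0), (128, 0.5)]:
        spills, spilled = estimate_spills(output_mb, combiner_ratio=ratio)
        print(output_mb, ratio, spills, spilled)
```

Under this model the same 128 MB of map output can leave 128 MB or 64 MB
of spilled data depending on the combiner, which the block size alone
does not tell you.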

Good Luck
Walker Gu.

2011/4/13 Shrinivas Joshi <jshrinivas@gmail.com>

> Looking at workloads like TeraSort where intermediate map output is
> proportional to HDFS block size, I was wondering whether it would be
> beneficial to have a mechanism for setting buffer spaces like io.sort.mb to
> be a certain factor of HDFS block size? I am sure there are other config
> parameters that could benefit from such expression type values.
> Please let me know your thoughts on this.
> Thanks,
> -Shrinivas
