hadoop-common-user mailing list archives

From 顾荣 <gurongwal...@gmail.com>
Subject Re: io.sort.mb based on HDFS block size
Date Sat, 16 Apr 2011 16:17:11 GMT
Hi Shrinivas,
Sorry for the late reply.

Yes, I understand what you mean. I also didn't mean that io.sort.mb should
be equal to the block size. The point is that the data in the buffer is
spilled to HDFS several times, a little at a time. Before being written to
HDFS, the spilled data is combined if there is a combine function, so the
data size is really uncertain during the process. From HDFS's perspective,
the data just arrives group by group; it has no idea about io.sort.mb,
which is the buffer's total size.
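
To make the point concrete, here is a rough toy model of the spilling, not
the real MapTask logic. It assumes the buffer spills each time it fills to
io.sort.mb * io.sort.spill.percent, ignores the buffer's record accounting
space, and uses a made-up combine_ratio to stand in for whatever a combiner
might do. The function name and parameters are hypothetical.

```python
def simulate_spills(map_output_mb, io_sort_mb=100, spill_percent=0.80,
                    combine_ratio=1.0):
    """Return the sizes (in MB) of the spills one map task would write.

    Toy model only: a spill is triggered whenever the buffer holds
    io_sort_mb * spill_percent of data, and each spill is shrunk by
    combine_ratio (1.0 means no combiner, 0.5 means the combiner halves it).
    """
    threshold_mb = io_sort_mb * spill_percent
    spills = []
    remaining = float(map_output_mb)
    while remaining > 0:
        chunk = min(remaining, threshold_mb)   # data held when the spill fires
        spills.append(chunk * combine_ratio)   # size actually written out
        remaining -= chunk
    return spills


# Same buffer and same map output, different combiner behaviour:
no_combiner = simulate_spills(128, io_sort_mb=100, combine_ratio=1.0)
with_combiner = simulate_spills(128, io_sort_mb=100, combine_ratio=0.5)
print(len(no_combiner), sum(no_combiner))
print(len(with_combiner), sum(with_combiner))
```

Even in this simplified model the amount written per spill depends on the
combiner, not only on io.sort.mb, which is the uncertainty I mean.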

That's why I think using the HDFS block size to configure io.sort.mb is
kind of meaningless. However, this is a very interesting idea.
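
For what it's worth, the "factor of block size" idea could be sketched
outside Hadoop, for example in a script that prepares the job settings.
This is only an illustration under my own assumptions: the helper name and
the factor are hypothetical, and Hadoop itself does not evaluate such
expressions in its config values as far as this thread is concerned.

```python
import math


def derive_io_sort_mb(block_size_bytes, factor=1.25):
    """Return an io.sort.mb value (whole MB) as block size times a factor.

    Hypothetical helper: 'factor' is an assumed tuning knob, not a Hadoop
    setting. With a spill threshold of 0.80, a factor of 1 / 0.80 = 1.25
    makes the threshold equal to one block's worth of data.
    """
    block_mb = block_size_bytes / (1024 * 1024)
    return math.ceil(block_mb * factor)


# Example: derive the buffer size for two block sizes with the same factor.
for block_mb in (64, 128):
    print(block_mb, derive_io_sort_mb(block_mb * 1024 * 1024))
```

The value would still have to be re-checked per job, since the map output
size and any combiner are not captured by the block size alone.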

Regards
Walker Gu


2011/4/15 Shrinivas Joshi <jshrinivas@gmail.com>

> Hi Walker,
>
> Thanks for your feedback. I was actually thinking that io.sort.mb could be
> some factor of block size and not equal to block size. This will avoid
> re-tuning of sort buffer sizes and spill threshold values for different
> HDFS block sizes. Am I missing something?
>
> Thanks,
> -Shrinivas
>
> On Thu, Apr 14, 2011 at 10:35 AM, 顾荣 <gurongwalker@gmail.com> wrote:
>
> > Hi Shrinivas,
> > I also used to think about this. However, the data in the buffer is
> > spilled into HDFS when it reaches the threshold; the entire data is not
> > copied into HDFS at once. Also, the data in HDFS may not have the same
> > size as it has in the buffer, because if a combiner is at work the data
> > can be shrunk to a degree that we cannot be sure of.
> > In a word, the final size of the data is uncertain, so tying this to
> > the HDFS block size is kind of meaningless.
> >
> > Good Luck
> > Walker Gu.
> >
> >
> > 2011/4/13 Shrinivas Joshi <jshrinivas@gmail.com>
> >
> > > Looking at workloads like TeraSort where intermediate map output is
> > > proportional to HDFS block size, I was wondering whether it would be
> > > beneficial to have a mechanism for setting buffer spaces like
> > > io.sort.mb to be a certain factor of HDFS block size? I am sure there
> > > are other config parameters that could benefit from such expression
> > > type values.
> > >
> > > Please let me know your thoughts on this.
> > >
> > > Thanks,
> > > -Shrinivas
> > >
> >
>
