hadoop-common-user mailing list archives

From Tarandeep Singh <tarand...@gmail.com>
Subject Re: Effects of increasing block size / min split size
Date Fri, 12 Jun 2009 16:41:39 GMT
Thanks, Jothi...

-Tarandeep

On Fri, Jun 12, 2009 at 4:35 AM, Jothi Padmanabhan <jothipn@yahoo-inc.com> wrote:

> If the number of maps is reduced, the output of each individual map may
> grow larger. A couple of possible issues come to mind immediately:
> 1. The number of spills in each map might increase, which can incur extra
> cost during merging.
> 2. While the reducers might pull in more data per fetch (which is good),
> a reducer may also be unable to hold a map output in memory and have to
> shuffle it to disk instead.
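Both effects can be estimated roughly from the job's buffer settings. The sketch below assumes Hadoop 0.20-era defaults that are not stated in this thread: `io.sort.mb` = 100, `io.sort.spill.percent` = 0.80, `io.sort.factor` = 10, a 200 MB reducer heap, `mapred.job.shuffle.input.buffer.percent` = 0.70, and a single in-memory segment limit of 25% of the shuffle buffer. It also ignores the share of the sort buffer used for record metadata, so real spill counts can be somewhat higher. The reducer count of 8 is hypothetical.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3

def estimate_spills(map_output_bytes, io_sort_mb=100, spill_percent=0.80):
    """Approximate spill files per map: output divided by the spill threshold.
    Ignores the record-metadata share of the sort buffer (assumption)."""
    threshold = io_sort_mb * MB * spill_percent
    return max(1, math.ceil(map_output_bytes / threshold))

def needs_multi_pass_merge(num_spills, io_sort_factor=10):
    """More spill files than io.sort.factor means at least one
    intermediate merge pass before the final merge."""
    return num_spills > io_sort_factor

def fits_in_reducer_memory(segment_bytes, reducer_heap_bytes=200 * MB,
                           shuffle_buffer_percent=0.70,
                           max_segment_fraction=0.25):
    """Assumed rule: a single map output is shuffled into memory only if it
    is smaller than a fraction of the reducer's shuffle buffer; otherwise
    it is written to disk."""
    shuffle_buffer = reducer_heap_bytes * shuffle_buffer_percent
    return segment_bytes < shuffle_buffer * max_segment_fraction

if __name__ == "__main__":
    num_reducers = 8  # hypothetical
    for map_output in (64 * MB, 1 * GB):  # assumes map output ~= map input
        spills = estimate_spills(map_output)
        segment = map_output // num_reducers  # assumes even partitioning
        print(f"{map_output // MB} MB output: {spills} spill(s), "
              f"multi-pass merge={needs_multi_pass_merge(spills)}, "
              f"segment fits in memory={fits_in_reducer_memory(segment)}")
```

Under these assumed settings, a map writing about 64 MB spills once and its 8 MB per-reducer segment fits in memory, while a map writing 1 GB spills about 13 times (more than `io.sort.factor`) and its 128 MB segments go to disk.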
>
> JVM reuse should help, but if individual tasks take a long time to
> complete, the JVM startup cost is small relative to the task itself, so
> there might not be any discernible performance gain.
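The point about long-running tasks can be made concrete with a simple cost model. This is a rough sketch under stated assumptions, not how Hadoop accounts for time: a fixed startup cost per JVM (1 s here, a hypothetical figure), no reuse meaning one JVM per task, and unlimited reuse (`-1`) meaning roughly one JVM per slot for the whole job. The slot count and task durations are also hypothetical.

```python
def jvm_startup_fraction(num_tasks, task_seconds, total_slots,
                         jvm_start_seconds=1.0, reuse=False):
    """Share of total slot time spent starting JVMs (rough model).

    Without reuse, every task starts its own JVM. With unlimited reuse,
    each slot is assumed to start one JVM and keep it for the whole job.
    """
    jvm_starts = min(total_slots, num_tasks) if reuse else num_tasks
    startup = jvm_starts * jvm_start_seconds
    work = num_tasks * task_seconds
    return startup / (startup + work)

if __name__ == "__main__":
    slots = 8  # hypothetical: 4 machines x 2 map slots
    # Same total work (96,000 task-seconds) split into short vs. long tasks.
    for tasks, secs in ((1600, 60), (100, 960)):
        no_reuse = jvm_startup_fraction(tasks, secs, slots)
        reuse = jvm_startup_fraction(tasks, secs, slots, reuse=True)
        print(f"{tasks} tasks x {secs} s: {no_reuse:.2%} without reuse, "
              f"{reuse:.3%} with reuse")
```

With these made-up numbers, reuse removes about 1.6% of overhead for the short tasks but only about 0.1% for the long ones, which is the "no discernible gain" case.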
>
> Jothi
>
>
> On 6/11/09 11:36 PM, "Tarandeep Singh" <tarandeep@gmail.com> wrote:
>
> > Hi,
> >
> > I am trying to understand the effects of increasing the block size or the
> > minimum split size. If I increase them, each mapper will process more data,
> > effectively reducing the number of mappers that are spawned. Since there is
> > an overhead in starting mappers, this seems good.
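How block size and minimum split size determine the number of maps can be sketched as follows. This assumes the split-size rule of the old `org.apache.hadoop.mapred.FileInputFormat`, `max(minSize, min(goalSize, blockSize))`, where `goalSize` is the total input size divided by the requested number of maps (2 assumed here). It treats the input as one large splittable file and ignores the small slack the real code allows on the last split.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3

def compute_split_size(goal_size, min_size, block_size):
    """Assumed old-API FileInputFormat rule: a minimum split size larger
    than the block size wins; otherwise splits follow the block size."""
    return max(min_size, min(goal_size, block_size))

def estimate_map_tasks(total_bytes, block_size, min_split_size=1,
                       requested_maps=2):
    """Rough map count for one large splittable input file.
    requested_maps is the job's map-count hint (2 assumed)."""
    goal_size = total_bytes // max(requested_maps, 1)
    split_size = compute_split_size(goal_size, min_split_size, block_size)
    return math.ceil(total_bytes / split_size)

if __name__ == "__main__":
    data = 100 * GB
    print("64 MB blocks:", estimate_map_tasks(data, 64 * MB))
    print("128 MB blocks:", estimate_map_tasks(data, 128 * MB))
    print("64 MB blocks, 1 GB min split:",
          estimate_map_tasks(data, 64 * MB, min_split_size=1 * GB))
```

For 100 GB this gives 1600, 800, and 100 maps respectively. Raising the minimum split size above the block size means each split spans several blocks, so part of its data may have to be read from other nodes.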
> >
> > However, if I increase their values too much, what negative effects will
> > come up? Put another way, how do I compute the best number of mappers to
> > start for processing data of a given size on a cluster?
> >
> > For the calculations, let us assume 100 GB of data and 4 machines (dual core).
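For this 100 GB, 4-machine example, one way to reason about the map count is in waves: how many rounds of maps the cluster runs when every map slot is busy. The sketch assumes 2 map slots per machine (one per core, an assumption about the cluster's configuration) and compares 1600 maps (100 GB in 64 MB splits), 100 maps (1 GB splits), and 96 maps.

```python
import math

def map_waves(num_maps, total_slots):
    """Rounds of maps needed when each slot runs one map at a time."""
    return math.ceil(num_maps / total_slots)

def last_wave_fill(num_maps, total_slots):
    """Fraction of slots busy in the final wave (1.0 = fully used)."""
    remainder = num_maps % total_slots
    return 1.0 if remainder == 0 else remainder / total_slots

if __name__ == "__main__":
    machines, slots_per_machine = 4, 2  # assumed: one map slot per core
    slots = machines * slots_per_machine
    for maps in (1600, 100, 96):
        print(f"{maps} maps: {map_waves(maps, slots)} waves, "
              f"last wave {last_wave_fill(maps, slots):.0%} full")
```

With 100 maps the 13th wave uses only half the slots, so choosing a count that is a multiple of the slot total (for example 96 maps of about 1.04 GB each) avoids an idle tail. This is a heuristic, not a rule stated in the thread.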
> >
> > Also, if I set the JVM reuse flag to -1, will it make a difference?
> >
> > Thanks,
> > Tarandeep
>
>
