Mailing-List: contact common-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: common-dev@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of carp84@gmail.com designates
 209.85.216.48 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <AANLkTimxGauRG30DqsBfrYMclsenvXZB9U6_mx9WvCsX@mail.gmail.com>
References: <AANLkTinVGEFPxmsKy5NDB3yEsamexwRK97i6mF3BAlyW@mail.gmail.com>
	<46A377B1A3A3074D8B989BF96663C10DF78EEA4A97@EGL-EX07VS01.ds.corp.yahoo.com>
	<AANLkTimlgHLtm99l5zm5kV93ntrcgbGbbMRB_v9d78hl@mail.gmail.com>
	<AANLkTikHjz7-xCjRSDsAyU0lWom2GCpcgWA2XmqEzUKx@mail.gmail.com>
	<AANLkTinNxAmv6cf9W5YQJ8LOCeOxjuAJzJpyTIo-dHoC@mail.gmail.com>
	<AANLkTinXsAPCKyV1XXxhsODfUgQ4u_s-0RtOeXd2GNVQ@mail.gmail.com>
	<AANLkTimtGLT52Xu9STAjCxNUf5-cofp43l2__kLLxHc6@mail.gmail.com>
	<AANLkTimxGauRG30DqsBfrYMclsenvXZB9U6_mx9WvCsX@mail.gmail.com>
Date: Sat, 26 Jun 2010 17:18:07 +0800
Message-ID: <AANLkTinTujYj9t00zSI9smYa7EpjNRB5PN9rXJGxBxMi@mail.gmail.com>
Subject: Re: Questions about recommendation value of the "io.sort.mb"
	parameter
From: Yu Li <carp84@gmail.com>
To: common-dev@hadoop.apache.org
Content-Type: multipart/alternative; boundary=00c09f8999069e06a30489eb5d43

--00c09f8999069e06a30489eb5d43
Content-Type: text/plain; charset=GB2312
Content-Transfer-Encoding: quoted-printable

Hi Todd,

Thanks a lot for your detailed explanation and recommendation. It really
helps a lot!
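
To check my understanding of the accounting, here is a minimal Python sketch
of the arithmetic. It assumes only the 16 bytes per record you list below
(12 in kvindices, 4 in kvoffsets). The function name and the average record
size are my own illustration, not Hadoop settings, and the sketch ignores how
the buffer is actually divided between data and accounting space.

```python
# Per-record accounting overhead as described in the thread:
# 12 bytes in kvindices plus 4 bytes in kvoffsets.
KVINDICES_BYTES = 4 + 4 + 4  # partition ID, key offset, value offset
KVOFFSETS_BYTES = 4          # index into kvindices, moved during the sort
OVERHEAD_PER_RECORD = KVINDICES_BYTES + KVOFFSETS_BYTES  # 16 bytes

MB = 1024 * 1024


def buffer_bytes_needed(output_bytes, avg_record_bytes):
    """Estimate bytes needed to hold map output plus per-record accounting.

    avg_record_bytes is a hypothetical average serialized size of one
    key/value pair; it is an input to the estimate, not a Hadoop parameter.
    """
    records = output_bytes // avg_record_bytes
    return output_bytes + records * OVERHEAD_PER_RECORD


# The example from the thread: 64MB of output made of 24-byte records.
needed = buffer_bytes_needed(64 * MB, 24)
print(round(needed / MB, 1))
```

For 64MB of output in 24-byte records this prints 106.7, so roughly 107MB of
buffer rather than the 128MB that the earlier 24-byte figure implied.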

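Your tip about counting spills in the task logs could be sketched in Python
as follows. This assumes each completed spill is reported on a log line
containing "Finished spill"; that marker and the sample lines are my
assumption about the log format, so adjust them to what the task logs
actually show.

```python
def count_spills(log_lines, marker="Finished spill"):
    """Count the log lines that report a completed spill.

    The default marker is an assumption about the map task log format.
    """
    return sum(1 for line in log_lines if marker in line)


def bigger_buffer_may_help(spill_count):
    """A task that spills only once gains nothing from more io.sort.mb."""
    return spill_count > 1


# Made-up sample lines, for illustration only.
sample_log = [
    "map task: Finished spill 0",
    "map task: Finished spill 1",
    "map task: done",
]
spills = count_spills(sample_log)
print(spills, bigger_buffer_may_help(spills))
```

A task whose log shows a single spill would return False here, which matches
the advice that more io.sort.mb will not help in that case.
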
Best Regards,
Carp

2010/6/26 Todd Lipcon <todd@cloudera.com>

> 2010/6/25 Yu Li <carp84@gmail.com>
>
> > Hi Todd,
> >
> > Sorry to bother you again. Could you explain further what the 24 bytes
> > of additional overhead for each record of map output are? What causes
> > the overhead, and what is it for? Thanks a lot.
> >
>
> I actually misremembered, sorry - it's 16 bytes.
>
> In the kvindices buffer:
> 4 bytes for partition ID of each record
> 4 bytes for the key offset in data buffer
> 4 bytes for the value offset in data buffer
>
> In the kvoffsets buffer:
> 4 bytes for an index into the kvindices buffer (this is so that the spill
> sort can just move around indices instead of the entire object)
>
> For more detail, I would recommend reading the code, or looking for Chris
> Douglas's slides from the HUG earlier this year where he gave a very
> informative talk on the evolution of the mapside spill.
>
> -Todd
>
>
> >
> > Best Regards,
> > Carp
> > On June 24, 2010 at 1:49 AM, Todd Lipcon <todd@cloudera.com> wrote:
> >
> > > Plus there's some overhead for each record of map output. Specifically,
> > > 24 bytes. So if you output 64MB worth of data, but each of your objects
> > > is only 24 bytes long itself, you need more than 128MB worth of spill
> > > space for it. Last, the map output buffer begins spilling when it is
> > > partially full so that more records can be collected while the spill
> > > proceeds.
> > >
> > > 200MB io.sort.mb has enough headroom for most 64M input splits that
> > > don't blow up the data a lot. Expanding much above 200M for most jobs
> > > doesn't buy you much. The good news is that it's easy to tell from the
> > > logs how many times the map tasks are spilling. If you're only spilling
> > > once, more io.sort.mb will not help.
> > >
> > > -Todd
> > >
> > > 2010/6/23 李钰 <carp84@gmail.com>
> > >
> > > > Hi Jeff,
> > > >
> > > > Thanks for your quick reply. It seems my thinking was stuck on the
> > > > style of job I'm running. Now I'm much clearer about it.
> > > >
> > > > Best Regards,
> > > > Carp
> > > >
> > > > 2010/6/23 Jeff Zhang <zjffdu@gmail.com>
> > > >
> > > > > Hi 李钰
> > > > >
> > > > > The size of the map output depends on your Mapper class. The Mapper
> > > > > class processes the input data, so its output can be larger or
> > > > > smaller than the input.
> > > > >
> > > > >
> > > > >
> > > > > 2010/6/23 李钰 <carp84@gmail.com>:
> > > > >  > Hi Sriguru,
> > > > > >
> > > > > > Thanks a lot for your comments and suggestions!
> > > > > > Here I still have some questions. Since the map phase mainly does
> > > > > > data preparation (splitting the input into KVPs, then sorting and
> > > > > > partitioning them before the spill), would the map output KVPs be
> > > > > > much larger than the input data? If not, since one map task deals
> > > > > > with one input split, and one input split is usually 64M, the map
> > > > > > output would be approximately 64M. Could you please give me an
> > > > > > example of map output that is much larger than the input split?
> > > > > > This has confused me for some time, thanks.
> > > > > >
> > > > > > To everyone else: I would also really appreciate your help if you
> > > > > > know about this, thanks.
> > > > > >
> > > > > > Best Regards,
> > > > > > Carp
> > > > > >
> > > > > > On June 23, 2010 at 5:11 PM, Srigurunath Chakravarthi
> > > > > > <sriguru@yahoo-inc.com> wrote:
> > > > > >
> > > > > >> Hi Carp,
> > > > > >> Your assumption is right that this is a per-map-task setting.
> > > > > >> However, this buffer stores map output KVPs, not input. Therefore
> > > > > >> the optimal value depends on how much data your map task is
> > > > > >> generating.
> > > > > >>
> > > > > >> If your output per map is greater than io.sort.mb, these rules of
> > > > > >> thumb could work for you:
> > > > > >>
> > > > > >> 1) Increase the max heap of map tasks to use RAM better, but
> > > > > >>    without hitting swap.
> > > > > >> 2) Set io.sort.mb to ~70% of heap.
> > > > > >>
> > > > > >> Overall, causing extra "spills" (because of insufficient
> > > > > >> io.sort.mb) is much better than risking swapping (by setting
> > > > > >> io.sort.mb and heap too large), in terms of the relative
> > > > > >> performance penalty you will pay.
> > > > > >>
> > > > > >> Cheers,
> > > > > >> Sriguru
> > > > > >>
> > > > > >> >-----Original Message-----
> > > > > >> >From: 李钰 [mailto:carp84@gmail.com]
> > > > > >> >Sent: Wednesday, June 23, 2010 12:27 PM
> > > > > >> >To: common-dev@hadoop.apache.org
> > > > > >> >Subject: Questions about recommendation value of the "io.sort.mb"
> > > > > >> >parameter
> > > > > >> >
> > > > > >> >Dear all,
> > > > > >> >
> > > > > >> >Here I've got a question about the "io.sort.mb" parameter.
> > > > > >> >Material from Yahoo! or Cloudera recommends setting this value
> > > > > >> >to 200 if the job scale is large, but I'm confused about this.
> > > > > >> >As far as I know, the tasktracker launches a child-JVM for each
> > > > > >> >task, and "*io.sort.mb*" is the size of the in-memory buffer
> > > > > >> >inside *one map task child-JVM*. The default value of 100MB
> > > > > >> >should be large enough, because the input split of one map task
> > > > > >> >is usually 64MB, as large as the block size we usually set. Then
> > > > > >> >why is the recommendation for "*io.sort.mb*" 200MB for large
> > > > > >> >jobs (and it really works)? How could the job size affect the
> > > > > >> >procedure? Is there any fault in my understanding? Any
> > > > > >> >comment/suggestion will be highly valued, thanks in advance.
> > > > > >> >
> > > > > >> >Best Regards,
> > > > > >> >Carp
> > > > > >>
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best Regards
> > > > >
> > > > > Jeff Zhang
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Todd Lipcon
> > > Software Engineer, Cloudera
> > >
> >
>
>
>
> --
>  Todd Lipcon
> Software Engineer, Cloudera
>

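Sriguru's two rules of thumb quoted above (raise the map task heap without
hitting swap, then set io.sort.mb to about 70% of heap) could be sketched in
Python like this. The node RAM, the number of concurrent map tasks, and the
memory reserved for other processes are hypothetical inputs of mine, not
Hadoop parameters, and the even split across tasks is a simplifying
assumption.

```python
def plan_map_memory(node_ram_mb, concurrent_maps, reserved_mb, sort_percent=70):
    """Suggest a per-task heap and io.sort.mb that should stay out of swap.

    node_ram_mb, concurrent_maps and reserved_mb are hypothetical inputs:
    physical RAM on the node, map tasks running at once, and memory set
    aside for the OS and daemons.
    """
    # Rule 1: use the RAM that is left, split evenly, so tasks do not swap.
    heap_mb = (node_ram_mb - reserved_mb) // concurrent_maps
    # Rule 2: set io.sort.mb to roughly 70% of that heap.
    io_sort_mb = heap_mb * sort_percent // 100
    return heap_mb, io_sort_mb


heap_mb, io_sort_mb = plan_map_memory(
    node_ram_mb=8192, concurrent_maps=8, reserved_mb=2048
)
print(heap_mb, io_sort_mb)
```

With these made-up numbers the sketch suggests a 768MB heap and an io.sort.mb
of 537. Per the same advice, erring on the small side and accepting a few
extra spills is cheaper than risking swap.
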
--00c09f8999069e06a30489eb5d43--