hadoop-common-dev mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Questions about recommendation value of the "io.sort.mb" parameter
Date Sat, 26 Jun 2010 05:41:17 GMT
2010/6/25 Yu Li <carp84@gmail.com>

> Hi Todd,
>
> Sorry to bother you again, but could you further explain the 24 bytes of
> additional overhead for each record of map output? What causes the
> overhead, and what is it for? Thanks a lot.
>

I actually misremembered, sorry - it's 16 bytes.

In the kvindices buffer:
4 bytes for the partition ID of each record
4 bytes for the key offset in the data buffer
4 bytes for the value offset in the data buffer

In the kvoffsets buffer:
4 bytes for an index into the kvindices buffer (this is so that the spill
sort can just move indices around instead of the entire records)
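Putting those numbers together, here is a back-of-the-envelope sketch (plain Python, not Hadoop code; the function name is made up for illustration) of how the per-record metadata inflates the space a map's output needs in the sort buffer:

```python
# Per-record metadata in the map-side sort buffers, in bytes:
KVINDICES_PER_RECORD = 3 * 4   # partition ID + key offset + value offset
KVOFFSETS_PER_RECORD = 1 * 4   # index into kvindices, moved around by the sort
META_PER_RECORD = KVINDICES_PER_RECORD + KVOFFSETS_PER_RECORD  # 16 bytes

def spill_space(num_records, avg_record_bytes):
    """Bytes needed for the serialized records plus their metadata
    (ignores the fact that spilling starts before the buffer is full)."""
    return num_records * (avg_record_bytes + META_PER_RECORD)

# 64 MB of output made of tiny 16-byte records doubles in size
# once the accounting metadata is included:
records = (64 * 1024 * 1024) // 16                # ~4.2M records
print(spill_space(records, 16) // (1024 * 1024))  # 128
```

The smaller the records, the larger the relative metadata overhead, which is why small-record jobs can spill even when the raw output fits in io.sort.mb.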

For more detail, I would recommend reading the code, or looking for Chris
Douglas's slides from the HUG earlier this year, where he gave a very
informative talk on the evolution of the map-side spill.
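For reference, the parameter under discussion is an ordinary job/site configuration entry. A minimal example (the name shown is the pre-MRv2 one used in this thread; later releases renamed it mapreduce.task.io.sort.mb):

```xml
<!-- mapred-site.xml, or set per-job -->
<property>
  <name>io.sort.mb</name>
  <value>200</value>
  <description>Size in MB of the in-memory map output sort buffer.</description>
</property>
```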

-Todd


>
> Best Regards,
> Carp
> On Jun 24, 2010 at 1:49 AM, Todd Lipcon <todd@cloudera.com> wrote:
>
> > Plus there's some overhead for each record of map output. Specifically, 24
> > bytes. So if you output 64MB worth of data, but each of your objects is
> > only 24 bytes long itself, you need more than 128MB worth of spill space
> > for it.
> > Last, the map output buffer begins spilling when it is partially full so
> > that more records can be collected while spill proceeds.
> >
> > 200MB io.sort.mb has enough headroom for most 64M input splits that don't
> > blow up the data a lot. Expanding much above 200M doesn't buy most jobs
> > much. The good news is it's easy to tell, by looking at the logs, how many
> > times the map tasks are spilling. If you're only spilling once, more
> > io.sort.mb will not help.
> >
> > -Todd
> >
> > 2010/6/23 李钰 <carp84@gmail.com>
> >
> > > Hi Jeff,
> > >
> > > Thanks for your quick reply. It seems my thinking was stuck on the style
> > > of job I'm running. Now I'm much clearer about it.
> > >
> > > Best Regards,
> > > Carp
> > >
> > > 2010/6/23 Jeff Zhang <zjffdu@gmail.com>
> > >
> > > > Hi 李钰
> > > >
> > > > The size of map output depends on your Mapper class. The Mapper class
> > > > will do processing on the input data.
> > > >
> > > >
> > > >
> > > > 2010/6/23 李钰 <carp84@gmail.com>:
> > > > > Hi Sriguru,
> > > > >
> > > > > Thanks a lot for your comments and suggestions!
> > > > > Here I still have some questions: since the map mainly does data
> > > > > preparation, say splitting input data into KVPs, sorting and
> > > > > partitioning before spill, would the size of the map output KVPs be
> > > > > much larger than the input data size? If not, since one map task
> > > > > deals with one input split, and one input split is usually 64M, the
> > > > > map KVP size would be approximately 64M. Could you please give me an
> > > > > example of map output much larger than the input split? It has
> > > > > confused me for some time, thanks.
> > > > >
> > > > > Others,
> > > > >
> > > > > Also badly need your help if you know about this, thanks.
> > > > >
> > > > > Best Regards,
> > > > > Carp
> > > > >
> > > > > On Jun 23, 2010 at 5:11 PM, Srigurunath Chakravarthi
> > > > > <sriguru@yahoo-inc.com> wrote:
> > > > >
> > > > >> Hi Carp,
> > > > >>  Your assumption is right that this is a per-map-task setting.
> > > > >> However, this buffer stores map output KVPs, not input. Therefore
> > > > >> the optimal value depends on how much data your map task is
> > > > >> generating.
> > > > >>
> > > > >> If your output per map is greater than io.sort.mb, these rules of
> > > > >> thumb could work for you:
> > > > >>
> > > > >> 1) Increase the max heap of map tasks to use RAM better, but don't
> > > > >> hit swap.
> > > > >> 2) Set io.sort.mb to ~70% of heap.
> > > > >>
> > > > >> Overall, causing extra "spills" (because of insufficient
> > > > >> io.sort.mb) is much better than risking swapping (by setting
> > > > >> io.sort.mb and heap too large), in terms of the relative
> > > > >> performance penalty you will pay.
> > > > >>
> > > > >> Cheers,
> > > > >> Sriguru
> > > > >>
> > > > >> >-----Original Message-----
> > > > >> >From: 李钰 [mailto:carp84@gmail.com]
> > > > >> >Sent: Wednesday, June 23, 2010 12:27 PM
> > > > >> >To: common-dev@hadoop.apache.org
> > > > >> >Subject: Questions about recommendation value of the "io.sort.mb"
> > > > >> >parameter
> > > > >> >
> > > > >> >Dear all,
> > > > >> >
> > > > >> >Here I've got a question about the "io.sort.mb" parameter. We can
> > > > >> >find material from Yahoo! or Cloudera which recommends setting
> > > > >> >this value to 200 if the job scale is large, but I'm confused
> > > > >> >about this. As I know, the tasktracker will launch a child JVM for
> > > > >> >each task, and “*io.sort.mb*” represents the buffer size in memory
> > > > >> >inside *one map task child JVM*. The default value of 100MB should
> > > > >> >be large enough, because the input split of one map task is
> > > > >> >usually 64MB, as large as the block size we usually set. Then why
> > > > >> >is the recommendation for “*io.sort.mb*” 200MB for large jobs (and
> > > > >> >why does it really work)? How could the job size affect the
> > > > >> >procedure? Is there any fault in my understanding? Any
> > > > >> >comment/suggestion will be highly valued; thanks in advance.
> > > > >>
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards
> > > >
> > > > Jeff Zhang
> > > >
> > >
> >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera
