Date: Wed, 23 Jun 2010 18:08:44 +0800
Message-ID: <AANLkTikHjz7-xCjRSDsAyU0lWom2GCpcgWA2XmqEzUKx@mail.gmail.com>
Subject: Re: Questions about recommendation value of the "io.sort.mb"
	parameter
From: Jeff Zhang <zjffdu@gmail.com>
To: common-dev@hadoop.apache.org

Hi 李钰,

The size of the map output depends on your Mapper class, because the
Mapper processes the input data and decides what to emit.
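
As a rough illustration of how a Mapper can inflate its input, here is a
minimal Python sketch (not Hadoop code). The two map functions and the byte
accounting in serialized_size (UTF-8 key plus an assumed 4-byte integer
value) are hypothetical and only approximate how real key/value pairs would
be serialized.

```python
def wordcount_map(line):
    """Emit one (word, 1) pair per token, like a word-count style mapper."""
    for word in line.split():
        yield word, 1


def cooccurrence_map(line):
    """Emit one ("w1,w2", 1) pair for every ordered pair of token positions."""
    words = line.split()
    for i, left in enumerate(words):
        for j, right in enumerate(words):
            if i != j:
                yield left + "," + right, 1


def serialized_size(pairs):
    """Assumed size model: UTF-8 bytes of the key plus 4 bytes for the value."""
    return sum(len(key.encode("utf-8")) + 4 for key, _ in pairs)


if __name__ == "__main__":
    line = "a b c d"
    input_bytes = len(line.encode("utf-8"))
    for mapper in (wordcount_map, cooccurrence_map):
        output_bytes = serialized_size(mapper(line))
        print(mapper.__name__, input_bytes, output_bytes)
```

Under these assumptions, the 7-byte line "a b c d" becomes 20 bytes of output
for the word-count style mapper and 84 bytes for the co-occurrence mapper, so
the map output can be many times larger than the input split.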


2010/6/23 李钰 <carp84@gmail.com>:
> Hi Sriguru,
>
> Thanks a lot for your comments and suggestions!
> I still have some questions. Since the map mainly does data preparation,
> that is, splitting the input data into KVPs, then sorting and partitioning
> before the spill, would the size of the map output KVPs be much larger than
> the input data size? If not, since one map task deals with one input split,
> and one input split is usually 64 MB, the map output KVPs would be
> approximately 64 MB. Could you please give me an example of map output
> that is much larger than the input split? This has confused me for some
> time, thanks.
>
> To everyone else:
>
> I would also really appreciate your help if you know about this, thanks.
>
> Best Regards,
> Carp
>
> On June 23, 2010 at 5:11 PM, Srigurunath Chakravarthi <sriguru@yahoo-inc.com> wrote:
>
>> Hi Carp,
>> Your assumption is right that this is a per-map-task setting.
>> However, this buffer stores map output KVPs, not input. Therefore the
>> optimal value depends on how much data your map task is generating.
>>
>> If your output per map is greater than io.sort.mb, these rules of thumb
>> could work for you:
>>
>> 1) Increase max heap of map tasks to use RAM better, but not hit swap.
>> 2) Set io.sort.mb to ~70% of heap.
>>
>> Overall, causing extra "spills" (because of insufficient io.sort.mb) is
>> much better than risking swapping (by setting io.sort.mb and heap too
>> large), in terms of the relative performance penalty you will pay.
>>
>> Cheers,
>> Sriguru
>>
>> >-----Original Message-----
>> >From: 李钰 [mailto:carp84@gmail.com]
>> >Sent: Wednesday, June 23, 2010 12:27 PM
>> >To: common-dev@hadoop.apache.org
>> >Subject: Questions about recommendation value of the "io.sort.mb"
>> >parameter
>> >
>> >Dear all,
>> >
>> >I have a question about the "io.sort.mb" parameter. Material from
>> >Yahoo! and Cloudera recommends setting this value to 200 if the job
>> >scale is large, but I'm confused about this. As far as I know, the
>> >tasktracker launches a child JVM for each task, and “*io.sort.mb*”
>> >is the size of the in-memory buffer inside *one map task child JVM*.
>> >The default value of 100 MB should be large enough, because the input
>> >split of one map task is usually 64 MB, the same as the block size we
>> >usually set. Then why is the recommended “*io.sort.mb*” 200 MB for
>> >large jobs (and it really works)? How could the job size affect the
>> >procedure?
>> >Is there any fault in my understanding? Any comments or suggestions
>> >will be highly valued. Thanks in advance.
>> >
>> >Best Regards,
>> >Carp
>>
>
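
The rules of thumb quoted above can be sketched as simple arithmetic. This is
a simplified Python model, not Hadoop's actual buffer accounting: the function
names are hypothetical, the 70% figure comes from the quoted advice, and the
80% spill threshold is an assumption standing in for the buffer's spill
setting. Real spill counts also depend on per-record metadata kept in the
buffer.

```python
import math


def suggested_io_sort_mb(heap_mb, heap_percent=70):
    """Rule of thumb from the thread: io.sort.mb at roughly 70% of task heap."""
    return heap_mb * heap_percent // 100


def estimated_spills(map_output_mb, io_sort_mb, spill_percent=80):
    """Estimate spills, assuming a spill starts once the buffer is
    spill_percent full (an assumed threshold, not a measured one)."""
    usable_mb = io_sort_mb * spill_percent / 100
    return max(1, math.ceil(map_output_mb / usable_mb))


if __name__ == "__main__":
    # A 64 MB map output fits in the default 100 MB buffer in one spill.
    print(estimated_spills(64, 100))
    # A 300 MB map output spills less often with a 200 MB buffer.
    print(estimated_spills(300, 100), estimated_spills(300, 200))
    # Buffer size suggested for a 200 MB task heap.
    print(suggested_io_sort_mb(200))
```

In this model a larger io.sort.mb only helps when the map output, not the
input split, exceeds the buffer, which matches the point made in the thread.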


-- 
Best Regards

Jeff Zhang