From: ch huang
To: user@hadoop.apache.org
Date: Wed, 11 Dec 2013 14:20:24 +0800
Subject: Re: issue about Shuffled Maps in MR job summary
In-Reply-To: <5DF48A23D7B14649BBA72C2F64C6663B82B356DB@szxeml523-mbx.china.huawei.com>

I read the docs and found that if I have 8 reducers, each map task outputs 8 partitions, and each partition is sent to a different reducer. So if I increase the number of reducers, the number of partitions increases, but the volume of network traffic stays the same. Why, then, does increasing the number of reducers sometimes not decrease the job completion time?

On Wed, Dec 11, 2013 at 1:48 PM, Vinayakumar B <vinayakumar.b@huawei.com> wrote:

> It looks simple :)
>
> Shuffled Maps = Number of Map Tasks * Number of Reducers
>
> Thanks and Regards,
> Vinayakumar B
>
> From: ch huang [mailto:justlooks@gmail.com]
> Sent: 11 December 2013 10:56
> To: user@hadoop.apache.org
> Subject: issue about Shuffled Maps in MR job summary
>
> Hi, mailing list:
>
> I ran terasort with 16 reducers and with 8 reducers. When I double the
> number of reducers, the Shuffled Maps count also doubles. My question is:
> the job only runs 20 map tasks (there are 10 input files of 100M each and
> my block size is 64M, so there are 20 splits), so why do I need to shuffle
> 160 maps in the 8-reducer run and 320 maps in the 16-reducer run? How is
> the Shuffled Maps number calculated?
>
> 16 reducer summary output:
>
> Shuffled Maps =320
>
> 8 reducer summary output:
>
> Shuffled Maps =160
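
The formula quoted in the reply can be checked against the numbers in the thread. The following is a minimal sketch in Python, not Hadoop code: the function names `num_splits` and `shuffled_maps` are made up for illustration, and the split count assumes one split per block-sized chunk of each file, which ignores the finer rules Hadoop applies when it computes input splits.

```python
import math


def num_splits(file_sizes_mb, split_size_mb):
    """Approximate the number of input splits (and so map tasks).

    Assumption: each file is cut independently into chunks of at most
    split_size_mb, so a 100M file with a 64M block gives 2 splits.
    """
    return sum(math.ceil(size / split_size_mb) for size in file_sizes_mb)


def shuffled_maps(num_map_tasks, num_reducers):
    """Formula from the reply: every reducer fetches one partition
    from every map task, so the counter is the product of the two."""
    return num_map_tasks * num_reducers


# Numbers from the thread: 10 input files of 100M each, 64M block size.
maps = num_splits([100] * 10, 64)
print("map tasks:", maps)

for reducers in (8, 16):
    print(reducers, "reducers ->", shuffled_maps(maps, reducers), "shuffled maps")
```

Under these assumptions the sketch gives 20 map tasks, and 160 and 320 shuffled maps for 8 and 16 reducers, matching the job summaries quoted in the thread. The counter grows with the number of reducers because it counts map-output fetches, not distinct map tasks.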