Date: Wed, 13 Mar 2013 13:47:40 -0700 (PDT)
From: Jeffrey Buell
To: user@hadoop.apache.org
Subject: Re: Will hadoop always spread the work evenly between nodes?

I think in your case it will have to be even, because all the slots will get filled. A more interesting case is if you have 40 nodes: will you get exactly 5 slots used on each node, or will some nodes get more than 5 mappers and others fewer? I don't remember the details, but I've had problems with unevenness in such scenarios. At least in MR1, you can usually force evenness by adjusting the number of map and reduce slots per node. In MR2 the slots are combined, so achieving evenness will be more difficult.

Jeff

----- Original Message -----
From: "jeremy p" <athomewithagroovebox@gmail.com>
To: user@hadoop.apache.org
Sent: Wednesday, March 13, 2013 1:01:46 PM
Subject: Will hadoop always spread the work evenly between nodes?

Say I have 200 input files and 20 nodes, and each node has 10 mapper slots. Will Hadoop always allocate the work evenly, such that each node will get 10 input files and simultaneously start 10 mappers? Is there a way to force this behavior?

--Jeremy
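Jeff's suggestion to adjust the number of map and reduce slots per node refers, in MR1, to per-TaskTracker settings in mapred-site.xml. A minimal sketch of what that looks like; the values shown are illustrative, not recommendations:

```xml
<!-- mapred-site.xml (MR1 / TaskTracker era). These two properties cap how
     many map and reduce tasks run concurrently on each node, which is the
     knob Jeff describes for forcing an even spread. Values are examples. -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>10</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```

In MR2/YARN there are no fixed slots; concurrency instead falls out of container memory/vcore settings, which is why Jeff notes evenness is harder to force there.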
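The arithmetic behind both scenarios in the thread can be sketched with a toy round-robin assignment. This is not Hadoop's actual scheduler (which is locality- and heartbeat-driven), just an illustration of why 200 tasks exactly fill 20 nodes x 10 slots, and why 40 nodes would ideally take 5 tasks each:

```python
def first_wave(num_tasks, num_nodes, slots_per_node):
    """Toy model: deal tasks round-robin across nodes until either the
    tasks or the cluster's slot capacity runs out. Returns the number of
    tasks started on each node in the first wave."""
    per_node = [0] * num_nodes
    capacity = num_nodes * slots_per_node
    for task in range(min(num_tasks, capacity)):
        per_node[task % num_nodes] += 1
    return per_node

# Jeremy's case: 200 tasks == 20 nodes * 10 slots, so every slot fills.
print(first_wave(200, 20, 10))  # [10, 10, ..., 10]

# Jeff's case: 40 nodes leaves spare capacity; an ideal spread is 5 each,
# though a real scheduler may not achieve this exactly.
print(first_wave(200, 40, 10))  # [5, 5, ..., 5]
```

In a real cluster, data locality and the timing of TaskTracker heartbeats can skew this ideal distribution, which is the unevenness Jeff describes.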