From: Robert Evans
To: "common-user@hadoop.apache.org"
Date: Fri, 4 Nov 2011 08:29:48 -0700
Subject: Re: mapred.map.tasks getting set, but not sure where

In 0.20.2, the JobClient will update mapred.map.tasks to be equal to the
number of splits returned by the InputFormat. The input format will
usually take mapred.map.tasks as a recommendation when deciding on what
splits to make. That is the only place in the code that I could find
that is setting the value and could have any impact on the number of
mappers launched.

It could be that someone changed the number of files that are being read
in as input, or that the block size of the files being read in is now
different. It could also be that someone started compressing the input
files, so now they cannot be split. If the number of mappers is
different, it probably means that the input is different somehow.

--Bobby Evans

On 11/4/11 10:12 AM, "Brendan W." wrote:

All the same, no change in that...0.20.2.

Other people do have access to this system to change things like conf
files, but nobody's owning up and I have to figure this out. I have
verified that the mapred.map.tasks property is not getting set in the
mapred-site.xml files on the cluster or in the job. Just out of other
ideas about where it might be getting set...

Thanks,
Brendan

On Fri, Nov 4, 2011 at 11:04 AM, Robert Evans wrote:

> What versions of Hadoop were you running with previously, and what
> version are you running with now?
>
> --Bobby Evans
>
> On 11/4/11 9:33 AM, "Brendan W." wrote:
>
> Hi,
>
> In the jobs running on my cluster of 20 machines, I used to run jobs
> (via "hadoop jar ...") that would spawn around 4000 map tasks. Now when
> I run the same jobs, that number is 20; and I notice that in the job
> configuration, the parameter mapred.map.tasks is set to 20, whereas it
> never used to be present at all in the configuration file.
>
> Changing the input split size in the job doesn't affect this--I get the
> size split I ask for, but the *number* of input splits is still capped
> at 20--i.e., the job isn't reading all of my data.
>
> The mystery to me is where this parameter could be getting set. It is
> not present in the mapred-site.xml file in /conf on any machine in the
> cluster, and it is not being set in the job (I'm running out of the
> same jar I always did; no updates).
>
> Is there *anywhere* else this parameter could possibly be getting set?
> I've stopped and restarted map-reduce on the cluster with no
> effect...it's getting re-read in from somewhere, but I can't figure out
> where.
>
> Thanks a lot,
>
> Brendan
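
A rough sketch of the split arithmetic Bobby describes, written in Python for illustration and not taken from the Hadoop source. It assumes the old-API file input format rule as commonly described: split size = max(min size, min(total bytes / requested maps, block size)), with about 10% slack allowed on the last split of a file. The function names, file sizes, and block size below are hypothetical, chosen only so that the numbers echo the ones in this thread, and empty files are ignored for simplicity.

```python
# Illustrative model only; names and numbers are assumptions, not Hadoop code.

SPLIT_SLOP = 1.1  # assumed slack: the last split may be up to 10% oversized


def compute_split_size(goal_size, min_size, block_size):
    """Split size: the hint-derived goal, capped by block size, floored by min size."""
    return max(min_size, min(goal_size, block_size))


def count_splits(file_sizes, requested_maps, block_size, min_size=1,
                 splittable=True):
    """Count the splits (and therefore map tasks) for a list of non-empty files.

    requested_maps plays the role of mapred.map.tasks: it only influences
    the goal size, it does not directly fix the number of splits.
    """
    total = sum(file_sizes)
    goal_size = total // max(requested_maps, 1)
    split_size = compute_split_size(goal_size, min_size, block_size)

    splits = 0
    for size in file_sizes:
        if not splittable:
            # e.g. a compressed file that cannot be split: one split per file
            splits += 1
            continue
        remaining = size
        while remaining / split_size > SPLIT_SLOP:
            splits += 1
            remaining -= split_size
        if remaining > 0:
            splits += 1  # final (possibly short or slightly oversized) split
    return splits


MB = 1024 * 1024
BLOCK = 64 * MB                # hypothetical block size
FILES = [200 * BLOCK] * 20     # hypothetical input: 20 files of 200 blocks each

print(count_splits(FILES, requested_maps=2, block_size=BLOCK))
print(count_splits(FILES, requested_maps=2, block_size=BLOCK, splittable=False))
```

With these made-up inputs the first call gives 4000 splits and the second gives 20, one per file. That is consistent with the suggestion that the input became unsplittable, but it does not prove that this is what happened on Brendan's cluster.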