From: Zheyi RONG
Date: Fri, 15 Mar 2013 11:32:55 +0100
Subject: Re: Increase the number of mappers in PM mode
To: user@hadoop.apache.org

Indeed, you cannot set the number of mappers explicitly, but you can still gain some control over it by setting mapred.max.split.size or mapred.min.split.size.

For example, suppose you have a 10 GB file (10737418240 B) and would like 10 mappers; each mapper then has to handle 1 GB of data.
According to "splitsize = max(minimumSize, min(maximumSize, blockSize))", you can set mapred.min.split.size=1073741824 (1 GB), i.e.
$ hadoop jar yourjar -Dmapred.min.split.size=1073741824 yourargs
(The -D option has to come after the jar, and after the main class if you name one. It is only picked up if the job's driver parses generic options, e.g. through ToolRunner.)
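
To see how the quoted formula plays out, here is a small Python sketch (plain arithmetic, not Hadoop code). The function names and the 64 MB block size are assumptions made for illustration, and the split count is only approximate, since it ignores details such as the last split being allowed to run slightly over.

```python
GB = 1024 ** 3
MB = 1024 ** 2


def split_size(block_size, min_size=1, max_size=2 ** 63 - 1):
    """splitsize = max(minimumSize, min(maximumSize, blockSize))"""
    return max(min_size, min(max_size, block_size))


def approx_num_splits(file_size, size):
    """Rough number of splits (and so mappers) for one file: ceil(file_size / size)."""
    return -(-file_size // size)  # integer ceiling division


file_size = 10 * GB      # the 10 GB file from the example
block_size = 64 * MB     # assumed block size, not stated in the thread

# No overrides: the split size falls back to the block size.
default_size = split_size(block_size)

# mapred.min.split.size=1073741824 pushes the split size up to 1 GB.
raised_size = split_size(block_size, min_size=1 * GB)

# A maximum below the block size pushes the split size down instead
# (32 MB here is an arbitrary illustrative value).
lowered_size = split_size(block_size, max_size=32 * MB)

for label, size in [("default", default_size),
                    ("min=1GB", raised_size),
                    ("max=32MB", lowered_size)]:
    print(label, size, approx_num_splits(file_size, size))
```

With these assumed numbers, raising the minimum to 1 GB gives about 10 splits, while lowering the maximum below the block size is what yields more splits than there are blocks.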

It is well explained in this thread: http://stackoverflow.com/questions/9678180/change-file-split-size-in-hadoop

Regards,
Zheyi.

On Fri, Mar 15, 2013 at 8:49 AM, YouPeng Yang <yypvsxf19870706@gmail.com> wrote:
> s

