From: Adam Kawa <kawa.adam@gmail.com>
To: user@hadoop.apache.org
Date: Wed, 18 Dec 2013 22:15:46 +0100
Subject: Re: Hadoop Pi Example in Yarn

A map task is created for each input split in your dataset. By default, an input split corresponds to one block in HDFS, i.e. if a file consists of 1 HDFS block, then 1 map task will be started; if a file consists of N blocks, then N map tasks will be started for that file (obviously, assuming default settings).

PiEstimator generates input files for itself. When you submit the PiEstimator job, you specify how many map tasks you want to run. Then, before submitting the job to the cluster, it generates that number of input files in HDFS, and a map task is started for each file. What is interesting is that each file contains only a single record.
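The default split arithmetic above can be sketched in a few lines. This is a minimal illustration with hypothetical class and method names, not Hadoop's actual `FileInputFormat` code; it assumes the common default where the split size equals the HDFS block size:

```java
// Minimal sketch (hypothetical names) of the default split arithmetic:
// a file of N blocks yields N splits, and hence N map tasks.
public class SplitCountSketch {
    static long splitCount(long fileSize, long splitSize) {
        // effectively ceil(fileSize / splitSize)
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long splitSize = 20L * 1024 * 1024;   // 20 MB splits
        long fileSize = 100L * 1024 * 1024;   // 100 MB input
        System.out.println(splitCount(fileSize, splitSize)); // prints 5
    }
}
```

This matches the example in the original question: 100 MB of input with 20 MB splits gives 100/20 = 5 map tasks.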
You can see some code here:
http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.hadoop/hadoop-examples/0.20.2-cdh3u1/org/apache/hadoop/examples/PiEstimator.java#PiEstimator

    //generate an input file for each map task
    for(int i=0; i < numMaps; ++i) {
      final Path file = new Path(inDir, "part"+i);
      final LongWritable offset = new LongWritable(i * numPoints);
      final LongWritable size = new LongWritable(numPoints);
      final SequenceFile.Writer writer = SequenceFile.createWriter(
          fs, jobConf, file,
          LongWritable.class, LongWritable.class, CompressionType.NONE);
      try {
        writer.append(offset, size);
      } finally {
        writer.close();
      }
      System.out.println("Wrote input for Map #"+i);
    }

2013/12/18 - <commodore65@ymail.com>
> How can the Pi example determine the number of mappers?
> I thought the only way to determine the number of mappers is via the number of
> file splits you have in the input file...
> So for instance, if the input size is 100MB and the file split size is 20MB, then
> I would expect to have 100/20 = 5 map tasks.
>
> Thanks
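The effect of the generation loop quoted above can be illustrated in plain Java. This is a hypothetical stand-in (java.nio files instead of Hadoop's SequenceFile API, made-up class name) showing the key idea: one tiny file per requested map task, each holding a single (offset, size) record, so that each file yields exactly one split and therefore one map task:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical stand-in for PiEstimator's input generation loop:
// one single-record file per requested map task.
public class PiInputLayoutSketch {
    static Path generate(int numMaps, long numPoints) throws IOException {
        Path inDir = Files.createTempDirectory("pi-input");
        for (int i = 0; i < numMaps; ++i) {
            long offset = i * numPoints;  // where this map's sample range starts
            long size = numPoints;        // how many samples this map draws
            Files.write(inDir.resolve("part" + i),
                        (offset + " " + size + "\n").getBytes());
        }
        return inDir;
    }

    public static void main(String[] args) throws IOException {
        Path dir = generate(5, 1000);
        try (Stream<Path> files = Files.list(dir)) {
            // 5 files -> 5 splits -> 5 map tasks
            System.out.println(files.count());
        }
    }
}
```

Note how the (offset, size) pairs partition the total sample space into disjoint ranges: map i starts at offset i*numPoints and draws numPoints samples, so the maps never overlap.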