Subject: Re: Map's number with NLineInputFormat
From: yypvsxf19870706
Date: Sun, 21 Apr 2013 12:52:47 +0800
To: user@hadoop.apache.org

Hi Harsh,

Thank you for the suggestion. I did miss the call that sets the input format. Now it works.

Thanks and regards

Sent from my iPhone

On 2013-4-21, at 1:04, Harsh J wrote:

> Do you also ensure setting your desired input format class via the
> setInputFormat*(...) API?
>
> On Sat, Apr 20, 2013 at 6:48 AM, yypvsxf19870706 wrote:
>> Hi
>> I thought it would be different when adopting NLineInputFormat.
>> So here is my conclusion: the distribution of maps has nothing to do with
>> NLineInputFormat itself. NLineInputFormat decides the number of rows fed to
>> each map, while the maps are generated according to the split size.
>>
>> Have I got the point?
>>
>> Regards
>>
>> Sent from my iPhone
>>
>> On 2013-4-20, at 8:39, "姚吉龙" wrote:
>>
>> The number of maps is decided by the block size and your raw data.
>>
>> --
>> Sent from Mailbox for iPhone
>>
>> On Sat, Apr 20, 2013 at 12:30 AM, YouPeng Yang wrote:
>>>
>>> Hi All
>>>
>>> I take NLineInputFormat as the text input format with the following
>>> code:
>>> NLineInputFormat.setNumLinesPerSplit(job, 10);
>>> NLineInputFormat.addInputPath(job, new Path(args[0].toString()));
>>>
>>> My input file contains 1000 rows, so I thought it would distribute
>>> 100 (1000/10) maps. However, I got 4 maps.
>>>
>>> I'm confused by the number of maps that were distributed, according to the
>>> running log [1].
>>> How does it distribute maps when using NLineInputFormat?
>>>
>>> Regards
>>>
>>> [1]======================================================
>>> ....
>>> ....
>>> 2013-04-19 23:56:20,377 INFO mapreduce.Job
>>> (Job.java:monitorAndPrintJob(1286)) - Job job_local_0001 running in uber
>>> mode : false
>>> 2013-04-19 23:56:20,377 INFO mapreduce.Job
>>> (Job.java:monitorAndPrintJob(1293)) - map 25% reduce 0%
>>> 2013-04-19 23:56:20,381 INFO mapred.MapTask
>>> (MapTask.java:sortAndSpill(1597)) - Finished spill 0
>>> 2013-04-19 23:56:20,384 INFO mapred.Task (Task.java:done(979)) -
>>> Task:attempt_local_0001_m_000001_0 is done. And is in the process of
>>> committing
>>> 2013-04-19 23:56:20,388 INFO mapred.LocalJobRunner
>>> (LocalJobRunner.java:statusUpdate(501)) - map
>>> 2013-04-19 23:56:20,389 INFO mapred.Task (Task.java:sendDone(1099)) -
>>> Task 'attempt_local_0001_m_000001_0' done.
>>> 2013-04-19 23:56:20,389 INFO mapred.LocalJobRunner
>>> (LocalJobRunner.java:run(238)) - Finishing task:
>>> attempt_local_0001_m_000001_0
>>> 2013-04-19 23:56:20,389 INFO mapred.LocalJobRunner
>>> (LocalJobRunner.java:run(213)) - Starting task:
>>> attempt_local_0001_m_000002_0
>>> 2013-04-19 23:56:20,391 INFO mapred.Task (Task.java:initialize(565)) -
>>> Using ResourceCalculatorPlugin :
>>> org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@36bf7916
>>> 2013-04-19 23:56:20,486 INFO mapred.MapTask
>>> (MapTask.java:setEquator(1127)) - (EQUATOR) 0 kvi 26214396(104857584)
>>> 2013-04-19 23:56:20,486 INFO mapred.MapTask (MapTask.java:(923)) -
>>> mapreduce.task.io.sort.mb: 100
>>> 2013-04-19 23:56:20,486 INFO mapred.MapTask (MapTask.java:(924)) -
>>> soft limit at 83886080
>>> 2013-04-19 23:56:20,486 INFO mapred.MapTask (MapTask.java:(925)) -
>>> bufstart = 0; bufvoid = 104857600
>>> 2013-04-19 23:56:20,487 INFO mapred.MapTask (MapTask.java:(926)) -
>>> kvstart = 26214396; length = 6553600
>>> 2013-04-19 23:56:20,515 INFO mapred.LocalJobRunner
>>> (LocalJobRunner.java:statusUpdate(501)) -
>>> 2013-04-19 23:56:20,515 INFO mapred.MapTask (MapTask.java:flush(1389)) -
>>> Starting flush of map output
>>> 2013-04-19 23:56:20,516 INFO mapred.MapTask (MapTask.java:flush(1408)) -
>>> Spilling map output
>>> 2013-04-19 23:56:20,516 INFO mapred.MapTask (MapTask.java:flush(1409)) -
>>> bufstart = 0; bufend = 336; bufvoid = 104857600
>>> 2013-04-19 23:56:20,516 INFO mapred.MapTask (MapTask.java:flush(1411)) -
>>> kvstart = 26214396(104857584); kvend = 26214208(104856832); length =
>>> 189/6553600
>>> 2013-04-19 23:56:20,523 INFO mapred.MapTask
>>> (MapTask.java:sortAndSpill(1597)) - Finished spill 0
>>> 2013-04-19 23:56:20,552 INFO mapred.Task (Task.java:done(979)) -
>>> Task:attempt_local_0001_m_000002_0 is done. And is in the process of
>>> committing
>>> 2013-04-19 23:56:20,555 INFO mapred.LocalJobRunner
>>> (LocalJobRunner.java:statusUpdate(501)) - map
>>> 2013-04-19 23:56:20,556 INFO mapred.Task (Task.java:sendDone(1099)) -
>>> Task 'attempt_local_0001_m_000002_0' done.
>>> 2013-04-19 23:56:20,556 INFO mapred.LocalJobRunner
>>> (LocalJobRunner.java:run(238)) - Finishing task:
>>> attempt_local_0001_m_000002_0
>>> 2013-04-19 23:56:20,556 INFO mapred.LocalJobRunner
>>> (LocalJobRunner.java:run(213)) - Starting task:
>>> attempt_local_0001_m_000003_0
>>> 2013-04-19 23:56:20,558 INFO mapred.Task (Task.java:initialize(565)) -
>>> Using ResourceCalculatorPlugin :
>>> org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@746a63d3
>>> 2013-04-19 23:56:20,666 INFO mapred.MapTask
>>> (MapTask.java:setEquator(1127)) - (EQUATOR) 0 kvi 26214396(104857584)
>>> 2013-04-19 23:56:20,666 INFO mapred.MapTask (MapTask.java:(923)) -
>>> mapreduce.task.io.sort.mb: 100
>>> 2013-04-19 23:56:20,666 INFO mapred.MapTask (MapTask.java:(924)) -
>>> soft limit at 83886080
>>> 2013-04-19 23:56:20,666 INFO mapred.MapTask (MapTask.java:(925)) -
>>> bufstart = 0; bufvoid = 104857600
>>> 2013-04-19 23:56:20,667 INFO mapred.MapTask (MapTask.java:(926)) -
>>> kvstart = 26214396; length = 6553600
>>> 2013-04-19 23:56:20,690 INFO mapred.LocalJobRunner
>>> (LocalJobRunner.java:statusUpdate(501)) -
>>> 2013-04-19 23:56:20,690 INFO mapred.MapTask (MapTask.java:flush(1389)) -
>>> Starting flush of map output
>>> 2013-04-19 23:56:20,690 INFO mapred.MapTask (MapTask.java:flush(1408)) -
>>> Spilling map output
>>> 2013-04-19 23:56:20,690 INFO mapred.MapTask (MapTask.java:flush(1409)) -
>>> bufstart = 0; bufend = 329; bufvoid = 104857600
>>> 2013-04-19 23:56:20,690 INFO mapred.MapTask (MapTask.java:flush(1411)) -
>>> kvstart = 26214396(104857584); kvend = 26214212(104856848); length =
>>> 185/6553600
>>> 2013-04-19 23:56:20,695 INFO mapred.MapTask
>>> (MapTask.java:sortAndSpill(1597)) - Finished spill 0
>>> 2013-04-19 23:56:20,697 INFO mapred.Task (Task.java:done(979)) -
>>> Task:attempt_local_0001_m_000003_0 is done. And is in the process of
>>> committing
>>> 2013-04-19 23:56:20,717 INFO mapred.LocalJobRunner
>>> (LocalJobRunner.java:statusUpdate(501)) - map
>>> 2013-04-19 23:56:20,718 INFO mapred.Task (Task.java:sendDone(1099)) -
>>> Task 'attempt_local_0001_m_000003_0' done.
>>> 2013-04-19 23:56:20,718 INFO mapred.LocalJobRunner
>>> (LocalJobRunner.java:run(238)) - Finishing task:
>>> attempt_local_0001_m_000003_0
>>> 2013-04-19 23:56:20,718 INFO mapred.LocalJobRunner
>>> (LocalJobRunner.java:run(394)) - Map task executor complete.
>>> 2013-04-19 23:56:20,752 INFO mapred.Task (Task.java:initialize(565)) -
>>> Using ResourceCalculatorPlugin :
>>> org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@52cd19d
>>> 2013-04-19 23:56:20,760 INFO mapred.Merger (Merger.java:merge(549)) -
>>> Merging 4 sorted segments
>>> 2013-04-19 23:56:20,767 INFO mapred.Merger (Merger.java:merge(648)) -
>>> Down to the last merge-pass, with 4 segments left of total size: 8532 bytes
>>> 2013-04-19 23:56:20,768 INFO mapred.LocalJobRunner
>>> (LocalJobRunner.java:statusUpdate(501)) -
>>> 2013-04-19 23:56:20,807 WARN conf.Configuration
>>> (Configuration.java:warnOnceIfDeprecated(808)) - mapred.skip.on is
>>> deprecated. Instead, use mapreduce.job.skiprecords
>>> 2013-04-19 23:56:21,129 INFO mapred.Task (Task.java:done(979)) -
>>> Task:attempt_local_0001_r_000000_0 is done. And is in the process of
>>> committing
>>> 2013-04-19 23:56:21,131 INFO mapred.LocalJobRunner
>>> (LocalJobRunner.java:statusUpdate(501)) -
>>> 2013-04-19 23:56:21,131 INFO mapred.Task (Task.java:commit(1140)) - Task
>>> attempt_local_0001_r_000000_0 is allowed to commit now
>>> 2013-04-19 23:56:21,138 INFO output.FileOutputCommitter
>>> (FileOutputCommitter.java:commitTask(432)) - Saved output of task
>>> 'attempt_local_0001_r_000000_0' to
>>> hdfs://Hadoop01:8040/user/hadoop/d/multi9/_temporary/0/task_local_0001_r_000000
>>> 2013-04-19 23:56:21,139 INFO mapred.LocalJobRunner
>>> (LocalJobRunner.java:statusUpdate(501)) - reduce > reduce
>>> 2013-04-19 23:56:21,139 INFO mapred.Task (Task.java:sendDone(1099)) -
>>> Task 'attempt_local_0001_r_000000_0' done.
>>> 2013-04-19 23:56:21,381 INFO mapreduce.Job
>>> (Job.java:monitorAndPrintJob(1293)) - map 100% reduce 100%
>>> 2013-04-19 23:56:21,381 INFO mapreduce.Job
>>> (Job.java:monitorAndPrintJob(1304)) - Job job_local_0001 completed
>>> successfully
>>> 2013-04-19 23:56:21,427 INFO mapreduce.Job
>>> (Job.java:monitorAndPrintJob(1311)) - Counters: 32
>>> File System Counters
>>> FILE: Number of bytes read=483553
>>> FILE: Number of bytes written=1313962
>>> FILE: Number of read operations=0
>>> FILE: Number of large read operations=0
>>> FILE: Number of write operations=0
>>> HDFS: Number of bytes read=296769
>>> HDFS: Number of bytes written=284
>>> HDFS: Number of read operations=66
>>> HDFS: Number of large read operations=0
>>> HDFS: Number of write operations=8
>>> Map-Reduce Framework
>>> Map input records=1000
>>> Map output records=1000
>>> Map output bytes=6543
>>> Map output materialized bytes=8567
>>> Input split bytes=516
>>> Combine input records=0
>>> Combine output records=0
>>> Reduce input groups=12
>>> Reduce shuffle bytes=0
>>> Reduce input records=1000
>>> Reduce output records=0
>>> Spilled Records=2000
>>> Shuffled Maps =0
>>> Failed Shuffles=0
>>> Merged Map outputs=0
>>> GC time elapsed (ms)=7
>>> CPU time spent (ms)=0
>>> Physical memory (bytes) snapshot=0
>>> Virtual memory (bytes) snapshot=0
>>> Total committed heap usage (bytes)=1773993984
>>> File Input Format Counters
>>> Bytes Read=68723
>>> File Output Format Counters
>>> Bytes Written=0
>
>
>
> --
> Harsh J
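[Editor's illustration] The thread's conclusion can be sketched in plain Java. NLineInputFormat builds one input split per group of N lines of each input file, so the driver must actually register it via job.setInputFormatClass(NLineInputFormat.class); otherwise the job falls back to the default TextInputFormat, whose splits follow the HDFS block/split size, which is how a 1000-line input produced only 4 maps. The class name and figures below are illustrative, not from the thread:

```java
// Sketch of the split arithmetic behind NLineInputFormat: one split
// (and hence one map task) per group of linesPerSplit lines, with a
// final smaller split for any remainder. In a real driver you would
// also call job.setInputFormatClass(NLineInputFormat.class) and
// NLineInputFormat.setNumLinesPerSplit(job, 10).
public class NLineSplitMath {
    // Ceiling division: ceil(totalLines / linesPerSplit).
    public static int expectedSplits(int totalLines, int linesPerSplit) {
        return (totalLines + linesPerSplit - 1) / linesPerSplit;
    }

    public static void main(String[] args) {
        // 1000 lines, 10 lines per split -> 100 map tasks.
        System.out.println(expectedSplits(1000, 10));
        // Remainder lines get one extra, smaller split -> 101.
        System.out.println(expectedSplits(1005, 10));
    }
}
```

Note this count is per file: with several input files, each file is split independently, so the totals add up across files.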