hadoop-user mailing list archives

From "姚吉龙" <geelong...@gmail.com>
Subject Re: Map's number with NLineInputFormat
Date Sat, 20 Apr 2013 00:39:02 GMT
The number of map tasks is determined by the block size and the size of your raw data.
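
To make the arithmetic concrete, here is a rough sketch in plain Java (no Hadoop dependency) that compares two ways a split count can arise. The byte-based estimate follows the FileInputFormat rule of split size = max(minSize, min(maxSize, blockSize)). The line-based estimate is what NLineInputFormat produces per file when it is actually the job's input format. The class name, method names, and the 128 MB block size are assumptions made for illustration. The 1.1 slack factor is meant to mirror FileInputFormat's SPLIT_SLOP. The byte length 68723 is the "Bytes Read" counter from the log quoted below.

```java
public class SplitEstimate {
    // Slack factor, assumed to mirror SPLIT_SLOP in FileInputFormat.
    static final double SPLIT_SLOP = 1.1;

    /** Split size rule: max(minSize, min(maxSize, blockSize)). */
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Estimated split count for one non-empty, splittable file of fileLength bytes. */
    static long byteBasedSplits(long fileLength, long splitSize) {
        long count = 0;
        long remaining = fileLength;
        // Carve off full-size splits while the remainder exceeds 1.1 split sizes.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            count++;
            remaining -= splitSize;
        }
        // Whatever is left becomes the final split.
        if (remaining > 0) {
            count++;
        }
        return count;
    }

    /** Split count for one file when every split holds linesPerSplit lines (last may be shorter). */
    static long lineBasedSplits(long lines, int linesPerSplit) {
        return (lines + linesPerSplit - 1) / linesPerSplit;
    }

    public static void main(String[] args) {
        // Assumed values: 128 MB block size, minimum split size 1, no maximum.
        long size = splitSize(128L * 1024 * 1024, 1L, Long.MAX_VALUE);
        System.out.println(byteBasedSplits(68723L, size)); // prints 1
        System.out.println(lineBasedSplits(1000, 10));     // prints 100
    }
}
```

Neither estimate gives 4 for a single small file. The log does not show how many input files the job read or which input format class was actually in effect, so the sketch only illustrates the two formulas and does not explain the observed count.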

On Sat, Apr 20, 2013 at 12:30 AM, YouPeng Yang <yypvsxf19870706@gmail.com>
wrote:

> Hi all,
> I am using NLineInputFormat as the text input format, with the following
> code:
>  NLineInputFormat.setNumLinesPerSplit(job, 10);
>  NLineInputFormat.addInputPath(job, new Path(args[0]));
> My input file contains 1000 rows, so I expected the job to run
> 100 (1000/10) map tasks. However, I got only 4 maps.
> I'm confused by the number of map tasks that were launched, according to
> the running log [1].
> How are map tasks distributed when using NLineInputFormat?
> Regards
> [1]=======================================================
> ....
> ....
> 2013-04-19 23:56:20,377 INFO  mapreduce.Job
> (Job.java:monitorAndPrintJob(1286)) - Job job_local_0001 running in uber
> mode : false
> 2013-04-19 23:56:20,377 INFO  mapreduce.Job
> (Job.java:monitorAndPrintJob(1293)) -  map 25% reduce 0%
> 2013-04-19 23:56:20,381 INFO  mapred.MapTask
> (MapTask.java:sortAndSpill(1597)) - Finished spill 0
> 2013-04-19 23:56:20,384 INFO  mapred.Task (Task.java:done(979)) -
> Task:attempt_local_0001_m_000001_0 is done. And is in the process of
> committing
> 2013-04-19 23:56:20,388 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:statusUpdate(501)) - map
> 2013-04-19 23:56:20,389 INFO  mapred.Task (Task.java:sendDone(1099)) - Task
> 'attempt_local_0001_m_000001_0' done.
> 2013-04-19 23:56:20,389 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:run(238)) - Finishing task:
> attempt_local_0001_m_000001_0
> 2013-04-19 23:56:20,389 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:run(213)) - Starting task:
> attempt_local_0001_m_000002_0
> 2013-04-19 23:56:20,391 INFO  mapred.Task (Task.java:initialize(565)) -
>  Using ResourceCalculatorPlugin :
> org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@36bf7916
> 2013-04-19 23:56:20,486 INFO  mapred.MapTask
> (MapTask.java:setEquator(1127)) - (EQUATOR) 0 kvi 26214396(104857584)
> 2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:<init>(923)) -
> mapreduce.task.io.sort.mb: 100
> 2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:<init>(924)) -
> soft limit at 83886080
> 2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:<init>(925)) -
> bufstart = 0; bufvoid = 104857600
> 2013-04-19 23:56:20,487 INFO  mapred.MapTask (MapTask.java:<init>(926)) -
> kvstart = 26214396; length = 6553600
> 2013-04-19 23:56:20,515 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:statusUpdate(501)) -
> 2013-04-19 23:56:20,515 INFO  mapred.MapTask (MapTask.java:flush(1389)) -
> Starting flush of map output
> 2013-04-19 23:56:20,516 INFO  mapred.MapTask (MapTask.java:flush(1408)) -
> Spilling map output
> 2013-04-19 23:56:20,516 INFO  mapred.MapTask (MapTask.java:flush(1409)) -
> bufstart = 0; bufend = 336; bufvoid = 104857600
> 2013-04-19 23:56:20,516 INFO  mapred.MapTask (MapTask.java:flush(1411)) -
> kvstart = 26214396(104857584); kvend = 26214208(104856832); length =
> 189/6553600
> 2013-04-19 23:56:20,523 INFO  mapred.MapTask
> (MapTask.java:sortAndSpill(1597)) - Finished spill 0
> 2013-04-19 23:56:20,552 INFO  mapred.Task (Task.java:done(979)) -
> Task:attempt_local_0001_m_000002_0 is done. And is in the process of
> committing
> 2013-04-19 23:56:20,555 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:statusUpdate(501)) - map
> 2013-04-19 23:56:20,556 INFO  mapred.Task (Task.java:sendDone(1099)) - Task
> 'attempt_local_0001_m_000002_0' done.
> 2013-04-19 23:56:20,556 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:run(238)) - Finishing task:
> attempt_local_0001_m_000002_0
> 2013-04-19 23:56:20,556 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:run(213)) - Starting task:
> attempt_local_0001_m_000003_0
> 2013-04-19 23:56:20,558 INFO  mapred.Task (Task.java:initialize(565)) -
>  Using ResourceCalculatorPlugin :
> org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@746a63d3
> 2013-04-19 23:56:20,666 INFO  mapred.MapTask
> (MapTask.java:setEquator(1127)) - (EQUATOR) 0 kvi 26214396(104857584)
> 2013-04-19 23:56:20,666 INFO  mapred.MapTask (MapTask.java:<init>(923)) -
> mapreduce.task.io.sort.mb: 100
> 2013-04-19 23:56:20,666 INFO  mapred.MapTask (MapTask.java:<init>(924)) -
> soft limit at 83886080
> 2013-04-19 23:56:20,666 INFO  mapred.MapTask (MapTask.java:<init>(925)) -
> bufstart = 0; bufvoid = 104857600
> 2013-04-19 23:56:20,667 INFO  mapred.MapTask (MapTask.java:<init>(926)) -
> kvstart = 26214396; length = 6553600
> 2013-04-19 23:56:20,690 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:statusUpdate(501)) -
> 2013-04-19 23:56:20,690 INFO  mapred.MapTask (MapTask.java:flush(1389)) -
> Starting flush of map output
> 2013-04-19 23:56:20,690 INFO  mapred.MapTask (MapTask.java:flush(1408)) -
> Spilling map output
> 2013-04-19 23:56:20,690 INFO  mapred.MapTask (MapTask.java:flush(1409)) -
> bufstart = 0; bufend = 329; bufvoid = 104857600
> 2013-04-19 23:56:20,690 INFO  mapred.MapTask (MapTask.java:flush(1411)) -
> kvstart = 26214396(104857584); kvend = 26214212(104856848); length =
> 185/6553600
> 2013-04-19 23:56:20,695 INFO  mapred.MapTask
> (MapTask.java:sortAndSpill(1597)) - Finished spill 0
> 2013-04-19 23:56:20,697 INFO  mapred.Task (Task.java:done(979)) -
> Task:attempt_local_0001_m_000003_0 is done. And is in the process of
> committing
> 2013-04-19 23:56:20,717 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:statusUpdate(501)) - map
> 2013-04-19 23:56:20,718 INFO  mapred.Task (Task.java:sendDone(1099)) - Task
> 'attempt_local_0001_m_000003_0' done.
> 2013-04-19 23:56:20,718 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:run(238)) - Finishing task:
> attempt_local_0001_m_000003_0
> 2013-04-19 23:56:20,718 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:run(394)) - Map task executor complete.
> 2013-04-19 23:56:20,752 INFO  mapred.Task (Task.java:initialize(565)) -
>  Using ResourceCalculatorPlugin :
> org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@52cd19d
> 2013-04-19 23:56:20,760 INFO  mapred.Merger (Merger.java:merge(549)) -
> Merging 4 sorted segments
> 2013-04-19 23:56:20,767 INFO  mapred.Merger (Merger.java:merge(648)) - Down
> to the last merge-pass, with 4 segments left of total size: 8532 bytes
> 2013-04-19 23:56:20,768 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:statusUpdate(501)) -
> 2013-04-19 23:56:20,807 WARN  conf.Configuration
> (Configuration.java:warnOnceIfDeprecated(808)) - mapred.skip.on is
> deprecated. Instead, use mapreduce.job.skiprecords
> 2013-04-19 23:56:21,129 INFO  mapred.Task (Task.java:done(979)) -
> Task:attempt_local_0001_r_000000_0 is done. And is in the process of
> committing
> 2013-04-19 23:56:21,131 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:statusUpdate(501)) -
> 2013-04-19 23:56:21,131 INFO  mapred.Task (Task.java:commit(1140)) - Task
> attempt_local_0001_r_000000_0 is allowed to commit now
> 2013-04-19 23:56:21,138 INFO  output.FileOutputCommitter
> (FileOutputCommitter.java:commitTask(432)) - Saved output of task
> 'attempt_local_0001_r_000000_0' to
> hdfs://Hadoop01:8040/user/hadoop/d/multi9/_temporary/0/task_local_0001_r_000000
> 2013-04-19 23:56:21,139 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:statusUpdate(501)) - reduce > reduce
> 2013-04-19 23:56:21,139 INFO  mapred.Task (Task.java:sendDone(1099)) - Task
> 'attempt_local_0001_r_000000_0' done.
> 2013-04-19 23:56:21,381 INFO  mapreduce.Job
> (Job.java:monitorAndPrintJob(1293)) -  map 100% reduce 100%
> 2013-04-19 23:56:21,381 INFO  mapreduce.Job
> (Job.java:monitorAndPrintJob(1304)) - Job job_local_0001 completed
> successfully
> 2013-04-19 23:56:21,427 INFO  mapreduce.Job
> (Job.java:monitorAndPrintJob(1311)) - Counters: 32
> File System Counters
> FILE: Number of bytes read=483553
> FILE: Number of bytes written=1313962
> FILE: Number of read operations=0
> FILE: Number of large read operations=0
> FILE: Number of write operations=0
> HDFS: Number of bytes read=296769
> HDFS: Number of bytes written=284
> HDFS: Number of read operations=66
> HDFS: Number of large read operations=0
> HDFS: Number of write operations=8
> Map-Reduce Framework
> Map input records=1000
> Map output records=1000
> Map output bytes=6543
> Map output materialized bytes=8567
> Input split bytes=516
> Combine input records=0
> Combine output records=0
> Reduce input groups=12
> Reduce shuffle bytes=0
> Reduce input records=1000
> Reduce output records=0
> Spilled Records=2000
> Shuffled Maps =0
> Failed Shuffles=0
> Merged Map outputs=0
> GC time elapsed (ms)=7
> CPU time spent (ms)=0
> Physical memory (bytes) snapshot=0
> Virtual memory (bytes) snapshot=0
> Total committed heap usage (bytes)=1773993984
> File Input Format Counters
> Bytes Read=68723
> File Output Format Counters
> Bytes Written=0