hadoop-mapreduce-user mailing list archives

From yypvsxf19870706 <yypvsxf19870...@gmail.com>
Subject Re: Map‘s number with NLineInputFormat
Date Sat, 20 Apr 2013 01:18:43 GMT
Hi,
   I thought it would be different when using NLineInputFormat.
   So here is my conclusion: the distribution of maps has nothing to do with
NLineInputFormat. NLineInputFormat only decides the number of rows given to each map,
while the maps themselves are generated according to the split size.

    Have I got the point?


Regards

Sent from my iPhone

On 2013-4-20, at 8:39, "姚吉龙" <geelongyao@gmail.com> wrote:

> The number of maps is decided by the block size and your raw data.
> 
> ―
> Sent from Mailbox for iPhone
> 
> 
> On Sat, Apr 20, 2013 at 12:30 AM, YouPeng Yang <yypvsxf19870706@gmail.com> wrote:
> 
>> Hi All
>>    
>>  I use NLineInputFormat as the text input format with the following code:
>>  NLineInputFormat.setNumLinesPerSplit(job, 10);
>>  NLineInputFormat.addInputPath(job,new Path(args[0].toString()));
>> 
>>  My input file contains 1000 rows, so I thought it would distribute 100 (1000/10) maps. However, I got 4 maps.
>> 
>>  I'm confused by the number of maps that were distributed according to the running log[1].
>>  How does it distribute maps when using NLineInputFormat?
>> 
>> 
>> Regards
>> 
>> 
>> 
>> [1]=======================================================
>> ....
>> ....
>> 2013-04-19 23:56:20,377 INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1286)) - Job job_local_0001 running in uber mode : false
>> 2013-04-19 23:56:20,377 INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1293)) -  map 25% reduce 0%
>> 2013-04-19 23:56:20,381 INFO  mapred.MapTask (MapTask.java:sortAndSpill(1597)) - Finished spill 0
>> 2013-04-19 23:56:20,384 INFO  mapred.Task (Task.java:done(979)) - Task:attempt_local_0001_m_000001_0 is done. And is in the process of committing
>> 2013-04-19 23:56:20,388 INFO  mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - map
>> 2013-04-19 23:56:20,389 INFO  mapred.Task (Task.java:sendDone(1099)) - Task 'attempt_local_0001_m_000001_0' done.
>> 2013-04-19 23:56:20,389 INFO  mapred.LocalJobRunner (LocalJobRunner.java:run(238)) - Finishing task: attempt_local_0001_m_000001_0
>> 2013-04-19 23:56:20,389 INFO  mapred.LocalJobRunner (LocalJobRunner.java:run(213)) - Starting task: attempt_local_0001_m_000002_0
>> 2013-04-19 23:56:20,391 INFO  mapred.Task (Task.java:initialize(565)) -  Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@36bf7916
>> 2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:setEquator(1127)) - (EQUATOR) 0 kvi 26214396(104857584)
>> 2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:<init>(923)) - mapreduce.task.io.sort.mb: 100
>> 2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:<init>(924)) - soft limit at 83886080
>> 2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:<init>(925)) - bufstart = 0; bufvoid = 104857600
>> 2013-04-19 23:56:20,487 INFO  mapred.MapTask (MapTask.java:<init>(926)) - kvstart = 26214396; length = 6553600
>> 2013-04-19 23:56:20,515 INFO  mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) -
>> 2013-04-19 23:56:20,515 INFO  mapred.MapTask (MapTask.java:flush(1389)) - Starting flush of map output
>> 2013-04-19 23:56:20,516 INFO  mapred.MapTask (MapTask.java:flush(1408)) - Spilling map output
>> 2013-04-19 23:56:20,516 INFO  mapred.MapTask (MapTask.java:flush(1409)) - bufstart = 0; bufend = 336; bufvoid = 104857600
>> 2013-04-19 23:56:20,516 INFO  mapred.MapTask (MapTask.java:flush(1411)) - kvstart = 26214396(104857584); kvend = 26214208(104856832); length = 189/6553600
>> 2013-04-19 23:56:20,523 INFO  mapred.MapTask (MapTask.java:sortAndSpill(1597)) - Finished spill 0
>> 2013-04-19 23:56:20,552 INFO  mapred.Task (Task.java:done(979)) - Task:attempt_local_0001_m_000002_0 is done. And is in the process of committing
>> 2013-04-19 23:56:20,555 INFO  mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - map
>> 2013-04-19 23:56:20,556 INFO  mapred.Task (Task.java:sendDone(1099)) - Task 'attempt_local_0001_m_000002_0' done.
>> 2013-04-19 23:56:20,556 INFO  mapred.LocalJobRunner (LocalJobRunner.java:run(238)) - Finishing task: attempt_local_0001_m_000002_0
>> 2013-04-19 23:56:20,556 INFO  mapred.LocalJobRunner (LocalJobRunner.java:run(213)) - Starting task: attempt_local_0001_m_000003_0
>> 2013-04-19 23:56:20,558 INFO  mapred.Task (Task.java:initialize(565)) -  Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@746a63d3
>> 2013-04-19 23:56:20,666 INFO  mapred.MapTask (MapTask.java:setEquator(1127)) - (EQUATOR) 0 kvi 26214396(104857584)
>> 2013-04-19 23:56:20,666 INFO  mapred.MapTask (MapTask.java:<init>(923)) - mapreduce.task.io.sort.mb: 100
>> 2013-04-19 23:56:20,666 INFO  mapred.MapTask (MapTask.java:<init>(924)) - soft limit at 83886080
>> 2013-04-19 23:56:20,666 INFO  mapred.MapTask (MapTask.java:<init>(925)) - bufstart = 0; bufvoid = 104857600
>> 2013-04-19 23:56:20,667 INFO  mapred.MapTask (MapTask.java:<init>(926)) - kvstart = 26214396; length = 6553600
>> 2013-04-19 23:56:20,690 INFO  mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) -
>> 2013-04-19 23:56:20,690 INFO  mapred.MapTask (MapTask.java:flush(1389)) - Starting flush of map output
>> 2013-04-19 23:56:20,690 INFO  mapred.MapTask (MapTask.java:flush(1408)) - Spilling map output
>> 2013-04-19 23:56:20,690 INFO  mapred.MapTask (MapTask.java:flush(1409)) - bufstart = 0; bufend = 329; bufvoid = 104857600
>> 2013-04-19 23:56:20,690 INFO  mapred.MapTask (MapTask.java:flush(1411)) - kvstart = 26214396(104857584); kvend = 26214212(104856848); length = 185/6553600
>> 2013-04-19 23:56:20,695 INFO  mapred.MapTask (MapTask.java:sortAndSpill(1597)) - Finished spill 0
>> 2013-04-19 23:56:20,697 INFO  mapred.Task (Task.java:done(979)) - Task:attempt_local_0001_m_000003_0 is done. And is in the process of committing
>> 2013-04-19 23:56:20,717 INFO  mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - map
>> 2013-04-19 23:56:20,718 INFO  mapred.Task (Task.java:sendDone(1099)) - Task 'attempt_local_0001_m_000003_0' done.
>> 2013-04-19 23:56:20,718 INFO  mapred.LocalJobRunner (LocalJobRunner.java:run(238)) - Finishing task: attempt_local_0001_m_000003_0
>> 2013-04-19 23:56:20,718 INFO  mapred.LocalJobRunner (LocalJobRunner.java:run(394)) - Map task executor complete.
>> 2013-04-19 23:56:20,752 INFO  mapred.Task (Task.java:initialize(565)) -  Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@52cd19d
>> 2013-04-19 23:56:20,760 INFO  mapred.Merger (Merger.java:merge(549)) - Merging 4 sorted segments
>> 2013-04-19 23:56:20,767 INFO  mapred.Merger (Merger.java:merge(648)) - Down to the last merge-pass, with 4 segments left of total size: 8532 bytes
>> 2013-04-19 23:56:20,768 INFO  mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) -
>> 2013-04-19 23:56:20,807 WARN  conf.Configuration (Configuration.java:warnOnceIfDeprecated(808)) - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
>> 2013-04-19 23:56:21,129 INFO  mapred.Task (Task.java:done(979)) - Task:attempt_local_0001_r_000000_0 is done. And is in the process of committing
>> 2013-04-19 23:56:21,131 INFO  mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) -
>> 2013-04-19 23:56:21,131 INFO  mapred.Task (Task.java:commit(1140)) - Task attempt_local_0001_r_000000_0 is allowed to commit now
>> 2013-04-19 23:56:21,138 INFO  output.FileOutputCommitter (FileOutputCommitter.java:commitTask(432)) - Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://Hadoop01:8040/user/hadoop/d/multi9/_temporary/0/task_local_0001_r_000000
>> 2013-04-19 23:56:21,139 INFO  mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - reduce > reduce
>> 2013-04-19 23:56:21,139 INFO  mapred.Task (Task.java:sendDone(1099)) - Task 'attempt_local_0001_r_000000_0' done.
>> 2013-04-19 23:56:21,381 INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1293)) -  map 100% reduce 100%
>> 2013-04-19 23:56:21,381 INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1304)) - Job job_local_0001 completed successfully
>> 2013-04-19 23:56:21,427 INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1311)) - Counters: 32
>> 	File System Counters
>> 		FILE: Number of bytes read=483553
>> 		FILE: Number of bytes written=1313962
>> 		FILE: Number of read operations=0
>> 		FILE: Number of large read operations=0
>> 		FILE: Number of write operations=0
>> 		HDFS: Number of bytes read=296769
>> 		HDFS: Number of bytes written=284
>> 		HDFS: Number of read operations=66
>> 		HDFS: Number of large read operations=0
>> 		HDFS: Number of write operations=8
>> 	Map-Reduce Framework
>> 		Map input records=1000
>> 		Map output records=1000
>> 		Map output bytes=6543
>> 		Map output materialized bytes=8567
>> 		Input split bytes=516
>> 		Combine input records=0
>> 		Combine output records=0
>> 		Reduce input groups=12
>> 		Reduce shuffle bytes=0
>> 		Reduce input records=1000
>> 		Reduce output records=0
>> 		Spilled Records=2000
>> 		Shuffled Maps =0
>> 		Failed Shuffles=0
>> 		Merged Map outputs=0
>> 		GC time elapsed (ms)=7
>> 		CPU time spent (ms)=0
>> 		Physical memory (bytes) snapshot=0
>> 		Virtual memory (bytes) snapshot=0
>> 		Total committed heap usage (bytes)=1773993984
>> 	File Input Format Counters 
>> 		Bytes Read=68723
>> 	File Output Format Counters 
>> 		Bytes Written=0
> 
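
For comparison with the 100 maps expected in the original question, the arithmetic can be sketched in plain Java. This is only an illustration under assumptions: it assumes the job's input format class really is NLineInputFormat (in the Hadoop API this is set with job.setInputFormatClass(NLineInputFormat.class), which the posted snippet does not show), in which case each split holds at most N lines. The class name LineSplitSketch and the method splitByLines are hypothetical, and nothing here calls Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: plain Java, no Hadoop classes involved.
class LineSplitSketch {

    // Groups line indices 0..totalLines-1 into consecutive chunks of at most
    // linesPerSplit lines. Each chunk is returned as a {start, end} pair,
    // where start is inclusive and end is exclusive.
    public static List<int[]> splitByLines(int totalLines, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        List<int[]> splits = new ArrayList<>();
        for (int start = 0; start < totalLines; start += linesPerSplit) {
            // The last chunk may be shorter when totalLines is not a multiple of N.
            int end = Math.min(start + linesPerSplit, totalLines);
            splits.add(new int[] {start, end});
        }
        return splits;
    }

    public static void main(String[] args) {
        // Numbers taken from the question: 1000 rows, 10 lines per split.
        List<int[]> splits = splitByLines(1000, 10);
        System.out.println("splits: " + splits.size()); // prints "splits: 100"
    }
}
```

With 1000 lines and 10 lines per split the sketch yields 100 groups. If a job reports a different map count, it may be worth checking which input format class the job is actually running with.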
