hadoop-common-user mailing list archives

From Shi Yu <sh...@uchicago.edu>
Subject Re: Total input paths number and output
Date Sat, 02 Oct 2010 18:05:58 GMT
On 2010-10-2 12:01, Harsh J wrote:
> The mapred.min.split.size and minimum map tasks properties of Hadoop MR also
> control the splitting of input for map tasks.
>
> On Oct 2, 2010 10:28 PM, "Harsh J"<qwertymaniac@gmail.com>  wrote:
>
> Outputs are not dependent on number of inputs, but instead the number of
> reducers (if MapReduce) or number of input splits if just plain Maps.
>
> The number of splits is determined in most cases by the input file sizes and
> the set HDFS block size factor (dfs.block.size) it was created under.
>
>
>    
>> On Oct 2, 2010 10:01 PM, "Shi Yu"<shiyu@uchicago.edu>  wrote:
>>
>> Hi,
>>
>> I am running some cod...
>>      
>    

Hi Harsh,

Thanks for the answer. I understand what you said. However, I was
trying to observe the effect experimentally. For example, I use the exact
same input (a 13M file) and run the simple WordCount example. I would
like to see whether my configuration changes the number that appears in
the log. The configuration in my main function is as follows:

    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);
    conf.setMapOutputKeyClass(Text.class);
    conf.setMapOutputValueClass(IntWritable.class);
    conf.setInputFormat(ZipInputFormat.class);
    // The two settings being varied in the experiment:
    conf.setInt("mapred.min.split.size", 2);  // minimum split size, in bytes
    conf.setNumMapTasks(3);                   // requested number of map tasks

In the last two lines (mapred.min.split.size and setNumMapTasks) I tried
different values, from 2 to 10, but the log always shows

INFO mapred.FileInputFormat: Total input paths to process : 1


Then I switched to my real code, using the exact same input, and set
       conf.setNumMapTasks(1);
       conf.setNumReduceTasks(1);

The log shows
INFO mapred.FileInputFormat: Total input paths to process : 2

What's wrong? Why can't I see a direct effect of my settings? The input file is 13M, so
it is smaller than the default block size of 64M. I left the block size setting at its default.
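
For reference, here is a minimal sketch of the split arithmetic being discussed. It assumes the old-API FileInputFormat picks a split size of max(minSize, min(goalSize, blockSize)), where goalSize is the total input size divided by the requested number of map tasks, and that it allows about 10% slack on the last split. The class and method names below are hypothetical and are not part of Hadoop. It also assumes a splittable input, which may not hold for a custom format such as ZipInputFormat.

```java
public class SplitSizeSketch {
    // Slack allowed on the final split (assumed to mirror the old API's behaviour).
    static final double SPLIT_SLOP = 1.1;

    // Assumed rule: never go below minSize, otherwise take the smaller
    // of the goal size and the HDFS block size.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Counts how many splits a single splittable file would yield.
    static int countSplits(long fileLength, int numMapTasks, long minSize, long blockSize) {
        long goalSize = fileLength / (numMapTasks == 0 ? 1 : numMapTasks);
        // Clamp minSize to at least 1 byte so the split size is never zero.
        long splitSize = computeSplitSize(goalSize, Math.max(1L, minSize), blockSize);
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the last, possibly slightly larger, split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileLength = 13 * mb; // the 13M input file
        long blockSize = 64 * mb;  // default dfs.block.size
        for (int maps : new int[] {1, 3, 10}) {
            System.out.println(maps + " requested map tasks -> "
                    + countSplits(fileLength, maps, 2, blockSize) + " splits");
        }
    }
}
```

Under these assumptions the number of splits follows the requested number of map tasks for a 13M file. This counts splits only; the "Total input paths to process" line in the log may be reporting a different quantity (the number of input paths), which would explain why it does not move with these settings.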

Thanks.

Best Regards,

Shi
  

