hadoop-mapreduce-user mailing list archives

From Justin Woody <justin.wo...@gmail.com>
Subject Re: increasing number of mappers.
Date Wed, 09 Nov 2011 12:53:12 GMT

In this case, it doesn't matter how many mappers you request in your
job configuration. Hadoop assigns exactly one mapper per input split.
Since each of your files is under 64MB (assuming the default HDFS block
size), you only get 2 splits, and therefore 2 mappers. If you really
need more mappers, you need to create smaller input files.
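To make the split math concrete, here is a standalone sketch of the
rule FileInputFormat applies when sizing splits: max(minSize,
min(maxSize, blockSize)). The class and values below are illustrative,
not Hadoop code; the 64MB block size and the 1,000,000-byte max split
size mirror the numbers in this thread.

```java
// Standalone illustration of FileInputFormat's split-size rule.
// Not Hadoop code -- just the same arithmetic, for clarity.
public class SplitSizeDemo {

    // FileInputFormat computes: max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // default HDFS block size assumed

        // With no max set, the split size equals the block size,
        // so a 32MB file fits in a single split -> one mapper.
        System.out.println(computeSplitSize(blockSize, 1L, Long.MAX_VALUE));

        // With -D mapred.max.split.size=1000000, the split size drops
        // to ~1MB -- but only for input formats that honor the setting.
        System.out.println(computeSplitSize(blockSize, 1L, 1_000_000L));
    }
}
```

Note that even when the max split size is honored, a SequenceFile can
only be split at its sync points, so the actual split boundaries may
not land exactly on the computed size.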

Paragraph 1 under the Map heading on this page explains it as well:


2011/11/9 Radim Kolar <hsn@sendmail.cz>:
> I have 2 input seq files 32MB each. I want to run them on as many mappers as
> possible.
> I appended -D mapred.max.split.size=1000000 as a command line argument to
> the job, but there is no difference. The job still runs on 2 mappers.
> How does split size work? Is max split size used for reading or writing files?
> Does it work like this?: set maxsplitsize, write files, and you will get a bunch
> of seq files as output. Then you will get the same number of mappers as input
> files.
