I am running Hadoop 1.0.3 in pseudo-distributed mode.
When I submit a map/reduce job to process a file of about 16 GB, job.xml contains the following:
dfs.block.size = 67108864
and the job runs with roughly 242 map tasks. Those numbers make sense, since Hadoop launches one map task per block: a ~16 GB file / 64 MB block size ≈ 242 map tasks.
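As a sanity check on the one-task-per-block arithmetic (the 242 figure is the observed task count; the byte conversions are the only assumptions):

```python
BLOCK = 64 * 1024 * 1024            # dfs.block.size = 67108864 bytes (64 MB)
map_tasks = 242                     # observed number of map tasks

# Size implied by one map task per 64 MB block
implied_size_gib = map_tasks * BLOCK / 2**30
print(round(implied_size_gib, 3))   # 15.125, i.e. "about 16 GB"
```

So 242 tasks correspond to a file of just over 15 GiB, consistent with the stated size.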
I would like to reduce mapred.map.tasks to see whether it improves performance.
I have tried doubling dfs.block.size, but mapred.map.tasks remains unchanged.
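For reference, the doubled value was set roughly like this (a sketch; exactly where the property was set, client config vs. job submission, is an assumption):

```xml
<property>
  <name>dfs.block.size</name>
  <!-- doubled from 67108864 (64 MB) to 128 MB -->
  <value>134217728</value>
</property>
```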
Is there a way to reduce mapred.map.tasks?
Thanks in advance for any assistance!