hadoop-common-user mailing list archives

From "W.P. McNeill" <bill...@gmail.com>
Subject How do I increase mapper granularity?
Date Tue, 29 Mar 2011 18:18:17 GMT
I'm running a job whose mappers take a long time, which causes problems like
starving out other jobs that want to run on the same cluster.  Rewriting the
mapper algorithm is not currently an option, but I still need a way to
increase the number of mappers so that I will have greater granularity.
What is the best way to do this?

Looking through the O'Reilly book and starting from this wiki page
<http://wiki.apache.org/hadoop/HowManyMapsAndReduces>, I've come up
with a couple of ideas:

   1. Set mapred.map.tasks to the value I want.
   2. Decrease the block size of my input files.
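
To make the interaction between the two ideas concrete, here is a rough sketch of the split-size arithmetic as I understand it from the old-API FileInputFormat (org.apache.hadoop.mapred): split size = max(minSize, min(goalSize, blockSize)), where goalSize is the total input size divided by the requested number of maps. The function names, the single-file input, and the sizes below are illustrative assumptions, not Hadoop code, and the sketch ignores the slack Hadoop allows on the last split.

```python
import math

MiB = 1024 * 1024


def compute_split_size(goal_size, min_size, block_size):
    # Mirrors the old-API rule: never below min_size, never above
    # the smaller of goal_size and block_size.
    return max(min_size, min(goal_size, block_size))


def estimate_num_splits(file_size, requested_maps, block_size, min_size=1):
    # Hypothetical helper: assumes one splittable input file.
    # requested_maps stands in for mapred.map.tasks (a hint only).
    goal_size = file_size // max(requested_maps, 1)
    split_size = compute_split_size(goal_size, min_size, block_size)
    return math.ceil(file_size / split_size)


if __name__ == "__main__":
    one_gib = 1024 * MiB
    # Asking for fewer maps than there are blocks has no effect.
    print(estimate_num_splits(one_gib, 4, 64 * MiB))
    # Asking for more maps than blocks shrinks the splits.
    print(estimate_num_splits(one_gib, 64, 64 * MiB))
    # A smaller block size raises the map count on its own.
    print(estimate_num_splits(one_gib, 4, 16 * MiB))
```

Under these assumptions, both ideas raise the split count for splittable input, while a large minimum split size would cap it again.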

What are the gotchas with these approaches?  I know that (1) may not work
because this parameter is just a suggestion.  Is there a command line option
that accomplishes (2), or do I have to do a distcp with a non-default block
size?  (I think the answer is that I have to do a distcp, but I'm making

Are there other approaches?  Are there other gotchas that come with trying
to increase mapper granularity?  I know this can be more of an art than a

