hadoop-common-dev mailing list archives

From "eric baldeschwieler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2560) Processing multiple input splits per mapper task
Date Sun, 17 Aug 2008 23:28:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623239#action_12623239
] 

eric baldeschwieler commented on HADOOP-2560:
---------------------------------------------

A few observations:

1) You really, really don't want the user configuring the number of tasks to combine.  The
system is hard enough for users to understand without asking them to do this!  Also, if the
default is 1, this will effectively not be used.

2) If you don't allow speculation on combined tasks, then you can really wedge the job;
I don't think this restriction is a good idea.

3) You don't want the node to stall waiting on the JT for a new task when a task completes.
That would defeat a lot of the value of this optimization.  Perhaps the TT can request one
more task than it has slots, in the hope that it can be assigned when the next task finishes?
(A rough sketch of this idea follows these observations.)

4) This is all about collation after the map tasks.  I think we need to start thinking of
that distinctly from the maps themselves.  How can we make this work if we spill on average
once per 2.5 maps?  Or 4 times for every 3 maps?  Right now the 4:3 case requires 8 spills,
4 collations and 4 fetches per reducer.  What if we could turn it into 3 spills, one collation
and 1 fetch per reducer?  That would be a big saving (a sketch of such a node-level collator
also follows these observations).
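To make observation 3 concrete, here is a toy sketch of a tracker that asks for one task
beyond its free slots on every heartbeat, so a spare is already queued locally when a running
task finishes instead of waiting a full heartbeat interval for a new assignment.  None of
these names are the real TaskTracker/JobTracker APIs; they are invented purely for
illustration.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Toy model of observation 3: on each heartbeat the tracker asks for one task
// beyond its free slots, so a spare is queued locally and can start the moment
// a running task finishes, instead of waiting a full heartbeat interval.
// JobTrackerStub, heartbeat(), etc. are invented names, not the real protocol.
public class GreedyTaskTracker {

    public interface JobTrackerStub {
        // Returns up to 'count' task ids; may return fewer if none are pending.
        List<String> assignTasks(String trackerName, int count);
    }

    private final Deque<String> localQueue = new ArrayDeque<>();

    public void heartbeat(JobTrackerStub jt, String trackerName, int freeSlots) {
        // Ask for enough tasks to fill every free slot, plus one spare,
        // minus whatever is already queued locally.
        int want = freeSlots + 1 - localQueue.size();
        if (want > 0) {
            localQueue.addAll(jt.assignTasks(trackerName, want));
        }
    }

    // Called when a slot frees up: launch the prefetched task right away.
    public String nextTask() {
        return localQueue.pollFirst();   // null if nothing was prefetched
    }
}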
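And a minimal standalone sketch of the node-level collation idea in observation 4: once the
maps on a node have finished, their per-reducer segments are combined into one file per
reduce partition, so each reducer makes one fetch per node rather than one per map (with 3
maps and R reducers on a node, that is R fetches instead of 3 * R).  It simply concatenates
segments; a real implementation would do a sorted merge, keep an index, and handle
compression.  None of the names correspond to actual Hadoop classes.

import java.io.*;
import java.util.*;

// Hypothetical node-level collator: once several maps on the same node have
// finished, their per-reducer output segments are merged into one file per
// reduce partition.  A reducer then makes one fetch per node instead of one
// fetch per map.  Names and file layout here are illustrative only.
public class NodeCollator {

    // mapOutputs.get(m).get(r) = file holding map m's segment for reducer r
    public static void collate(List<List<File>> mapOutputs,
                               File outputDir,
                               int numReducers) throws IOException {
        byte[] buf = new byte[64 * 1024];
        for (int r = 0; r < numReducers; r++) {
            File combined = new File(outputDir, "node-output-for-reducer-" + r);
            try (OutputStream out = new BufferedOutputStream(
                    new FileOutputStream(combined))) {
                // Concatenate every finished map's segment for reducer r.
                // (A real implementation would merge sorted segments, keep
                // an index, and handle compression; this just shows the
                // fan-in from many map outputs to one per-node file.)
                for (List<File> perMap : mapOutputs) {
                    try (InputStream in = new BufferedInputStream(
                            new FileInputStream(perMap.get(r)))) {
                        int n;
                        while ((n = in.read(buf)) != -1) {
                            out.write(buf, 0, n);
                        }
                    }
                }
            }
        }
    }
}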


> Processing multiple input splits per mapper task
> ------------------------------------------------
>
>                 Key: HADOOP-2560
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2560
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Runping Qi
>
> Currently, an input split contains a consecutive chunk of an input file, which by default corresponds to a DFS block.
> This may lead to a large number of mapper tasks if the input data is large, which causes the following problems:
> 1. Shuffling cost: since the framework has to move M * R map output segments to the nodes running reducers,
> a larger M means a larger shuffling cost.
> 2. High JVM initialization overhead.
> 3. Disk fragmentation: a larger number of map output files means lower read throughput when accessing them.
> Ideally, you want to keep the number of mappers to no more than 16 times the number of nodes in the cluster.
> To achieve that, we can increase the input split size. However, if a split spans more than one DFS block,
> you lose the data-locality scheduling benefits.
> One way to address this problem is to combine multiple input blocks on the same rack into one split.
> If on average we combine B blocks into one split, then we reduce the number of mappers by a factor of B.
> Since all the blocks for one mapper share a rack, we can still benefit from rack-aware scheduling.
> Thoughts?
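For concreteness, here is a minimal sketch of the rack-based grouping described above,
assuming we already know which rack each block lives on: blocks are bucketed by rack and
packed B at a time into combined splits, so the number of mappers drops by roughly a factor
of B while every block in a split stays rack-local.  BlockInfo and CombinedSplit are
hypothetical types; replica selection, split-size limits, and the actual InputFormat
plumbing are all glossed over.

import java.util.*;

// Illustrative sketch of the proposal: group DFS blocks by rack and pack up
// to B of them into a single split, so one mapper processes several blocks
// that are all local to the same rack.  BlockInfo and CombinedSplit are
// made-up types for illustration; the real InputFormat machinery is not shown.
public class RackSplitGrouper {

    public static class BlockInfo {
        final String path;
        final long offset;
        final long length;
        final String rack;       // rack of (one of) the block's replicas
        public BlockInfo(String path, long offset, long length, String rack) {
            this.path = path; this.offset = offset;
            this.length = length; this.rack = rack;
        }
    }

    public static class CombinedSplit {
        final List<BlockInfo> blocks = new ArrayList<>();
        final String rack;
        CombinedSplit(String rack) { this.rack = rack; }
    }

    // Pack blocks of the same rack into splits of at most blocksPerSplit (B).
    public static List<CombinedSplit> group(List<BlockInfo> blocks,
                                            int blocksPerSplit) {
        Map<String, List<BlockInfo>> byRack = new HashMap<>();
        for (BlockInfo b : blocks) {
            byRack.computeIfAbsent(b.rack, k -> new ArrayList<>()).add(b);
        }
        List<CombinedSplit> splits = new ArrayList<>();
        for (Map.Entry<String, List<BlockInfo>> e : byRack.entrySet()) {
            CombinedSplit current = null;
            for (BlockInfo b : e.getValue()) {
                if (current == null || current.blocks.size() == blocksPerSplit) {
                    current = new CombinedSplit(e.getKey());
                    splits.add(current);
                }
                current.blocks.add(b);
            }
        }
        return splits;
    }
}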

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

