ignite-issues mailing list archives

From "Ivan Veselovsky (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-3414) Hadoop: Optimize map-reduce job planning.
Date Fri, 08 Jul 2016 13:25:11 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367665#comment-15367665 ]

Ivan Veselovsky commented on IGNITE-3414:

General comments regarding the tasks distribution algorithm.

1) It is not quite clear why the idea of "threshold migration" is used for reducers. Namely,
why does the weighted load calculation not seem to be sufficient in this case?
It is also unclear why the logic "add the new load to the least loaded machine" is used, while
the logic "calculate the resultant load for every possible assignment and choose the
smallest among them" seems more correct. (We use the latter logic when planning
the mapper tasks.)
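
To make the distinction concrete, here is a minimal sketch (not Ignite code; class and method names are illustrative) of the "try every assignment, keep the one with the smallest resultant load" approach, as opposed to greedily putting each task on the currently least-loaded node:

```java
import java.util.Arrays;

/** Minimal sketch (illustrative, not Ignite API): exhaustively try every
 *  assignment of reducer weights to nodes and keep the one whose maximum
 *  resultant load is smallest. */
public class ExhaustivePlanner {
    private static int bestMax;
    private static int[] bestAssign;

    private static void search(int[] weights, int[] loads, int[] assign, int i) {
        if (i == weights.length) {
            int max = Arrays.stream(loads).max().getAsInt();
            if (max < bestMax) {
                bestMax = max;
                bestAssign = assign.clone();
            }
            return;
        }
        for (int n = 0; n < loads.length; n++) {
            loads[n] += weights[i]; // tentatively assign task i to node n
            assign[i] = n;
            search(weights, loads, assign, i + 1);
            loads[n] -= weights[i]; // backtrack
        }
    }

    /** Returns, for each task, the index of the node it should go to. */
    public static int[] plan(int[] taskWeights, int[] initialLoads) {
        bestMax = Integer.MAX_VALUE;
        search(taskWeights, initialLoads.clone(), new int[taskWeights.length], 0);
        return bestAssign;
    }

    public static void main(String[] args) {
        // Two reducers with weights 4 and 2; nodes start with loads 3 and 1.
        System.out.println(Arrays.toString(plan(new int[] {4, 2}, new int[] {3, 1})));
        // prints [1, 0]: resultant loads {5, 5}, maximum 5
    }
}
```

The exhaustive search is exponential in the number of tasks, so in practice it would only be viable for small task counts or replaced by a bounded heuristic; the sketch only illustrates the objective being argued for.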

2) In general, the Map-Reduce model implies that each reducer may require the results of all
the mappers. Therefore, for reducer assignment we should not consider split affinity. Instead,
we should consider the map task assignments: e.g., if 3 of 10 map tasks are assigned to node X,
then 30% of the reducer input data resides on node X, and this holds for *any* reducer.
Taking this approach, we could simplify the reducer assignment algorithm.
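
A sketch of that simplification (class and node names are assumptions for illustration, not Ignite API): the share of any reducer's input residing on a node is simply the fraction of map tasks assigned to that node.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch: derive per-node reducer input shares directly
 *  from the map task assignment, without looking at split affinity. */
public class ReducerInputShare {
    /** For each node, the fraction of map tasks assigned to it, which equals
     *  the fraction of any reducer's input residing there. */
    public static Map<String, Double> inputShares(Map<String, Integer> mapTasksPerNode) {
        int total = mapTasksPerNode.values().stream().mapToInt(Integer::intValue).sum();
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : mapTasksPerNode.entrySet())
            shares.put(e.getKey(), (double) e.getValue() / total);
        return shares;
    }

    public static void main(String[] args) {
        Map<String, Integer> maps = new LinkedHashMap<>();
        maps.put("nodeX", 3); // 3 of 10 map tasks on node X
        maps.put("nodeY", 7);
        System.out.println(inputShares(maps)); // prints {nodeX=0.3, nodeY=0.7}
    }
}
```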

3) It may be worth thinking about 3 grades of data transfer load: (1) data transferred between
machines, (2) data transferred from one node to another on the same machine, (3) data not
transferred at all (produced and consumed on the same node). Also, the load calculation may
become much more accurate if it is taken proportionally to the total data length.
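
A minimal sketch of such a three-grade cost model (the weight constants are illustrative assumptions, not measured values): the cost is the data length scaled by a weight that depends on where the producer and consumer run.

```java
/** Illustrative three-grade transfer cost model: remote transfer between
 *  machines is most expensive, transfer between two nodes on the same
 *  machine is cheaper, and same-node data costs nothing. The cost is
 *  proportional to the data length. Weights are assumptions for the sketch. */
public class TransferCost {
    static final double BETWEEN_MACHINES = 1.0;  // grade 1: across the network
    static final double SAME_MACHINE     = 0.25; // grade 2: nodes on one machine
    static final double SAME_NODE        = 0.0;  // grade 3: no transfer at all

    public static double cost(long bytes, boolean sameNode, boolean sameMachine) {
        double weight = sameNode ? SAME_NODE
                      : sameMachine ? SAME_MACHINE
                      : BETWEEN_MACHINES;
        return weight * bytes;
    }

    public static void main(String[] args) {
        System.out.println(cost(1_000_000, false, false)); // prints 1000000.0
        System.out.println(cost(1_000_000, false, true));  // prints 250000.0
        System.out.println(cost(1_000_000, true, true));   // prints 0.0
    }
}
```

With such a function, a planner could score each candidate assignment by summing the transfer costs of all mapper-to-reducer flows and prefer the assignment with the lowest total.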

> Hadoop: Optimize map-reduce job planning.
> -----------------------------------------
>                 Key: IGNITE-3414
>                 URL: https://issues.apache.org/jira/browse/IGNITE-3414
>             Project: Ignite
>          Issue Type: Task
>          Components: hadoop
>    Affects Versions: 1.6
>            Reporter: Vladimir Ozerov
>            Assignee: Vladimir Ozerov
>            Priority: Critical
>             Fix For: 1.7
> Currently the Hadoop module has an inefficient map-reduce planning engine. In particular, it
assigns tasks only to affinity nodes. This can lead to a situation where a very large task is processed
by a single cluster node while the other cluster nodes are idle. 
> We should implement a configurable map-reduce planner which will be able to utilize the
whole cluster.

This message was sent by Atlassian JIRA
