spark-issues mailing list archives

From "Reynold Xin (JIRA)" <>
Subject [jira] [Updated] (SPARK-2774) Set preferred locations for reduce tasks
Date Mon, 04 Aug 2014 22:29:12 GMT


Reynold Xin updated SPARK-2774:

    Target Version/s: 1.2.0

> Set preferred locations for reduce tasks
> ----------------------------------------
>                 Key: SPARK-2774
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Shivaram Venkataraman
> Currently we do not set preferred locations for reduce tasks in Spark. This patch proposes
> setting preferred locations based on the map output sizes and locations tracked by the
> MapOutputTracker. This is useful in two situations:
> 1. When you have a small job in a large cluster, it can be useful to co-locate map and
> reduce tasks to avoid going over the network.
> 2. If there is a lot of data skew in the map stage outputs, then it is beneficial to
> place the reducer close to the largest output.
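
The idea in the description can be illustrated with a minimal sketch. This is not the patch's implementation and does not use Spark's API: the function name `preferred_locations`, the `(host, sizes)` input shape, and the `fraction_threshold` and `max_locs` parameters are all hypothetical, chosen only to show how per-host map output sizes could be turned into a ranked list of preferred hosts for one reducer.

```python
from collections import defaultdict


def preferred_locations(map_outputs, reducer_id, fraction_threshold=0.2, max_locs=5):
    """Pick preferred hosts for one reduce task (illustrative only).

    map_outputs: list of (host, sizes) pairs, one per map task, where
        sizes[r] is the number of bytes that map task wrote for reducer r.
    fraction_threshold, max_locs: hypothetical tuning knobs, not Spark settings.
    """
    # Sum the bytes destined for this reducer on each host.
    bytes_by_host = defaultdict(int)
    for host, sizes in map_outputs:
        bytes_by_host[host] += sizes[reducer_id]

    total = sum(bytes_by_host.values())
    if total == 0:
        return []  # nothing to fetch, so no locality preference

    # Largest share first; ties are broken by host name for determinism.
    ranked = sorted(bytes_by_host.items(), key=lambda kv: (-kv[1], kv[0]))

    # Keep only hosts that hold a meaningful share of the reducer's input.
    chosen = [host for host, size in ranked if size / total >= fraction_threshold]
    return chosen[:max_locs]
```

With skewed outputs, only the host holding most of the data is returned, which corresponds to case 2. With evenly spread outputs on a handful of hosts, all of them are returned, which corresponds to the co-location in case 1.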

This message was sent by Atlassian JIRA
