spark-issues mailing list archives

From "Matei Zaharia (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-1937) Tasks can be submitted before executors are registered
Date Tue, 24 Jun 2014 18:42:25 GMT

     [ https://issues.apache.org/jira/browse/SPARK-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Matei Zaharia updated SPARK-1937:
---------------------------------

    Assignee: Rui Li

> Tasks can be submitted before executors are registered
> ------------------------------------------------------
>
>                 Key: SPARK-1937
>                 URL: https://issues.apache.org/jira/browse/SPARK-1937
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Rui Li
>            Assignee: Rui Li
>             Fix For: 1.1.0
>
>         Attachments: After-patch.PNG, Before-patch.png, RSBTest.scala
>
>
> During construction, TaskSetManager assigns tasks to several pending lists according to the tasks' preferred locations. If a task's preferred location is unavailable, the task is instead added to "pendingTasksWithNoPrefs", a list for tasks without preferred locations.
> The problem is that tasks may be submitted before the executors have registered with the driver, in which case TaskSetManager assigns all the tasks to pendingTasksWithNoPrefs. Later, when it looks for a task to schedule, it picks one from this list and assigns it to an arbitrary executor, since TaskSetManager considers such tasks able to run equally well on any node.
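[Editor's note: the behavior described above can be sketched as follows. This is a simplified, hypothetical model of the assignment logic, not Spark's actual TaskSetManager code; the names `assign`, `liveHosts`, and `Task` are illustrative.]

```scala
// Hypothetical sketch of the buggy assignment described above: a task's
// location preference is honored only if that host is already live, so
// tasks submitted before executor registration all lose their preference.
object BuggyAssignment {
  case class Task(id: Int, preferredHost: Option[String])

  // Returns (per-host pending lists, pendingTasksWithNoPrefs-style list).
  def assign(tasks: Seq[Task], liveHosts: Set[String])
      : (Map[String, List[Int]], List[Int]) = {
    var perHost = Map.empty[String, List[Int]].withDefaultValue(Nil)
    var noPrefs = List.empty[Int]
    for (t <- tasks) t.preferredHost match {
      // Bug: the preference only counts if the host is already registered.
      case Some(h) if liveHosts(h) => perHost += h -> (t.id :: perHost(h))
      case _                       => noPrefs ::= t.id // preference lost
    }
    (perHost, noPrefs)
  }

  def main(args: Array[String]): Unit = {
    val tasks = Seq(Task(0, Some("node6")), Task(1, Some("node7")))
    // No executors registered yet: every task lands in the no-prefs list.
    val (_, noPrefs) = assign(tasks, liveHosts = Set.empty)
    println(noPrefs.sorted.mkString(",")) // prints 0,1
  }
}
```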
> This problem forfeits the benefits of data locality, slows down the whole job, and can cause load imbalance between executors.
> I ran into this issue when running a Spark program on a 7-node cluster (node6~node12). The program processes 100GB of data.
> Since the data was uploaded to HDFS from node6, this node has a complete copy of the data; as a result, node6 finishes tasks much faster, which in turn makes it complete disproportionately more tasks than the other nodes.
> To solve this issue, I think we shouldn't check the availability of executors/hosts when constructing TaskSetManager. If a task prefers a node, we simply add the task to that node's pending list. When the node is later added, TaskSetManager can schedule the task at the proper locality level. If, unfortunately, the preferred node(s) never get added, TaskSetManager can still schedule the task at locality level "ANY".
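[Editor's note: the proposed fix can be sketched in the same simplified model. `PendingTasks`, `add`, and `next` are hypothetical names for illustration, not the actual patch that went into 1.1.0.]

```scala
// Hypothetical sketch of the proposal: record the location preference
// unconditionally at construction time, and fall back to locality level
// ANY only at scheduling time if the preferred host never registers.
object PatchedAssignment {
  case class Task(id: Int, preferredHost: Option[String])

  final class PendingTasks {
    private var perHost =
      Map.empty[String, List[Int]].withDefaultValue(Nil)
    private var noPrefs = List.empty[Int]

    // Fix: always file the task under its preferred host, live or not.
    def add(t: Task): Unit = t.preferredHost match {
      case Some(h) => perHost += h -> (t.id :: perHost(h))
      case None    => noPrefs ::= t.id
    }

    // Prefer a task local to `host`; otherwise fall back to ANY.
    def next(host: String): Option[Int] = perHost(host) match {
      case id :: rest => perHost += host -> rest; Some(id)
      case Nil        => anyTask()
    }

    // ANY: a task whose preferred host never registered is still runnable.
    private def anyTask(): Option[Int] = noPrefs match {
      case id :: rest => noPrefs = rest; Some(id)
      case Nil =>
        perHost.find(_._2.nonEmpty).map { case (h, ids) =>
          perHost += h -> ids.tail; ids.head
        }
    }
  }
}
```

With this structure, a task preferring an unregistered node stays on that node's list; it is picked at NODE_LOCAL level as soon as the node appears, and only degrades to ANY when no local work is available.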



--
This message was sent by Atlassian JIRA
(v6.2#6252)
