spark-reviews mailing list archives

From andrewor14 <...@git.apache.org>
Subject [GitHub] spark pull request: [SPARK-9795] Dynamic allocation: avoid double ...
Date Mon, 10 Aug 2015 20:47:01 GMT
GitHub user andrewor14 opened a pull request:

    https://github.com/apache/spark/pull/8078

    [SPARK-9795] Dynamic allocation: avoid double counting when killing same executor

    This is based on @KaiXinXiaoLei's changes in #7716.
    
    The issue is that when `sc.killExecutor("1")` is called twice in quick succession on the
same executor, the executor target is adjusted downward by 2 instead of 1, even though we
are only killing one executor. In cases where we don't adjust the target back upward quickly,
jobs can end up hanging.
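
    A minimal sketch of the bookkeeping idea, assuming a simplified, hypothetical
`ExecutorTargetTracker` class (it is not Spark's actual scheduler code or the code in this
patch): remember which executors are already pending removal, and lower the target only the
first time a given executor is asked to be killed, so a duplicate request becomes a no-op.

    ```scala
    import scala.collection.mutable

    // Hypothetical model of executor-target bookkeeping; not Spark's real API.
    class ExecutorTargetTracker(initialTarget: Int) {
      private var target: Int = initialTarget
      // Executors we have already asked to kill but that have not exited yet.
      private val pendingToRemove = mutable.HashSet.empty[String]

      def currentTarget: Int = synchronized { target }

      // Lowers the target only on the first kill request for an executor.
      // Returns false (and leaves the target alone) if the kill is already pending.
      def killExecutor(executorId: String): Boolean = synchronized {
        if (pendingToRemove.add(executorId)) {
          target -= 1
          true
        } else {
          false
        }
      }

      // Called once the executor has actually exited, so the set does not grow forever.
      def onExecutorRemoved(executorId: String): Unit = synchronized {
        pendingToRemove.remove(executorId)
      }
    }

    object ExecutorTargetTrackerDemo {
      def main(args: Array[String]): Unit = {
        val tracker = new ExecutorTargetTracker(initialTarget = 5)
        tracker.killExecutor("1") // first request lowers the target from 5 to 4
        tracker.killExecutor("1") // duplicate request is ignored
        println(tracker.currentTarget) // prints 4
      }
    }
    ```

    Guarding the decrement with the return value of `HashSet.add` inside one `synchronized`
block keeps the check and the update atomic, which is what prevents two near-simultaneous
callers from both lowering the target.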
    
    This is a real danger because executors can be killed from several places:
    - `HeartbeatReceiver` kills an executor that has not been sending heartbeats
    - `ExecutorAllocationManager` kills an executor that has been idle
    - User code may call `sc.killExecutor` directly
    
    While it's not clear whether this fixes SPARK-9745, fixing this potential race condition
seems like a strict improvement. I've added a regression test to illustrate the issue.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark da-double-kill

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8078.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8078
    
----
commit fb149da2bb67143fc534d3049faf57959f8a0d49
Author: Andrew Or <andrew@databricks.com>
Date:   2015-08-10T20:39:52Z

    Do not double count when adjusting target downwards

----

