spark-issues mailing list archives

From "Devaraj K (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-22404) Provide an option to use unmanaged AM in yarn-client mode
Date Tue, 02 Jan 2018 22:14:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-22404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308802#comment-16308802 ]

Devaraj K commented on SPARK-22404:
-----------------------------------

Thanks [~irashid] for the comment.

bq. can you provide a little more explanation for the point of this?

An unmanaged AM is an AM that is not launched and managed by the RM. The client creates a new
application on the RM and negotiates a new attempt id. It then waits for the RM app state
to reach YarnApplicationState.ACCEPTED, after which it spawns the AM in the same or another process
and passes it the container id via the env variable Environment.CONTAINER_ID. The AM (whether in
the same or a different process) can register with the RM using the attempt id obtained from the
container id and proceed as normal.
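
As a rough illustration of the last step, here is a minimal sketch of how an attempt id can be
recovered from the container id string passed in the CONTAINER_ID env variable. It assumes the
usual YARN string forms (container_[e<epoch>_]<clusterTimestamp>_<appId>_<attemptId>_<containerId>
and appattempt_<clusterTimestamp>_<appId>_<attemptId>). The names AttemptId,
attempt_id_from_container_id and attempt_id_from_env are hypothetical helpers for this example,
not part of Hadoop or Spark; a real AM would normally use Hadoop's ContainerId class instead.

```python
import re
from typing import Mapping, NamedTuple


class AttemptId(NamedTuple):
    """Hypothetical stand-in for YARN's ApplicationAttemptId."""
    cluster_timestamp: int
    app_id: int
    attempt_id: int

    def __str__(self) -> str:
        # Assumed string form: appattempt_<clusterTimestamp>_<appId:4>_<attemptId:6>
        return (f"appattempt_{self.cluster_timestamp}"
                f"_{self.app_id:04d}_{self.attempt_id:06d}")


# Assumed string form of a container id, with an optional epoch segment.
_CONTAINER_RE = re.compile(r"^container_(?:e(\d+)_)?(\d+)_(\d+)_(\d+)_(\d+)$")


def attempt_id_from_container_id(container_id: str) -> AttemptId:
    """Extract the application attempt id embedded in a container id string."""
    match = _CONTAINER_RE.match(container_id)
    if match is None:
        raise ValueError(f"not a container id: {container_id!r}")
    _epoch, timestamp, app, attempt, _container = match.groups()
    return AttemptId(int(timestamp), int(app), int(attempt))


def attempt_id_from_env(env: Mapping[str, str]) -> AttemptId:
    """Read the container id the client passed via the CONTAINER_ID variable."""
    return attempt_id_from_container_id(env["CONTAINER_ID"])
```

For example, attempt_id_from_env({"CONTAINER_ID": "container_1410901177871_0001_01_000005"})
yields an attempt id that prints as appattempt_1410901177871_0001_000001.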

This PR/JIRA adds a new configuration, "spark.yarn.un-managed-am" (defaults to false), that
enables an unmanaged AM in YARN client mode: the Application Master service starts as part of
the Client. It reuses the existing code for communication between the
Application Master <-> Task Scheduler for container requests/allocations/launches,
and eliminates the following:
* 	Allocating and launching the Application Master container
* 	Remote node/process communication between Application Master <-> Task Scheduler
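
A minimal sketch of how the option could be switched on from the command line, assuming the
property name proposed in this comment ("spark.yarn.un-managed-am"); the name in any released
Spark version may differ. The helper build_submit_command, the jar path and the main class are
hypothetical placeholders; --master, --deploy-mode, --class and --conf are standard
spark-submit options.

```python
import shlex
from typing import List


def build_submit_command(app_jar: str, main_class: str,
                         unmanaged_am: bool = True) -> List[str]:
    """Build a spark-submit argv for yarn-client mode (illustrative only)."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "client",  # the option only applies to client mode
        "--class", main_class,
        # Property name as proposed in this comment; treat it as an assumption.
        "--conf", f"spark.yarn.un-managed-am={str(unmanaged_am).lower()}",
        app_jar,
    ]


if __name__ == "__main__":
    # Hypothetical application jar and class, for illustration only.
    cmd = build_submit_command("/path/to/app.jar", "com.example.MyApp")
    print(shlex.join(cmd))
```

Leaving the property unset keeps the current behaviour, since it defaults to false.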

bq. how much time does this save for you?
It removes the time spent scheduling and launching the AM container, and eliminates the AM acting
as a proxy for requesting, launching and removing executors. I can post comparison results
here, with and without the unmanaged AM.

bq. What's the downside of an unmanaged AM?
The unmanaged AM service would run as part of the Client, so the Client can handle anything that
goes wrong with the unmanaged AM service, instead of the AM container being relaunched on failure.

bq. the idea makes sense, but the yarn interaction and client mode is already pretty complicated
so I'd like good justification for this
This PR reuses most of the existing code for communication between AM <->
Task Scheduler, but the communication happens within the same process. The Client starts the AM
service in the same process when the application's state is ACCEPTED and proceeds as usual
without disrupting the existing flow.
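
The "start the AM service once the application is ACCEPTED" step can be sketched roughly as
below. This is only a model of the control flow under stated assumptions, not the code in the PR:
get_state and start_am_service are hypothetical callables standing in for the RM status query
and the in-process AM startup, and states are plain strings named after YarnApplicationState
values.

```python
import time
from typing import Callable

# Terminal YarnApplicationState values; the AM is never started after these.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def start_am_when_accepted(get_state: Callable[[], str],
                           start_am_service: Callable[[], None],
                           poll_interval_s: float = 1.0,
                           max_polls: int = 100) -> bool:
    """Poll the application state and start the AM in-process once ACCEPTED.

    Returns True if the AM service was started, False if the application
    reached a terminal state first or max_polls was exhausted.
    """
    for _ in range(max_polls):
        state = get_state()
        if state == "ACCEPTED":
            start_am_service()  # runs inside the client process
            return True
        if state in TERMINAL_STATES:
            return False
        time.sleep(poll_interval_s)
    return False
```

With a state sequence of NEW, SUBMITTED, ACCEPTED the function calls start_am_service once and
returns True; the rest of the client flow continues unchanged after that point.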


> Provide an option to use unmanaged AM in yarn-client mode
> ---------------------------------------------------------
>
>                 Key: SPARK-22404
>                 URL: https://issues.apache.org/jira/browse/SPARK-22404
>             Project: Spark
>          Issue Type: Improvement
>          Components: YARN
>    Affects Versions: 2.2.0
>            Reporter: Devaraj K
>
> There was an earlier issue, SPARK-1200, to provide this option, but it was closed without a fix.
> Using an unmanaged AM in yarn-client mode would allow apps to start up faster, by not
> requiring the container launcher AM to be launched on the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

