hadoop-user mailing list archives

From Naganarasimha Garla <naganarasimha...@apache.org>
Subject Re: Application Master machine affinity/preference settings?
Date Thu, 15 Jun 2017 18:22:43 GMT
Hi Everett Anderson,
     I can think of two ways to do it:
1. Create a node label for the CoreMachine pool (as an exclusive or
non-exclusive partition) and submit the AM request with the CoreMachine
label expression. That way the AMs are launched within the CoreMachine
pool itself. Refer to
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeLabel.html
2. After YARN-6050, if you know which nodes are CoreMachine nodes, you can
submit the AM with multiple ResourceRequests, each with its ResourceName
pointing to a different node.
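
For the Spark-on-YARN case, option 1 might look roughly like the sketch
below. It assumes node labels are already enabled on the ResourceManager and
that the target queue is allowed to access the label, as described on the
NodeLabel page linked above. The label name CoreMachine, the hostnames, and
the application jar are illustrative placeholders, and the behavior of the
Spark property should be checked against the Spark version in use.

```
# Define a non-exclusive label and attach it to the Core nodes.
# Hostnames here are hypothetical placeholders.
yarn rmadmin -addToClusterNodeLabels "CoreMachine(exclusive=false)"
yarn rmadmin -replaceLabelsOnNode "core-node-1=CoreMachine core-node-2=CoreMachine"

# Request that only the Spark AM be placed on labeled nodes;
# executors are left unrestricted. The jar name is a placeholder.
spark-submit --master yarn \
  --conf spark.yarn.am.nodeLabelExpression=CoreMachine \
  your-app.jar
```

With a non-exclusive partition, idle capacity on the labeled nodes can still
be used by requests that carry no label; with an exclusive partition, only
requests carrying the label run there.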

Regards,
+ Naga


On Thu, Jun 15, 2017 at 9:17 PM, Everett Anderson <everett@nuna.com.invalid>
wrote:

> Hi!
>
> We've been using Hadoop MapReduce and Spark on YARN on AWS Elastic
> MapReduce (EMR). EMR has a concept of Core versus Task nodes, where Core
> nodes participate in HDFS but Task nodes don't and their number can be more
> easily scaled up or down based on load.
>
> Most applications we run are batch and can tolerate machines going away
> well, but some of them are ad hoc interactive Spark sessions. Spark seems
> to handle executors (workers) going away okay, but if the main Application
> Master for that user's session goes away, they lose state.
>
> Is there a mechanism in YARN such that we could prioritize launching
> Application Masters on the Core machine pool in a cluster when resources
> are available?
>
> I know there are scheduling queues that we could use to isolate
> entire applications -- such as batch versus interactive ones -- but I'm not
> sure if there's a way to ensure just the AM of a given application is
> prioritized to be on a specific set of machines.
>
> Thanks!
>
> - Everett
>
>
