hadoop-user mailing list archives

From Everett Anderson <ever...@nuna.com.INVALID>
Subject Application Master machine affinity/preference settings?
Date Thu, 15 Jun 2017 15:47:56 GMT

We've been using Hadoop MapReduce and Spark on YARN on AWS Elastic
MapReduce (EMR). EMR has a concept of Core versus Task nodes: Core nodes
participate in HDFS but Task nodes don't, so the number of Task nodes can
be scaled up or down more easily based on load.

Most applications we run are batch jobs and tolerate machines going away
well, but some are ad hoc interactive Spark sessions. Spark seems to
handle executors (workers) going away okay, but if the Application Master
for a user's session goes away, the user loses their state.

Is there a mechanism in YARN such that we could prioritize launching
Application Masters on the Core machine pool in a cluster when resources
are available?

I know there are scheduling queues that we could use to isolate
entire applications -- such as batch versus interactive ones -- but I'm not
sure if there's a way to ensure just the AM of a given application is
prioritized to be on a specific set of machines.


- Everett
