hadoop-mapreduce-user mailing list archives

From Matt Kowalczyk <matt.kowalc...@gmail.com>
Subject yarn memory settings in heterogeneous cluster
Date Fri, 28 Aug 2015 19:25:51 GMT

I have deployed a hadoop 2.7.1 cluster with heterogeneous nodes. For the
sake of discussion, suppose one node has 100GB of RAM while another has 50GB.

I'm using the Capacity Scheduler and deploy mapred-site.xml and
yarn-site.xml configuration files with various memory settings that are
tailored to the resources for a particular machine. The master node, and
the two slave node classes each get a different configuration file since
they have different memory profiles.

I am trying to configure yarn in such a way as to take advantage of all the
resources available on the nodes and I'm having particular difficulty with
the minimum allocation setting. What I can tell from my deployment is that
there are certain memory settings that are node specific while others
are cluster wide. A particular configuration setting that's causing me
trouble is:

yarn.scheduler.minimum-allocation-mb

This appears to behave as a cluster-wide setting; however, due to my two
node classes, a per-node yarn.scheduler.minimum-allocation-mb would be
preferable.

I also notice the behavior that yarn _always_ allocates
yarn.scheduler.minimum-allocation-mb to each container irrespective of how
the per-node memory settings are configured.
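
One possible explanation for this behavior, stated here as an assumption
rather than something confirmed in this thread: the scheduler raises each
request to at least yarn.scheduler.minimum-allocation-mb and rounds it up to
a multiple of that value, so any request smaller than the minimum is granted
exactly the minimum. A minimal Java sketch of that arithmetic follows. The
class name, method name, and the 4096/16384 values are hypothetical, and
capping at the maximum is a simplification (a real cluster may reject an
oversized request instead).

```java
// Sketch only: models how a container memory request might be rounded.
// Assumption: a request is raised to at least the minimum allocation,
// rounded up to a multiple of the minimum, then capped at the maximum.
final class AllocationSketch {

    static long normalize(long requestedMb, long minAllocMb, long maxAllocMb) {
        // Never grant less than the cluster-wide minimum.
        long atLeastMin = Math.max(requestedMb, minAllocMb);
        // Round up to the next multiple of the minimum (integer ceiling).
        long roundedUp = ((atLeastMin + minAllocMb - 1) / minAllocMb) * minAllocMb;
        // Simplification: cap at the maximum instead of rejecting the request.
        return Math.min(roundedUp, maxAllocMb);
    }

    public static void main(String[] args) {
        long minAllocMb = 4096;   // hypothetical yarn.scheduler.minimum-allocation-mb
        long maxAllocMb = 16384;  // hypothetical yarn.scheduler.maximum-allocation-mb
        for (long requestedMb : new long[] {1024, 4096, 5000, 20000}) {
            System.out.println(requestedMb + " MB requested -> "
                    + normalize(requestedMb, minAllocMb, maxAllocMb) + " MB granted");
        }
    }
}
```

Under these assumptions, every request at or below 4096 MB comes back as
4096 MB, which would look like yarn "always" allocating the minimum whenever
the jobs ask for less than that.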

A couple of questions to help drive the discussion:

- how should yarn be configured in a heterogeneous cluster?
- yarn exposes a minimum and maximum allocation; how do I indicate that
additional memory is desirable such that yarn doesn't always allocate the
minimum? More concretely, suppose I have two jobs with differing memory
requirements--how would I communicate this to yarn and request that my
containers be allocated with additional memory?
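
For the second question, a per-job request is usually expressed through
MapReduce job properties such as mapreduce.map.memory.mb and
mapreduce.reduce.memory.mb rather than through yarn-site.xml. The Java sketch
below only builds the property overrides for one job and prints them as -D
options; it does not call any Hadoop API. The class and method names are
hypothetical, the 80% heap-to-container ratio is an assumed rule of thumb,
and passing -D options on the command line assumes the job's driver parses
generic options.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: builds per-job memory overrides as plain strings.
final class JobMemorySketch {

    // Returns property overrides for a job whose map and reduce containers
    // should be mapMb and reduceMb megabytes respectively.
    static Map<String, String> memoryOverrides(int mapMb, int reduceMb) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapreduce.map.memory.mb", Integer.toString(mapMb));
        // Assumption: JVM heap is set to 80% of the container size.
        props.put("mapreduce.map.java.opts", "-Xmx" + (mapMb * 8 / 10) + "m");
        props.put("mapreduce.reduce.memory.mb", Integer.toString(reduceMb));
        props.put("mapreduce.reduce.java.opts", "-Xmx" + (reduceMb * 8 / 10) + "m");
        return props;
    }

    // Formats the overrides as space-separated -Dkey=value options.
    static String asGenericOptions(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append("-D").append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two hypothetical jobs with differing memory requirements.
        System.out.println("small job: " + asGenericOptions(memoryOverrides(2048, 4096)));
        System.out.println("large job: " + asGenericOptions(memoryOverrides(8192, 16384)));
    }
}
```

Each job would carry its own overrides, so two jobs with different memory
needs can run on the same cluster without changing the cluster-wide minimum.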

