flink-user mailing list archives

From: Shannon Carey <sca...@expedia.com>
Subject: Re: 1.1.4 on YARN - vcores change?
Date: Fri, 13 Jan 2017 18:09:24 GMT
Ufuk & Robert,

There's a good chance you're right! On the EMR master node, where yarn-session.sh is run,
/etc/hadoop/conf/yarn-site.xml says that "yarn.nodemanager.resource.cpu-vcores" is 4.

Meanwhile, on the core nodes, the value in that file is 8.
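
For reference, the property in question looks like this (standard Hadoop
yarn-site.xml layout; the values are the ones I observed, 4 on the master
and 8 on the core nodes):

    <!-- /etc/hadoop/conf/yarn-site.xml -->
    <property>
      <name>yarn.nodemanager.resource.cpu-vcores</name>
      <value>4</value> <!-- 8 on the core nodes -->
    </property>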
Shall I submit a JIRA? This might be pretty easy to fix, given that "yarn-session.sh -q" already
knows how to get the vcore count on the nodes. I can try to make a PR for it, too. I'm still
not sure why the containers are showing up as using only one vcore, though... or whether that
is expected.

Meanwhile, it seems like overriding yarn.containers.vcores would be a successful workaround.
Let me know if you disagree.
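
For anyone else who hits this, I believe the override can go in flink-conf.yaml, or be
passed as a dynamic property when starting the session (8 matches our core nodes;
the other arguments are just examples):

    # flink-conf.yaml
    yarn.containers.vcores: 8

    # or as a dynamic property on the session:
    yarn-session.sh -n 4 -Dyarn.containers.vcores=8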

The other slightly annoying thing I have to deal with is leaving enough memory for the
JobManager. Since all task managers are the same size, I either have to reduce the size of
every task manager (wasting resources), or double the number of task managers while halving
their memory and then subtract one (basically doubling the number of separate JVMs and halving
the slot density within each JVM) in order to leave room for the JobManager. What do you guys
think of the following change in approach?

User specifies:
number of taskmanagers
memory per slot (not per taskmanager)
total number of slots (not slots per taskmanager)

Then, Flink would decide how to organize the task managers and slots so that room is also left
for the JobManager. This should be straightforward compared to general bin packing because all
slots are the same size; a rough sketch is below. Maybe I'm oversimplifying... it might be a
little tougher if the nodes are different sizes and we don't know which node the
ApplicationMaster/JobManager will run on.
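
To make that concrete, here's a rough sketch of the arithmetic I'm imagining
(hypothetical names and example numbers only, not actual Flink code):

    // Hypothetical sketch of the proposed allocation -- not actual Flink code.
    public class SlotLayoutSketch {
        public static void main(String[] args) {
            int numTaskManagers = 4;      // number of taskmanagers (user-specified)
            int memoryPerSlotMB = 1024;   // memory per slot (user-specified)
            int totalSlots = 16;          // total number of slots (user-specified)
            int jobManagerMemoryMB = 768; // reserved for the JobManager container

            // Every slot is the same size, so this is simple ceiling division
            // rather than real bin packing.
            int slotsPerTaskManager =
                (totalSlots + numTaskManagers - 1) / numTaskManagers;
            int taskManagerMemoryMB = slotsPerTaskManager * memoryPerSlotMB;

            // Flink would then request numTaskManagers containers of this size,
            // plus one container for the JobManager.
            System.out.printf("%d TMs x %d MB each (%d slots x %d MB), plus %d MB for the JM%n",
                numTaskManagers, taskManagerMemoryMB, slotsPerTaskManager,
                memoryPerSlotMB, jobManagerMemoryMB);
        }
    }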

-Shannon

On 1/13/17, 2:59 AM, "Ufuk Celebi" <uce@apache.org> wrote:

>On Fri, Jan 13, 2017 at 9:57 AM, Robert Metzger <rmetzger@apache.org> wrote:
>> Flink is reading the number of available vcores from the local YARN
>> configuration. Is it possible that the YARN / Hadoop config on the machine
>> where you are submitting your job from sets the number of vcores as 4 ?
>
>Shouldn't we retrieve this number from the cluster instead?
>