flink-user mailing list archives

From "Foster, Craig" <foscr...@amazon.com>
Subject Flink long-running YARN configuration
Date Thu, 25 Aug 2016 15:02:19 GMT
I'm trying to understand Flink YARN configuration. The flink-conf.yaml file is supposedly the
way to configure Flink, except when you launch Flink on YARN, since some settings are then
determined for the AM. The following is contradictory or not completely clear:

"The system will use the configuration in conf/flink-config.yaml. Please follow our configuration
guide<https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html> if you
want to change something.

Flink on YARN will overwrite the following configuration parameters jobmanager.rpc.address
(because the JobManager is always allocated at different machines), taskmanager.tmp.dirs (we
are using the tmp directories given by YARN) and parallelism.default if the number of slots
has been specified."

OK, so it will use conf/flink-conf.yaml, except for jobmanager.rpc.address/port, which are
allocated dynamically by YARN and not necessarily reported to the user. That's fine with me,
but if I want to make a "long-running" Flink cluster available to more than one user, where
in Flink do I look up the Application Master hostname? Or do I just have to scrape the log
output (which would definitely be undesirable)? First, I thought Flink would write this to
conf/flink-conf.yaml. It does not. Then I thought it must surely be written to the HDFS
configuration directory for that application (under something like hdfs://$USER/.flink/),
but that is merely a copy of the original conf/flink-conf.yaml and doesn't reflect the
actual configuration of the running application. So is there an accurate config somewhere
in HDFS or on the ResourceManager, i.e. where could I find it programmatically (without
manipulating YARN app names or scraping)?
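
For reference, one route that avoids log scraping is the YARN ResourceManager REST API,
which lists running applications together with the node hosting each Application Master.
The following is only a minimal sketch under stated assumptions: the ResourceManager web
address is passed in by the caller, the Flink session is assumed to be registered with the
application type "Apache Flink", and the helper names are hypothetical. It yields the AM
host and tracking URL only; the dynamically allocated JobManager RPC port is not part of
this response.

```python
import json
from urllib.request import urlopen


def running_apps_url(rm_address: str) -> str:
    """Build the ResourceManager REST URL that lists RUNNING applications."""
    return f"http://{rm_address}/ws/v1/cluster/apps?states=RUNNING"


def flink_am_hosts(payload: dict, app_type: str = "Apache Flink") -> list:
    """Extract (application id, AM host, tracking URL) for matching applications.

    `payload` is the parsed JSON returned by the cluster applications endpoint.
    The default `app_type` is an assumption about how the Flink session was
    registered with YARN; adjust it if your cluster reports a different type.
    """
    # The ResourceManager returns {"apps": null} when nothing matches.
    apps = (payload.get("apps") or {}).get("app") or []
    hosts = []
    for app in apps:
        if app.get("applicationType") != app_type:
            continue
        # amHostHttpAddress has the form "host:port" (HTTP address of the AM's node).
        am_http = app.get("amHostHttpAddress", "")
        host = am_http.rsplit(":", 1)[0] if am_http else None
        hosts.append((app.get("id"), host, app.get("trackingUrl")))
    return hosts


def fetch_flink_am_hosts(rm_address: str) -> list:
    """Query a live ResourceManager (hypothetical address supplied by the caller)."""
    with urlopen(running_apps_url(rm_address)) as resp:
        return flink_am_hosts(json.load(resp))
```

Filtering on application type rather than on the application name keeps this independent
of whatever name the session was started with.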

