flink-user mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: How to send local files to a flink job on YARN
Date Fri, 14 Jul 2017 09:45:42 GMT
There’s a bit of a misconception here: in Flink there is no “driver” as there is in Spark,
and the entry point of your program (“main()”) is not executed on the cluster but in the
“client”. The main() method is only responsible for constructing a program graph; this graph is
then shipped to the cluster, and the client (or the “main()” method) can shut down at that
point. In your concrete case, this means that the main() method is not executed in the YARN
context, i.e. it does not see the files that you specified with the “--yarnship” option.
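Because main() runs on the client, a shipped file has to be read inside the running job, for example in a RichFunction's open() method, which executes on the TaskManager inside its YARN container. The plain-Java sketch below assumes that YARN localizes the shipped directory into the container's working directory under its own name (verify this against your Flink version); ShippedFileReader and readShippedFile are hypothetical names, not part of Flink's API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper. Assumption: shipped directories are localized into the
// YARN container's working directory, keeping their directory name.
// Call it from a RichFunction's open() (runs on the TaskManager), not from main().
final class ShippedFileReader {

    private ShippedFileReader() {}

    // Resolves baseDir/shipDirName/fileName and returns the file's lines as UTF-8.
    static List<String> readShippedFile(Path baseDir, String shipDirName, String fileName)
            throws IOException {
        Path candidate = baseDir.resolve(shipDirName).resolve(fileName);
        if (!Files.isRegularFile(candidate)) {
            throw new IOException("Shipped file not found: " + candidate.toAbsolutePath());
        }
        return Files.readAllLines(candidate, StandardCharsets.UTF_8);
    }

    // Convenience overload that uses the process's current working directory,
    // assumed here to be the container's working directory.
    static List<String> readShippedFile(String shipDirName, String fileName) throws IOException {
        return readShippedFile(Paths.get("").toAbsolutePath(), shipDirName, fileName);
    }
}
```

The returned lines can then be handed to a YAML parser of your choice. Calling the same helper from main() would search the client machine's working directory instead, which is exactly the failure described above.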

Regarding “--yarnship” in general, I have descended into the depths of the Flink YARN
support, and this is how it works:
FlinkYarnSessionCli is the piece of code that acts as the entry point when “-m yarn-cluster” is specified
at the command line. This is where the options are defined: https://github.com/apache/flink/blob/f839018131024860a1b25b13cea7e1313add28d5/flink-yarn/src/main/java/org/apache/flink/yarn/cli/FlinkYarnSessionCli.java#L138-L138.
The options are not hardcoded but have a dynamic prefix; normally the short prefix is “y”
and the long prefix is “yarn”. In there you see

shipPath = new Option(shortPrefix + "t", longPrefix + "ship", true, "Ship files in the specified directory (t for transfer)");

This translates to the -yt and --yarnship parameters.
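To see how the dynamic prefixes turn into the flags a user types, here is a minimal plain-Java model. PrefixedOption is a hypothetical class, not Flink's or Apache Commons CLI's API; it assumes single-dash short flags and double-dash long flags, as in the -yt and --yarnship forms.

```java
// Hypothetical, simplified model of an option whose flag names are built from a
// dynamic prefix. Flink's real code uses Apache Commons CLI's Option class; this
// sketch only shows how the prefixes combine into the flags typed by the user.
final class PrefixedOption {
    final String shortName; // e.g. "y" + "t"       -> "yt"
    final String longName;  // e.g. "yarn" + "ship" -> "yarnship"

    PrefixedOption(String shortPrefix, String longPrefix, String shortSuffix, String longSuffix) {
        this.shortName = shortPrefix + shortSuffix;
        this.longName = longPrefix + longSuffix;
    }

    // Returns true if a raw command-line token names this option, e.g. "-yt" or "--yarnship".
    boolean matches(String token) {
        return ("-" + shortName).equals(token) || ("--" + longName).equals(token);
    }
}
```

For example, new PrefixedOption("y", "yarn", "t", "ship") yields the short name yt and the long name yarnship, so it matches both "-yt" and "--yarnship" but not the bare "-t".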

As to how FlinkYarnSessionCli is used when specifying “-m yarn-cluster”, this happens
here: https://github.com/apache/flink/blob/4aa2ffcef8edae574ec270631841ef4a0c793dec/flink-clients/src/main/java/org/apache/flink/client/CliFrontend.java#L136-L136
Essentially, a “CustomCommandLine” subclass is responsible for handling the user invocation
and the subclasses can announce that they would like to handle the user command line based
on certain settings. For example, FlinkYarnSessionCli will announce that it can handle a command
line when the “-m yarn-cluster” option is present: https://github.com/apache/flink/blob/f839018131024860a1b25b13cea7e1313add28d5/flink-yarn/src/main/java/org/apache/flink/yarn/cli/FlinkYarnSessionCli.java#L493-L493
The CliFrontend will loop through the list of registered CustomCommandLine instances and pick
the first one that announces that it would like to handle a given invocation: https://github.com/apache/flink/blob/4aa2ffcef8edae574ec270631841ef4a0c793dec/flink-clients/src/main/java/org/apache/flink/client/CliFrontend.java#L1174-L1174
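The selection step can be modelled in a few lines of plain Java. This is a simplified sketch, not Flink's code: CommandLineHandler, YarnHandler, DefaultHandler and HandlerSelector are hypothetical names, parsed options are reduced to a map from option name to value, and the catch-all default handler is an assumption of the model. The point it illustrates is that registration order decides which handler wins.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model of how CliFrontend picks a CustomCommandLine.
// The real Flink interface has different method names and signatures.
interface CommandLineHandler {
    String id();

    // Returns true if this handler wants to handle the given parsed options.
    boolean isActive(Map<String, String> options);
}

// Stand-in for FlinkYarnSessionCli: active when "-m yarn-cluster" was given.
final class YarnHandler implements CommandLineHandler {
    @Override
    public String id() {
        return "yarn-cluster";
    }

    @Override
    public boolean isActive(Map<String, String> options) {
        return "yarn-cluster".equals(options.get("m"));
    }
}

// Catch-all handler; it must be registered last or it would shadow the others.
final class DefaultHandler implements CommandLineHandler {
    @Override
    public String id() {
        return "default";
    }

    @Override
    public boolean isActive(Map<String, String> options) {
        return true;
    }
}

final class HandlerSelector {
    private HandlerSelector() {}

    // Loops through the registered handlers in order and returns the first active one.
    static Optional<CommandLineHandler> pickFirstActive(List<CommandLineHandler> handlers,
                                                        Map<String, String> options) {
        for (CommandLineHandler handler : handlers) {
            if (handler.isActive(options)) {
                return Optional.of(handler);
            }
        }
        return Optional.empty();
    }
}
```

With the YARN handler registered before the default one, an invocation whose options contain m=yarn-cluster selects the YARN handler, and any other invocation falls through to the default handler.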

This is very convoluted, and I hope my explanations help somewhat.


> On 13. Jul 2017, at 18:02, Ted Yu <yuzhihong@gmail.com> wrote:
> I went back to commit 6e38eb8:
> [FLINK-1436] [docs] update command line documentation
> A search in the repo for "yarnship" ended up with no hit in the code (same with commit
> bf6b9aaab89e2e04678784525a42a19f099aa7f5 which is at top of git repo).
> Wondering whether it is supported.
> On Thu, Jul 13, 2017 at 8:10 AM, Guy Harmach <GuyH@amdocs.com> wrote:
> Hi,
> I’m running a flink job on YARN. I’d like to pass yaml configuration files to the
> I tried to use the flink cli --yarnship flag to point to a directory containing the
> file, but wasn’t able to get it in the job.
> Can someone give an example of how to send local files and how to read them in the job?
> Thanks, Guy
