flink-user mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Flink on YARN - tmp directory
Date Mon, 31 Jul 2017 15:55:45 GMT
Hi Chris,

I think in this case we need to change what is passed as "-Djava.io.tmpdir"
to the JVMs that run the TaskManagers. You should be able to achieve this via env.java.opts
or, more specifically, env.java.opts.taskmanager [1]. The directory specified via taskmanager.tmp.dirs
is only used to set the internal Flink tmp directories; it doesn't change what Java assumes
as the tmp directory. You should be able to set env.java.opts.taskmanager in flink-conf.yaml or
pass it as a "dynamic property" when running via bin/flink (per-job YARN cluster) or when
creating the YARN session. For example:

bin/flink ... -Denv.java.opts.taskmanager="-Djava.io.tmpdir=/my/tmp"
...
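
To check which directory the TaskManager JVM actually treats as its tmp directory, something like the following minimal sketch could be run inside the job or standalone. The class name TmpDirCheck and its helper methods are hypothetical names chosen for illustration. It relies only on the standard java.nio.file API: Files.createTempFile(prefix, suffix) uses the JVM's default temp directory (java.io.tmpdir), while the overload that takes a directory writes to that directory regardless of the JVM setting.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, for illustration only.
public class TmpDirCheck {

    // Directory the JVM treats as its default temp location (set via -Djava.io.tmpdir).
    static Path jvmTmpDir() {
        return Paths.get(System.getProperty("java.io.tmpdir"));
    }

    // Create a temp file in an explicitly chosen directory, independent of java.io.tmpdir.
    static Path createIn(Path dir, String prefix, String suffix) throws IOException {
        Files.createDirectories(dir); // does nothing if the directory already exists
        return Files.createTempFile(dir, prefix, suffix);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("java.io.tmpdir = " + jvmTmpDir());

        // Same call style as in the job: lands in the JVM's default temp directory.
        Path implicit = Files.createTempFile("tmpFile", ".txt");
        System.out.println("implicit temp file: " + implicit);
        Files.deleteIfExists(implicit);

        // Optional: pass a target directory as the first argument to bypass java.io.tmpdir.
        if (args.length > 0) {
            Path explicit = createIn(Paths.get(args[0]), "tmpFile", ".txt");
            System.out.println("explicit temp file: " + explicit);
            Files.deleteIfExists(explicit);
        }
    }
}
```

If changing the JVM option turns out to be awkward in a given setup, passing the directory explicitly, as createIn does, is an alternative that does not depend on how the TaskManager JVM was started.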

Best,
Aljoscha

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/config.html#common-options
> On 29. Jul 2017, at 00:00, Chris Hebert <chris.hebert-int@digitalreasoning.com> wrote:
> 
> I should also note that the above steps did get the Flink JobManager and TaskManagers
> to save their tmp web dashboard files to /my/tmp/ and to show in the Dashboard that the
> taskmanager.tmp.dirs property had been properly set to /my/tmp/, but the tmp files I wrote
> in my jobs stubbornly wrote to /tmp/ anyway.
> 
> On Fri, Jul 28, 2017 at 4:55 PM, Chris Hebert <chris.hebert-int@digitalreasoning.com> wrote:
> Hi,
>  
> My jobs create tmp files like so:
> 
> java.nio.file.Path tmpFilePath = java.nio.file.Files.createTempFile("tmpFile", "txt");
> 
> They currently appear in /tmp/, but I want them somewhere else, say /my/tmp/.
> 
> The Flink on YARN docs say:
> Flink on YARN will overwrite the following configuration parameters jobmanager.rpc.address
> (because the JobManager is always allocated at different machines), taskmanager.tmp.dirs (we
> are using the tmp directories given by YARN) and parallelism.default if the number of slots
> has been specified.
> How would I specify a different tmp directory for a job without modifying my YARN tmp
> directories?
> 
> I tried the taskmanager.tmp.dirs property in conf/flink-conf.yaml anyway; that failed.
> 
> I appended -Djava.io.tmpdir=/my/tmp/ to JVM_ARGS and all three variations of
> DEFAULT_ENV_JAVA_OPTS in bin/config.sh; that failed.
> 
> I passed -Djava.io.tmpdir=/my/tmp/ and variations as arguments to ./bin/yarn-session.sh
> and ./bin/flink run et cetera; that failed.
> 
> Odd observation:
> The hadoop.tmp.dir property is set in my core-site.xml to /some/other/tmp/, yet Flink
> writes to /tmp/. My yarn-site.xml specifies no tmp.
> 
> Side note:
> My Flink job is a Beam pipeline. I doubt that's relevant, but let me know if it is.
> 
> Thanks,
> Chris
> 

