flink-user mailing list archives

From Markus Nentwig <nent...@informatik.uni-leipzig.de>
Subject Complex batch workflow needs (too) much time to create executionPlan
Date Mon, 22 Aug 2016 11:50:12 GMT
Hello Flink community,

I created a fairly long batch workflow for my use case of clustering
vertices using Flink and Gelly. Executing each of the workflow parts
individually (and writing intermediate results to disk) works as expected.

When combining workflow parts into longer jobs, I noticed that the
'Job Name' time and the actual 'Start Time' in the Flink Dashboard differ.
With longer workflow chains the time difference gets bigger and bigger. At this
point, I think that Flink is creating the execution plan, which is executed
directly afterwards. As an example (90% of the workflow combined), I 'wait' for
the execution plan for 77-78 seconds, then the job is accepted for execution and
needs another 7-9 seconds to process a small test dataset (<8k vertices with
property values and edges) - each run repeated 3 times. If I run only
env.getExecutionPlan(), I wait a similar time for the execution plan. I
added the JSON execution plan to this post. For bigger datasets the execution
plan creation time and the job execution time grow as well in my scenario.
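
To make the split between plan creation and job execution measurable, the two steps can be timed separately in the driver program. The following is only a minimal sketch: PlanTiming, Timed and time() are hypothetical helper names (not part of Flink), and the lambda in main() is a stand-in for the real call so that the example runs without a Flink dependency.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of Flink): times one step of a driver program. */
public class PlanTiming {

    /** Holds the value a step produced and the wall-clock time it took. */
    public static final class Timed<T> {
        public final T value;
        public final long millis;

        Timed(T value, long millis) {
            this.value = value;
            this.millis = millis;
        }
    }

    /** Runs the step once and measures the elapsed time with System.nanoTime(). */
    public static <T> Timed<T> time(Callable<T> step) {
        long start = System.nanoTime();
        T value;
        try {
            value = step.call();
        } catch (Exception e) {
            // Callable allows checked exceptions; rethrow unchecked to keep callers simple.
            throw new RuntimeException(e);
        }
        long elapsed = System.nanoTime() - start;
        return new Timed<>(value, TimeUnit.NANOSECONDS.toMillis(elapsed));
    }

    public static void main(String[] args) {
        // Stand-in for the expensive call. In a real driver this would be
        // time(env::getExecutionPlan), with env being the ExecutionEnvironment.
        Timed<String> plan = time(() -> "{\"nodes\": []}");
        System.out.println("plan creation took " + plan.millis + " ms, "
                + plan.value.length() + " characters of JSON");
    }
}
```

Wrapping the actual job submission in time() in the same way would give the second number, so the two phases can be compared as the workflow grows.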

When I now add a vertex-centric iteration to my workflow and start the Flink
job, I don't get a result at all. I stopped the job
(print execution plan to log) at the following point:

- waited > 20 hours after 'flink run ...'
- two cores on my machine are at 100% all the time working on the Flink job
- no entry in Flink dashboard at all
- no entry in log file after these lines:

org.apache.flink.client.CliFrontend            - Starting execution of
org.apache.flink.client.program.Client         - Starting program in interactive mode
org.apache.flink.api.java.ExecutionEnvironment - The job has 2 registered types and 0 default Kryo serializers
org.apache.flink.optimizer.Optimizer           - The parallelism of nested dataflows (such as step functions in iterations) is currently fixed to the parallelism of the surrounding operator (the iteration).

Most likely the workflow could be optimized in many ways to need less time at
certain points (yes, I am not a Flink expert in many places), but I think
long/complex workflows would still suffer from problems like this.
Due to the fact that every single step is producing output (and some
parts of the workflow do so, too), I currently suspect the Flink optimizer /
execution plan creation to be the problem and therefore ask anyone here if
you have experience with similar behavior. Any suggestions how I could
run long/complex workflows without running into such problems? ;)

If there is no (instant) 'solution' to the problem, I would still be
interested in opinions and ideas. Thanks in advance!



View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Complex-batch-workflow-needs-too-much-time-to-create-executionPlan-tp8596.html