flink-user mailing list archives

From kovas boguta <kovas.bog...@gmail.com>
Subject Questions re: ExecutionGraph & ResultPartitions for interactive use a la Spark
Date Mon, 04 Jan 2016 21:47:50 GMT
I'm impressed with the Flink API; it seems simpler and more composable than
what I've seen elsewhere.

I'm trying to see how to achieve a more interactive, REPL-driven
experience, similar to Spark. I'm consuming Flink from Clojure.

For now I'm only interested in smaller clusters & interactive usage, so the
failure recovery aspect of RDDs is less important relative to simply
caching intermediate results (in this context, keeping the ResultPartitions
around even when there are no consuming tasks), and dynamically
extending the jobgraph.

Three questions:

1) How can I prevent ResultPartitions from being released?

In interactive use, RPs should not necessarily be released when there are
no pending tasks to consume them.

Looking at the code, it's hard to understand the high-level logic of who
triggers their release, how the refcounting works, etc. For instance,
is releasePartitionsProducedBy called by the producer, or the consumer, or
both? Is the refcount initialized at the beginning of the task setup, and
decremented every time the partition is read out?

Ideally, I could force certain ResultPartitions to only be manually
releasable, so I can consume them over and over.
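
To make the behaviour asked for in question 1 concrete, here is a toy model
of a refcounted partition with a "pinned" flag. It is only a sketch of the
idea, not Flink code: PinnablePartition, onConsumed, and release are
hypothetical names, and the assumption that the refcount starts at the
number of expected consumers is a guess at the model, not a description of
how Flink actually does it.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model only; NOT Flink's ResultPartition. All names are hypothetical. */
final class PinnablePartition {
    // Assumption: the count starts at the number of expected consumers.
    private final AtomicInteger pendingConsumers;
    // A pinned partition ignores the refcount and waits for release().
    private final boolean pinned;
    private volatile boolean released = false;

    PinnablePartition(int expectedConsumers, boolean pinned) {
        this.pendingConsumers = new AtomicInteger(expectedConsumers);
        this.pinned = pinned;
    }

    /** Called each time one consumer has finished reading the partition. */
    void onConsumed() {
        int remaining = pendingConsumers.decrementAndGet();
        // Automatic release happens only for unpinned partitions.
        if (remaining <= 0 && !pinned) {
            released = true;
        }
    }

    /** Explicit release; the only way to free a pinned partition. */
    void release() {
        released = true;
    }

    boolean isReleased() {
        return released;
    }
}
```

With this model, an unpinned partition created with one expected consumer
is released after a single onConsumed() call, while a pinned one can be
read any number of times and stays available until release() is called.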

2) Can I call attachJobGraph on the ExecutionGraph after the job has begun
executing to add more nodes?

I read that Flink does not support changing the running topology. But
what about extending the topology to add more nodes?

If the IntermediateResultPartitions are just sitting around from previously
completed tasks, this seems straightforward in principle. Would one have
to fiddle with the scheduler/event system to kick things off again?

3) Can I have a ConnectionManager without a TaskManager?

For interactive use, I want to selectively pull data from ResultPartitions
into my local REPL process. But I don't want my local process to be a
TaskManager node, and have tasks assigned to it.

So this question boils down to: how do I hook a ConnectionManager into the
Flink communication/actor system?

