flink-dev mailing list archives

From Alexander Alexandrov <alexander.s.alexand...@gmail.com>
Subject Gather a distributed dataset
Date Mon, 12 Jan 2015 10:42:57 GMT
Hi there,

I wished for intermediate datasets, and Santa Ufuk made my wishes come true
(thank you, Santa)!

Now that FLINK-986 is in the mainline, I want to ask some practical questions.

In Spark, there is a way to put a value from the local driver to the
distributed runtime via

val x = env.parallelize(...) // expose x to the distributed runtime
val y = dataflow(env, x) // y is produced by a dataflow which reads from x

and also to get a value from the distributed runtime back to the driver

val z = y.collect() // gather y back to the driver

As far as I know, in Flink we have an equivalent for parallelize

val x = env.fromCollection(...)

but not for collect. Is this still the case?

If yes, how hard would it be to add this feature at the moment? Can you
give me some pointers?
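To make the parallelize/collect round trip concrete, here is a toy sketch in plain Scala, with no cluster or Spark/Flink dependency involved: "parallelize" splits a local sequence into partitions, a "dataflow" transforms each partition on the (simulated) runtime, and "collect" gathers the partitions back on the driver. All names here are illustrative, not part of either API.

```scala
// Toy model of the driver <-> distributed runtime round trip.
// No real distribution happens; partitions are just nested Seqs.
object GatherSketch {

  // "parallelize": split a local sequence into roughly equal partitions,
  // mimicking how a driver-side collection is exposed to the runtime.
  def parallelize[A](xs: Seq[A], numPartitions: Int): Seq[Seq[A]] = {
    val size = math.max(1, math.ceil(xs.length.toDouble / numPartitions).toInt)
    xs.grouped(size).toSeq
  }

  // "dataflow": apply a function per element within each partition,
  // as a stand-in for a distributed job reading from x.
  def dataflow[A, B](partitions: Seq[Seq[A]])(f: A => B): Seq[Seq[B]] =
    partitions.map(_.map(f))

  // "collect": flatten the partitions back into one local sequence,
  // i.e. gather the distributed dataset at the driver.
  def collect[A](partitions: Seq[Seq[A]]): Seq[A] = partitions.flatten

  def main(args: Array[String]): Unit = {
    val x = parallelize(1 to 10, numPartitions = 3)
    val y = dataflow(x)(_ * 2)
    val z = collect(y)
    println(z.mkString(",")) // 2,4,6,8,10,12,14,16,18,20
  }
}
```

The point of the sketch is only the shape of the API: the driver holds plain local values before `parallelize` and after `collect`, while everything in between lives on the runtime.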


