flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alexander Alexandrov <alexander.s.alexand...@gmail.com>
Subject Gather a distributed dataset
Date Mon, 12 Jan 2015 10:42:57 GMT
Hi there,

I wished for intermediate datasets, and Santa Ufuk made my wishes come true
(thank you, Santa)!

Now that FLINK-986 is in the mainline, I want to ask some practical
questions.

In Spark, there is a way to put a value from the local driver to the
distributed runtime via

val x = env.parallelize(...) // expose x to the distributed runtime
val y = dataflow(env, x) // y is produced by a dataflow which reads from x

and also to get a value from the distributed runtime back to the driver

val z = env.collect("y")

As far as I know, in Flink we have an equivalent for parallelize

val x = env.fromCollection(...)

but not for collect. Is this still the case?

If yes, how hard would it be to add this feature at the moment? Can you
give me some pointers?

Regards,

Alexander

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message