flink-dev mailing list archives

From Martin Neumann <mneum...@spotify.com>
Subject how to split data-sets efficiently?
Date Sun, 27 Jul 2014 10:56:47 GMT

I have a dataset of StringIDs and I want to map them to Longs using a
hash function. I will use the LongIDs in a series of iterative
computations and then map back to StringIDs.
Currently I have a map operation that creates tuples of the string and
the long. I have another mapper that strips out the strings.
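
To make the current setup concrete, here is a minimal sketch using plain
Java collections instead of the Flink API. The class name, the method
names and the choice of 64-bit FNV-1a as the hash function are assumptions
made for illustration only; the real job uses two map operators on a
data set.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Illustrative only: plain collections stand in for the data sets.
class IdMappingSketch {

    // Assumed hash function: 64-bit FNV-1a over the UTF-8 bytes of the ID.
    public static long hashId(String id) {
        long h = 0xcbf29ce484222325L;            // FNV-1a 64-bit offset basis
        for (byte b : id.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;                 // FNV-1a 64-bit prime
        }
        return h;
    }

    // First map: pair every string ID with its long ID.
    public static List<Map.Entry<String, Long>> pairWithHash(List<String> ids) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String id : ids) {
            pairs.add(new SimpleEntry<>(id, hashId(id)));
        }
        return pairs;
    }

    // Second map: drop the strings and keep only the long IDs.
    public static List<Long> dropStrings(List<Map.Entry<String, Long>> pairs) {
        List<Long> longIds = new ArrayList<>();
        for (Map.Entry<String, Long> pair : pairs) {
            longIds.add(pair.getValue());
        }
        return longIds;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("track-1", "track-2"); // made-up IDs
        List<Map.Entry<String, Long>> pairs = pairWithHash(ids);
        List<Long> longIds = dropStrings(pairs);
        System.out.println(pairs.size() + " pairs, " + longIds.size() + " long IDs");
    }
}
```

The pairs are kept for mapping back to StringIDs at the end, while the
list of longs feeds the iterative part.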

Is there a way to do an operation that allows for more than one output set
(basically split a set into 2 sets)? This would reduce the complexity of
the code a lot.
Also, how does the optimizer deal with this case? Does it join both map
operations together and actually run them as if they were a split?
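
As a rough illustration of what I mean by a split, the sketch below makes
one pass over the input and fills two outputs at once. It is plain Java,
not an existing Flink operator; `SplitSketch`, `Split` and `split` are
hypothetical names, and the FNV-1a hash is again only an assumed stand-in.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: one pass over the input, two outputs.
class SplitSketch {

    // Hypothetical container for the two result sets.
    static final class Split {
        final List<Long> longIds = new ArrayList<>();          // input to the iterations
        final Map<Long, String> lookup = new LinkedHashMap<>(); // used to map back later
    }

    // Assumed hash function: 64-bit FNV-1a over the UTF-8 bytes of the ID.
    public static long hashId(String id) {
        long h = 0xcbf29ce484222325L;
        for (byte b : id.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // Hash each ID once and emit it to both outputs.
    public static Split split(List<String> ids) {
        Split out = new Split();
        for (String id : ids) {
            long h = hashId(id);
            out.longIds.add(h);
            out.lookup.put(h, id); // a hash collision would overwrite the earlier ID
        }
        return out;
    }

    public static void main(String[] args) {
        Split s = split(Arrays.asList("track-1", "track-2")); // made-up IDs
        System.out.println(s.longIds.size() + " long IDs, " + s.lookup.size() + " lookup entries");
    }
}
```

Each ID is hashed only once here, which is the behaviour I am hoping the
two separate map operations end up having.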

cheers Martin
