flink-user mailing list archives

From "Matthias J. Sax" <mj...@informatik.hu-berlin.de>
Subject Re: Random Shuffling
Date Mon, 15 Jun 2015 12:19:14 GMT
I think you need to implement your own Partitioner and pass it to
DataSet.partitionCustom(partitioner, field)

(Just specify any field you like; as you don't want to group by key, it
doesn't matter.)

When implementing the partitioner, you can ignore the key parameter and
compute the output channel randomly.

This is kind of a work-around, but it should work.
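
A minimal sketch of such a partitioner, under some assumptions: the
interface below is a local stand-in with the same method shape as Flink's
org.apache.flink.api.common.functions.Partitioner, so the snippet compiles
without Flink on the classpath, and the class name RandomPartitioner is
made up for illustration.

```java
import java.io.Serializable;
import java.util.concurrent.ThreadLocalRandom;

// Local stand-in (assumption) mirroring the method of Flink's
// org.apache.flink.api.common.functions.Partitioner. In a real job,
// drop this interface and import Flink's instead.
interface Partitioner<K> extends Serializable {
    int partition(K key, int numPartitions);
}

// Hypothetical class name. Ignores the key and picks the output
// channel uniformly at random.
class RandomPartitioner<K> implements Partitioner<K> {
    @Override
    public int partition(K key, int numPartitions) {
        // The key is deliberately unused; the result lies in [0, numPartitions).
        return ThreadLocalRandom.current().nextInt(numPartitions);
    }
}
```

With the real Flink interface, the call would look roughly like
data.partitionCustom(new RandomPartitioner<Long>(), 0), assuming a tuple
data set whose field 0 holds the id. Note that this only spreads elements
randomly across partitions; it does not by itself reorder elements within
a partition.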


On 06/15/2015 01:49 PM, Maximilian Alber wrote:
> Hi Flinksters,
> I would like to shuffle the elements in my data set and then split it in
> two according to some ratio. Each element in the data set has a unique
> id. Is there a nice way to do it with the Flink API?
> (It would be nice to have guaranteed random shuffling.)
> Thanks!
> Cheers,
> Max
