flink-user mailing list archives

From Maximilian Alber <alber.maximil...@gmail.com>
Subject Re: Random Shuffling
Date Tue, 23 Jun 2015 09:51:48 GMT
Thank you!

I still cannot guarantee the size of each partition, can I?
I am looking for something like randomSplit in Spark.

Cheers,
Max

On Mon, Jun 15, 2015 at 5:46 PM, Matthias J. Sax
<mjsax@informatik.hu-berlin.de> wrote:

> Hi,
>
> using partitionCustom, the data distribution depends only on your
> probability distribution. If it is uniform, you should be fine, i.e.,
> choosing the channel like
>
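> > // inside a class implementing org.apache.flink.api.common.functions.Partitioner<K>
> > private final Random random = new Random(System.currentTimeMillis());
> > @Override
> > public int partition(K key, int numPartitions) {
> >   // pick the target partition uniformly at random, ignoring the key
> >   return random.nextInt(numPartitions);
> > }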
>
> should do the trick.
>
> -Matthias
>
> On 06/15/2015 05:41 PM, Maximilian Alber wrote:
> > Thanks!
> >
> > OK, so for a random shuffle I need partitionCustom. But in that case the
> > data might end up unbalanced?
> >
> > As for the splitting: is there no way to get exact sizes?
> >
> > Cheers,
> > Max
> >
> > On Mon, Jun 15, 2015 at 2:26 PM, Till Rohrmann <trohrmann@apache.org>
> > wrote:
> >
> >     Hi Max,
> >
> >     you can always shuffle your elements using the |rebalance| method.
> >     What Flink does here is distribute the elements of each partition
> >     among all available TaskManagers. This happens in a round-robin
> >     fashion and is thus not completely random.
> >
> >     Another option is the |partitionCustom| method, which allows you to
> >     specify for each element the partition it is sent to. You
> >     would have to provide a |Partitioner| to do this.
> >
> >     For the splitting there is at the moment no syntactic sugar. What you
> >     can do, though, is assign each item a split ID and then use a
> >     |filter| operation to extract the individual splits. Depending on your
> >     split ID distribution you will get differently sized splits.
> >
> >     Cheers,
> >     Till
> >
> >     On Mon, Jun 15, 2015 at 1:50 PM Maximilian Alber
> >     <alber.maximilian@gmail.com> wrote:
> >
> >         Hi Flinksters,
> >
> >         I would like to shuffle the elements in my data set and then
> >         split it in two according to some ratio. Each element in the
> >         data set has a unique ID. Is there a nice way to do this with the
> >         Flink API?
> >         (It would be nice to have guaranteed random shuffling.)
> >         Thanks!
> >
> >         Cheers,
> >         Max
> >
> >
> >
>
>
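
A minimal sketch of the split-ID idea Till describes in the thread above, written in plain Java rather than against the Flink API so that it runs on its own. The class SplitSketch, the helpers assignSplitIds and filterSplit, the 0.8 ratio, and the fixed seed are all hypothetical names and values chosen for illustration. Each element gets a randomly drawn split ID, and the two splits are then obtained by filtering on that ID:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SplitSketch {

    // Hypothetical helper: draw a split ID (0 or 1) for each of n elements.
    // An element gets ID 0 with probability `ratio`, otherwise ID 1.
    static int[] assignSplitIds(int n, double ratio, long seed) {
        Random random = new Random(seed);
        int[] splitIds = new int[n];
        for (int i = 0; i < n; i++) {
            splitIds[i] = random.nextDouble() < ratio ? 0 : 1;
        }
        return splitIds;
    }

    // Hypothetical helper: keep only the elements tagged with the wanted split ID.
    static <T> List<T> filterSplit(List<T> items, int[] splitIds, int wanted) {
        List<T> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            if (splitIds[i] == wanted) {
                out.add(items.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> elements = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            elements.add(i);
        }
        int[] splitIds = assignSplitIds(elements.size(), 0.8, 42L);
        List<Integer> first = filterSplit(elements, splitIds, 0);
        List<Integer> second = filterSplit(elements, splitIds, 1);
        // The sizes follow the ratio only approximately; they are not exact.
        System.out.println(first.size() + " / " + second.size());
    }
}
```

Because every element is assigned independently, the split sizes only approximate the requested ratio, which matches Till's remark that the sizes depend on the split ID distribution. The sketch does not give exact sizes.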
