flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephan Ewen <se...@apache.org>
Subject Re: bigpetstore flink : parallelizing collections
Date Sun, 12 Jul 2015 13:16:39 GMT
Hi Jay!

You can use the "fromCollection()" or "fromElements()" method to create a
DataSet or DataStream from a Java/Scala collection. That moves the data
into the cluster and allows you to run parallel transformations on the
elements.

Make sure you set the parallelism of the operation that you want to be
parallel.


Here is a code sample:

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

DataSet<MyType> data = env.fromElements(myArray);

data.map(new TrasactionMapper()).setParallelism(80); // makes sure you have
80 mappers


Stephan


On Sun, Jul 12, 2015 at 3:04 PM, jay vyas <jayunit100.apache@gmail.com>
wrote:

> Hi flink.
>
> Im happy to announce that ive done a small bit of initial hacking on
> bigpetstore-flink, in order to represent what we do in spark in flink.
>
> TL;DR the main question is at the bottom!
>
> Currently, i want to generate transactions for a list of customers.  The
> generation of transactions is a parallel process, and the customers are
> generated beforehand.
>
> In hadoop , we can create an input format with custom splits if we want to
> split a data set up, otherwise, we can break it into files.
>
> in spark, there is a conveneint "parallelize" which we can run on a list,
> which we can then capture the RDD from , and run a parallelized transform.
>
> In flink, i have an array of "customers" and i want to parallelize our
> transaction generator for each customer.  How would i do that?
>
> --
> jay vyas
>

Mime
View raw message