flink-user mailing list archives

From jay vyas <jayunit100.apa...@gmail.com>
Subject Re: bigpetstore flink : parallelizing collections
Date Sun, 12 Jul 2015 13:57:14 GMT
Awesome, thanks! I'll try it out.

This is part of a wave of JIRAs for Bigtop Flink integration.  If your
distro/packaging folks collaborate with us, it will save you time in the
long run, because you can piggyback on the Bigtop infra for rpm/deb
packaging, smoke testing, and HDFS interop testing.

https://issues.apache.org/jira/browse/BIGTOP-1927

Just FYI. Great to connect, Stephan and others; I will keep you posted!

On Sun, Jul 12, 2015 at 9:16 AM, Stephan Ewen <sewen@apache.org> wrote:

> Hi Jay!
>
> You can use the "fromCollection()" or "fromElements()" method to create a
> DataSet or DataStream from a Java/Scala collection. That moves the data
> into the cluster and allows you to run parallel transformations on the
> elements.
>
> Make sure you set the parallelism of the operation that you want to be
> parallel.
>
>
> Here is a code sample:
>
> ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>
> DataSet<MyType> data = env.fromElements(myArray);
>
> // makes sure you have 80 mappers
> data.map(new TransactionMapper()).setParallelism(80);
>
>
> Stephan
>
>
> On Sun, Jul 12, 2015 at 3:04 PM, jay vyas <jayunit100.apache@gmail.com>
> wrote:
>
>> Hi Flink.
>>
>> I'm happy to announce that I've done a small bit of initial hacking on
>> bigpetstore-flink, in order to represent in Flink what we do in Spark.
>>
>> TL;DR: the main question is at the bottom!
>>
>> Currently, I want to generate transactions for a list of customers.  The
>> generation of transactions is a parallel process, and the customers are
>> generated beforehand.
>>
>> In Hadoop, we can create an input format with custom splits if we want
>> to split a data set up; otherwise, we can break it into files.
>>
>> In Spark, there is a convenient "parallelize" method that we can run on a
>> list; we can then capture the resulting RDD and run a parallelized
>> transform on it.
>>
>> In Flink, I have an array of "customers" and I want to parallelize our
>> transaction generator for each customer.  How would I do that?
>>
>> --
>> jay vyas
>>
>
>

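For reference, the approach quoted above can be sketched as a complete
program. This is only a minimal sketch, assuming a Flink release that
still ships the batch DataSet API and has Flink on the classpath. The
class name ParallelTransactions, the generate() helper, and the
String-based TransactionMapper are hypothetical stand-ins; the real
bigpetstore customer and transaction types are not shown in this thread.

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

import java.util.Arrays;
import java.util.List;

public class ParallelTransactions {

    // Hypothetical stand-in for the transaction generator: it turns one
    // customer name into one placeholder "transaction" string.
    public static class TransactionMapper implements MapFunction<String, String> {
        @Override
        public String map(String customer) {
            return "transaction-for-" + customer;
        }
    }

    // Ships the local list into Flink, runs the mapper with the requested
    // parallelism, and brings the results back to the client.
    public static List<String> generate(List<String> customers, int parallelism)
            throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // fromCollection() turns a local Java collection into a DataSet.
        DataSet<String> data = env.fromCollection(customers);

        // setParallelism() applies to the map operator only.
        // collect() triggers execution; result order is not guaranteed.
        return data.map(new TransactionMapper())
                   .setParallelism(parallelism)
                   .collect();
    }

    public static void main(String[] args) throws Exception {
        List<String> customers = Arrays.asList("alice", "bob", "carol");
        for (String transaction : generate(customers, 4)) {
            System.out.println(transaction);
        }
    }
}
```

Note that the source created by fromCollection() itself runs with a
parallelism of one; only the downstream map runs in parallel. If the
source needs to be parallel as well, fromParallelCollection() with a
SplittableIterator may be worth a look.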

-- 
jay vyas
