flink-user mailing list archives

From Maximilian Michels <...@apache.org>
Subject Re: bigpetstore flink : parallelizing collections
Date Mon, 13 Jul 2015 13:07:12 GMT
Hi Jay,

Great to hear there is effort to integrate Flink with BigTop. Please let us
know if any questions come up in the course of the integration!

Best,
Max


On Sun, Jul 12, 2015 at 3:57 PM, jay vyas <jayunit100.apache@gmail.com>
wrote:

> Awesome, thanks! I'll try it out.
>
> This is part of a wave of JIRAs for Bigtop Flink integration.  If your
> distro/packaging folks collaborate with us, it will save you time in the
> long run, because you can piggyback on the Bigtop infra for rpm/deb
> packaging, smoke testing, and HDFS interop testing ....
>
> https://issues.apache.org/jira/browse/BIGTOP-1927
>
> Just FYI, great to connect, Stephan and others. Will keep you posted!
>
> On Sun, Jul 12, 2015 at 9:16 AM, Stephan Ewen <sewen@apache.org> wrote:
>
>> Hi Jay!
>>
>> You can use the "fromCollection()" or "fromElements()" method to create a
>> DataSet or DataStream from a Java/Scala collection. That moves the data
>> into the cluster and allows you to run parallel transformations on the
>> elements.
>>
>> Make sure you set the parallelism of the operation that you want to be
>> parallel.
>>
>>
>> Here is a code sample:
>>
>> ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>>
>> DataSet<MyType> data = env.fromElements(myArray);
>>
>> data.map(new TransactionMapper()).setParallelism(80); // makes sure you
>> have 80 mappers
>>
>>
>> Stephan
>>
>>
>> On Sun, Jul 12, 2015 at 3:04 PM, jay vyas <jayunit100.apache@gmail.com>
>> wrote:
>>
>>> Hi flink.
>>>
>>> I'm happy to announce that I've done a small bit of initial hacking on
>>> bigpetstore-flink, in order to reproduce what we do in Spark in Flink.
>>>
>>> TL;DR the main question is at the bottom!
>>>
>>> Currently, I want to generate transactions for a list of customers.  The
>>> generation of transactions is a parallel process, and the customers are
>>> generated beforehand.
>>>
>>> In Hadoop, we can create an input format with custom splits if we want
>>> to split a data set up; otherwise, we can break it into files.
>>>
>>> In Spark, there is a convenient "parallelize" which we can run on a
>>> list, capture the resulting RDD from, and then run a parallelized
>>> transform.
>>>
>>> In Flink, I have an array of "customers" and I want to parallelize our
>>> transaction generator for each customer.  How would I do that?
>>>
>>> --
>>> jay vyas
>>>
>>
>>
>
>
> --
> jay vyas
>
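Putting the thread's advice together, here is a minimal, self-contained sketch of parallelizing a local customer collection with Flink's Java DataSet API, as Stephan describes. The `Customer` class and `toTransaction` helper are hypothetical stand-ins for BigPetStore's actual model, and the parallelism of 4 is an arbitrary example value:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class BigPetStoreSketch {

    // Hypothetical stand-in for BigPetStore's customer model.
    public static class Customer {
        public final int id;
        public Customer(int id) { this.id = id; }
    }

    // Pure transaction-generation logic, kept separate from the Flink
    // plumbing so it is easy to test on its own.
    public static String toTransaction(Customer c) {
        return "transaction-for-customer-" + c.id;
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        List<Customer> customers =
                Arrays.asList(new Customer(1), new Customer(2), new Customer(3));

        // fromCollection() ships the local list into the cluster as a
        // DataSet -- the analogue of Spark's parallelize().
        DataSet<Customer> data = env.fromCollection(customers);

        // setParallelism() on the map controls how many parallel mapper
        // instances run the transaction generator.
        DataSet<String> transactions = data
                .map(new MapFunction<Customer, String>() {
                    @Override
                    public String map(Customer c) {
                        return toTransaction(c);
                    }
                })
                .setParallelism(4);

        transactions.print();
    }
}
```

As with `fromElements()`, the source itself is not parallel (the collection lives on the client), so the parallelism matters on the downstream transformation, not on the source.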
