flink-user mailing list archives

From Giacomo Licari <giacomo.lic...@gmail.com>
Subject Re: Parallelism question
Date Tue, 14 Apr 2015 10:12:56 GMT
Hi Max,
thank you for your reply.

Does the DataSink contain the data in order? I mean, does it contain
output1, output2 ... output5 in that order, or are they mixed?

Thanks a lot,
Giacomo

On Tue, Apr 14, 2015 at 11:58 AM, Maximilian Michels <mxm@apache.org> wrote:

> Hi Giacomo,
>
> If I understand you correctly, you want your Flink job to execute with a
> parallelism of 5. Just call setDegreeOfParallelism(5) on your
> ExecutionEnvironment. That way, all operations, when possible, will be
> performed using 5 parallel instances. This is also true for the DataSink
> which will produce 5 files containing the output data from the parallel
> instances.
>
> Best,
> Max
>
>
> On Tue, Apr 14, 2015 at 10:38 AM, Giacomo Licari <giacomo.licari@gmail.com> wrote:
>
>> Hi guys,
>> I have a question about how parallelism works.
>>
>> If I have a large dataset and I want to divide it into 5 blocks, can I pass
>> each block of data to a fixed parallel process (for example, if I set up 5
>> processes)?
>>
>> And if the result data from each process does not arrive at the output in
>> order, can I order it? For example:
>>
>> data from process 1
>> data from process 2
>> and so on
>>
>> Thank you guys!
>>
>
>
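A minimal sketch of the idea discussed in this thread, written in plain Java
with the standard library only. It is not Flink code and does not show how
Flink's DataSink behaves; it only illustrates splitting a dataset into 5
blocks, handing each block to its own worker, and collecting the results in
block order even if the workers finish out of order. The class name
ParallelBlocks, the methods split and processInOrder, and the per-block sum
are all hypothetical and chosen just for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only: plain Java, not the Flink API.
class ParallelBlocks {

    // Split the input into `parallelism` contiguous blocks of near-equal size.
    static List<List<Integer>> split(List<Integer> data, int parallelism) {
        List<List<Integer>> blocks = new ArrayList<>();
        int n = data.size();
        for (int i = 0; i < parallelism; i++) {
            int from = (int) ((long) n * i / parallelism);
            int to = (int) ((long) n * (i + 1) / parallelism);
            blocks.add(new ArrayList<>(data.subList(from, to)));
        }
        return blocks;
    }

    // Process every block on its own worker thread, then gather the results
    // in block order (block 1 first), regardless of which worker finishes first.
    static List<String> processInOrder(List<Integer> data, int parallelism)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<List<Integer>> blocks = split(data, parallelism);
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < blocks.size(); i++) {
                final int index = i + 1;
                final List<Integer> block = blocks.get(i);
                // Stand-in for real work: sum the values of the block.
                futures.add(pool.submit(() -> {
                    long sum = 0;
                    for (int v : block) {
                        sum += v;
                    }
                    return "output" + index + ": sum=" + sum;
                }));
            }
            // Futures are read in submission order, so the result list is
            // ordered by block index even if completion order differs.
            List<String> ordered = new ArrayList<>();
            for (Future<String> f : futures) {
                ordered.add(f.get());
            }
            return ordered;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> data = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            data.add(i);
        }
        for (String line : processInOrder(data, 5)) {
            System.out.println(line);
        }
    }
}
```

The ordering here comes from tagging each block with its index and reading
the results back in that order, not from the workers themselves running in
any particular sequence.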
