flink-user mailing list archives

From Flavio Pompermaier <pomperma...@okkam.it>
Subject Re: Union/append performance question
Date Mon, 07 Sep 2015 17:06:41 GMT
3 or 4 usually..
On 7 Sep 2015 18:39, "Fabian Hueske" <fhueske@gmail.com> wrote:

> And how many unions would your program use if you would follow the
> union-in-loop approach?
>
> 2015-09-07 18:31 GMT+02:00 Flavio Pompermaier <pompermaier@okkam.it>:
>
>> In the order of 10 GB..
>>
>> On Mon, Sep 7, 2015 at 6:14 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>>
>>> Accumulators can be used to collect records, but they are not designed
>>> to hold large amounts of data.
>>> It might work up to a certain point (~10MB) and fail beyond that.
>>>
>>> How many unions do you plan to use in your program?
>>>
>>>
>>>
>>> 2015-09-07 17:58 GMT+02:00 Flavio Pompermaier <pompermaier@okkam.it>:
>>>
>>>> OK, thanks. Are there any alternatives to that? May I use accumulators
>>>> for that?
>>>> On 7 Sep 2015 17:47, "Fabian Hueske" <fhueske@gmail.com> wrote:
>>>>
>>>>> If the loop count of 3 is fixed (or not significantly larger), union
>>>>> should be fine.
>>>>>
>>>>> 2015-09-07 17:07 GMT+02:00 Flavio Pompermaier <pompermaier@okkam.it>:
>>>>>
>>>>>> Sorry, the program has a union at:
>>>>>>    accumulated = accumulated.union(x.filter(t.f1 == 0))
>>>>>>
>>>>>> On Mon, Sep 7, 2015 at 4:58 PM, Fabian Hueske <fhueske@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Flavio,
>>>>>>>
>>>>>>> your example does not contain a union.
>>>>>>>
>>>>>>> Union itself basically comes for free. However, if you have a lot of
>>>>>>> small DataSets that you want to union, the plan can become very complex
>>>>>>> and might cause overhead due to scheduling many small tasks. For example,
>>>>>>> it is usually better to have one data source and input format that reads
>>>>>>> multiple small files instead of adding one data source for each tiny file
>>>>>>> and applying union to all data sources to get all data.
>>>>>>>
>>>>>>> TL;DR: if your iteration count is only 3, as your example suggests,
>>>>>>> you should be fine. If it exceeds, say, 32, it might be worth thinking
>>>>>>> about your program.
>>>>>>>
>>>>>>> Cheers, Fabian
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2015-09-07 16:29 GMT+02:00 Flavio Pompermaier <pompermaier@okkam.it>:
>>>>>>>
>>>>>>>> Hi Stephan,
>>>>>>>> thanks for the answer. Unfortunately I didn't understand if there's
>>>>>>>> an alternative to union right now..
>>>>>>>> My process is basically like this:
>>>>>>>>
>>>>>>>> DataSet x = ...
>>>>>>>> while (loopCnt < 3) {
>>>>>>>>    x = x.join(y).where(0).equalTo(0).with(...);
>>>>>>>>    accumulated = x.filter(t.f1 == 0);
>>>>>>>>    x = x.filter(t.f1 != 0);
>>>>>>>>    loopCnt++;
>>>>>>>> }
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Flavio
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Sep 7, 2015 at 3:15 PM, Stephan Ewen <sewen@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Union, like all operators, is lazy. When you call union, it only
>>>>>>>>> builds a "union stream" that unions when you execute the task. So
>>>>>>>>> nothing is added before you call "env.execute()".
>>>>>>>>>
>>>>>>>>> After you call "env.execute()" and then union again, you will
>>>>>>>>> re-execute the entire history of computation to compute the data set
>>>>>>>>> that you union with. Hence, for incremental computations, union() is
>>>>>>>>> probably not a good choice, unless you persist intermediate data
>>>>>>>>> (seamless support for that is WIP).
>>>>>>>>>
>>>>>>>>> Stephan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Sep 7, 2015 at 2:56 PM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
>>>>>>>>>
>>>>>>>>>> Hi to all,
>>>>>>>>>> I have a job where I have to incrementally add Tuples to a
>>>>>>>>>> dataset (in a while loop).
>>>>>>>>>> Is union() the best operator for this task or is there a more
>>>>>>>>>> performant operator for this task?
>>>>>>>>>> Does union affect the read of already existing elements or does it
>>>>>>>>>> just append the new ones somewhere?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Flavio
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>
>>
>
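
The laziness discussed in this thread (union only extends the plan, and a later execute re-runs the whole history) can be illustrated with a toy model in plain Java. This is only a sketch and not the Flink API: `LazySet`, `source`, `filter`, `union` and `execute` are hypothetical names made up for the illustration, and the in-memory lists stand in for real inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Toy model of lazy evaluation in plain Java. This is NOT the Flink API;
// LazySet and its methods are hypothetical names used for illustration only.
class LazyUnionDemo {

    static final class LazySet {
        // The recorded plan: nothing runs until execute() calls it.
        private final Supplier<List<Integer>> plan;

        private LazySet(Supplier<List<Integer>> plan) {
            this.plan = plan;
        }

        // Wraps a source; the reader is invoked only at execution time.
        static LazySet source(Supplier<List<Integer>> reader) {
            return new LazySet(reader);
        }

        // Records a filter step without touching any data.
        LazySet filter(Predicate<Integer> keep) {
            Supplier<List<Integer>> upstream = plan;
            return new LazySet(() -> {
                List<Integer> out = new ArrayList<>();
                for (Integer v : upstream.get()) {
                    if (keep.test(v)) out.add(v);
                }
                return out;
            });
        }

        // Records a union step; both inputs are concatenated at execution time.
        LazySet union(LazySet other) {
            Supplier<List<Integer>> left = plan;
            Supplier<List<Integer>> right = other.plan;
            return new LazySet(() -> {
                List<Integer> out = new ArrayList<>(left.get());
                out.addAll(right.get());
                return out;
            });
        }

        // Runs the whole recorded plan, starting again from the sources.
        List<Integer> execute() {
            return plan.get();
        }
    }

    public static void main(String[] args) {
        // Counts how often the source is actually read.
        AtomicInteger sourceReads = new AtomicInteger();
        LazySet x = LazySet.source(() -> {
            sourceReads.incrementAndGet();
            return Arrays.asList(0, 1, 2, 3);
        });

        LazySet accumulated = x.filter(v -> v % 2 == 0);
        System.out.println("reads after building the plan: " + sourceReads.get());
        System.out.println("first run: " + accumulated.execute());

        // Union after an execute: the next run recomputes the filter from the source.
        accumulated = accumulated.union(LazySet.source(() -> Arrays.asList(10, 12)));
        System.out.println("second run: " + accumulated.execute());
        System.out.println("source reads in total: " + sourceReads.get());
    }
}
```

In this model the source is not read while the plan is being built, and it is read once per execute, so an execute followed by another union and execute reads it twice. A real engine optimizes and schedules the plan very differently; the sketch only mirrors the re-execution behaviour described above.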
