flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Udf Performance and Object Creation
Date Fri, 14 Aug 2015 15:56:31 GMT
I think Timo answered both questions (quoting Michael: "Hey Timo, yes that
is what I needed to know. Thanks").

Maybe one more comment: the examples are meant to showcase Flink's APIs
and concepts, not to achieve the best performance.

Best, Fabian

2015-08-14 17:43 GMT+02:00 Flavio Pompermaier <pompermaier@okkam.it>:

> Any insight about these 2 questions?
> On 12 Aug 2015 17:38, "Flavio Pompermaier" <pompermaier@okkam.it> wrote:
>
>> This is something I've never understood in depth: isn't a mapper created
>> for each record? If it's created only once per task manager, then it's not
>> so different from mapPartition. What am I missing here?
>>
>> And then a more philosophical question: all big data frameworks need to
>> manage memory very efficiently somehow (Flink even goes as far as reserving
>> a fraction of the entire memory in order to have control over it). Wouldn't
>> it be simpler if Java finally released some APIs (even marked as unsafe,
>> it doesn't change that much) to allow full control of the memory? It would
>> make a lot of sense for all big data platforms (at least for non-UDF
>> code...).
>>
>> Best,
>> Flavio
>> On 12 Aug 2015 12:44, "Timo Walther" <twalthr@apache.org> wrote:
>>
>>> Hello Michael,
>>>
>>> every time you code a Java program you should avoid object creation if
>>> you want an efficient program, because every created object needs to be
>>> garbage collected later (which slows down your program performance).
>>> You can have small Pojos, just try to avoid the call "new" in your
>>> functions:
>>>
>>> Instead of:
>>>
>>> class Mapper implements MapFunction<String,Pojo> {
>>>     public Pojo map(String s) {
>>>         // allocates a new Pojo for every record
>>>         Pojo p = new Pojo();
>>>         p.f = s;
>>>         return p;
>>>     }
>>> }
>>>
>>> do:
>>>
>>> class Mapper implements MapFunction<String,Pojo> {
>>>     // allocated once per Mapper instance and reused for every record
>>>     private Pojo p = new Pojo();
>>>     public Pojo map(String s) {
>>>         p.f = s;
>>>         return p;
>>>     }
>>> }
>>>
>>> Then an object is only created once per Mapper and not per record.
>>>
>>> Hope this helps.
>>>
>>> Regards,
>>> Timo
>>>
>>>
>>>
>>> On 12.08.2015 11:53, Michael Huelfenhaus wrote:
>>>
>>>> Hello
>>>>
>>>> I have a question about the programming of user-defined functions: is
>>>> it still the case, as in the old Stratosphere times, that object creation
>>>> should be avoided at all costs? I ask because some of the examples now
>>>> create Tuples and other objects before returning them.
>>>>
>>>> I am going to have a streaming plan with at least 6 steps, and I am going
>>>> to use Pojos. Performance-wise, is it a big improvement to define one big
>>>> Pojo that can be used by all the steps, or is it better to have smaller
>>>> ones that send less data but create more objects?
>>>>
>>>> Thanks
>>>> Michael
>>>>
>>>
>>>
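
The two Mapper variants in Timo's quoted message can be compared in a small, self-contained sketch that counts how many Pojo objects each one allocates. This is only an illustration and does not use Flink itself: the `MapFunction` interface below is a hypothetical stand-in so the code runs without Flink on the classpath, and `ReuseSketch`, `AllocatingMapper`, `ReusingMapper` and `countAllocations` are names made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class ReuseSketch {
    // Hypothetical stand-in for Flink's MapFunction (assumption: simplified,
    // no checked exceptions), so the sketch runs without Flink.
    interface MapFunction<I, O> {
        O map(I value);
    }

    static class Pojo {
        static int created = 0;   // counts constructor calls, for illustration only
        String f;
        Pojo() { created++; }
    }

    // Variant 1: allocates a new Pojo for every record.
    static class AllocatingMapper implements MapFunction<String, Pojo> {
        public Pojo map(String s) {
            Pojo p = new Pojo();
            p.f = s;
            return p;
        }
    }

    // Variant 2: allocates one Pojo per mapper instance and overwrites it.
    static class ReusingMapper implements MapFunction<String, Pojo> {
        private final Pojo p = new Pojo();
        public Pojo map(String s) {
            p.f = s;
            return p;
        }
    }

    // Builds one mapper, feeds it all records, and returns the number of
    // Pojo objects created along the way (including in the constructor).
    static int countAllocations(Supplier<MapFunction<String, Pojo>> factory,
                                List<String> records) {
        int before = Pojo.created;
        MapFunction<String, Pojo> mapper = factory.get();
        for (String r : records) {
            mapper.map(r);
        }
        return Pojo.created - before;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c");
        System.out.println(countAllocations(AllocatingMapper::new, records)); // prints 3
        System.out.println(countAllocations(ReusingMapper::new, records));    // prints 1
    }
}
```

Note that the reusing variant returns the same instance on every call, so any code that keeps a reference to an earlier result will see it overwritten by the next record. Whether that is safe in a real job depends on how the returned object is handled downstream, which this sketch does not model.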
