flink-user mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: Udf Performance and Object Creation
Date Fri, 14 Aug 2015 15:57:45 GMT
Hi!

(1) A mapper is created once per parallel task. So if you create a program
that runs a map() transformation with a parallelism of n, you will have n
mapper instances in the cluster. Some may be on the same TaskManager, if
the TaskManager has multiple slots.
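
To illustrate point (1), here is a minimal plain-Java sketch, not Flink code. The `MapFunction` interface below is a hypothetical stand-in for Flink's interface of the same name, and `runMap`, `UpperCaseMapper`, and the round-robin distribution of records are assumptions made only for this illustration. It simulates a map() with parallelism n and counts how many mapper instances are created: one per parallel task, not one per record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class MapperPerTaskSketch {

    // Hypothetical stand-in for Flink's MapFunction so the sketch runs without Flink.
    public interface MapFunction<I, O> {
        O map(I value);
    }

    // Counts how many mapper instances have been constructed.
    public static final AtomicInteger INSTANCES = new AtomicInteger();

    public static class UpperCaseMapper implements MapFunction<String, String> {
        public UpperCaseMapper() {
            INSTANCES.incrementAndGet();
        }

        @Override
        public String map(String s) {
            return s.toUpperCase();
        }
    }

    // Simulates map() with the given parallelism: one mapper instance per
    // parallel task. Records are handed out round-robin here, which is a
    // simplification and not a statement about how Flink ships records.
    public static List<String> runMap(List<String> records, int parallelism) {
        List<MapFunction<String, String>> tasks = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            tasks.add(new UpperCaseMapper());
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < records.size(); i++) {
            // The same instance is reused for every record routed to its task.
            out.add(tasks.get(i % parallelism).map(records.get(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> out = runMap(Arrays.asList("a", "b", "c", "d", "e", "f"), 3);
        System.out.println(out + " mapped by " + INSTANCES.get() + " mapper instances");
    }
}
```

With six records and a parallelism of 3, the sketch constructs three mappers and each one handles two records. That is also the difference from creating a mapper per record: map() is still called once per record, but always on an existing instance.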

(2) I would really like that. But it means Java has to deal with both
managed and unmanaged memory at the same time, which is quite a heavy
addition. C# has some form of support for that.

BTW: Where did you originally post these questions? I have not seen them
before...

On Fri, Aug 14, 2015 at 5:43 PM, Flavio Pompermaier <pompermaier@okkam.it>
wrote:

> Any insight about these 2 questions..?
> On 12 Aug 2015 17:38, "Flavio Pompermaier" <pompermaier@okkam.it> wrote:
>
>> This is something I've never understood in depth: isn't a mapper created
>> for each record? If it's created only once per task manager, then it's not so
>> different from mapPartition. What am I missing here?
>>
>> And then a more philosophical question: all big data frameworks need
>> to manage memory very efficiently somehow (Flink even reserves
>> a fraction of the entire memory in order to have control over it). Wouldn't
>> it be simpler if Java finally released some APIs (even marked as unsafe,
>> it doesn't change that much) to allow full control of the
>> memory? It would make a lot of sense for all big data platforms (at least
>> for non-UDF code...).
>>
>> Best,
>> Flavio
>> On 12 Aug 2015 12:44, "Timo Walther" <twalthr@apache.org> wrote:
>>
>>> Hello Michael,
>>>
>>> every time you code a Java program you should avoid object creation if
>>> you want an efficient program, because every created object needs to be
>>> garbage collected later (which slows down your program performance).
>>> You can have small Pojos, just try to avoid the call "new" in your
>>> functions:
>>>
>>> Instead of:
>>>
>>> class Mapper implements MapFunction<String,Pojo> {
>>>   public Pojo map(String s) {
>>>     Pojo p = new Pojo(); // allocates a new object for every record
>>>     p.f = s;
>>>     return p;
>>>   }
>>> }
>>>
>>> do:
>>>
>>> class Mapper implements MapFunction<String,Pojo> {
>>>   private Pojo p = new Pojo(); // allocated once per Mapper instance
>>>   public Pojo map(String s) {
>>>     p.f = s;
>>>     return p;
>>>   }
>>> }
>>>
>>> Then an object is only created once per Mapper and not per record.
>>>
>>> Hope this helps.
>>>
>>> Regards,
>>> Timo
>>>
>>>
>>>
>>> On 12.08.2015 11:53, Michael Huelfenhaus wrote:
>>>
>>>> Hello
>>>>
>>>> I have a question about programming user-defined functions: is it
>>>> still the case, as in the old Stratosphere times, that object creation
>>>> should be avoided at all costs? I ask because some of the examples now
>>>> create Tuples and other objects before returning them.
>>>>
>>>> I am going to have a streaming plan with at least 6 steps, and I am going
>>>> to use Pojos. Performance-wise, is it a big improvement to define one big
>>>> Pojo that can be used by all the steps, or is it better to have smaller
>>>> ones that send less data but create more objects?
>>>>
>>>> Thanks
>>>> Michael
>>>>
>>>
>>>
