flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Can serialization be disabled between chains?
Date Mon, 16 Jan 2017 13:27:32 GMT
One reason is to ensure that data cannot be modified after it has left
a thread.
A function that emits the same object several times (to reduce object
creation and GC pressure) could accidentally modify already emitted records
if they were put into a queue as objects.
Moreover, it is easier to control memory consumption if data is
serialized into a fixed number of buffers instead of being put on the JVM
heap.
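
To illustrate the object-reuse hazard described above, here is a minimal sketch in plain Java. It does not use Flink's actual hand-over code; the Counter type, the ReuseHazardDemo class, and both method names are hypothetical and exist only for this example. The first method puts the same reused object into a queue several times, the second writes its current state into bytes before enqueueing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class ReuseHazardDemo {

    /** Hypothetical mutable record that the producer reuses to avoid allocations. */
    static final class Counter {
        int value;
    }

    /** Enqueues the same object reference on every emit. */
    static List<Integer> handOverByReference() {
        Queue<Counter> queue = new ArrayDeque<>();
        Counter reused = new Counter();
        for (int i = 1; i <= 3; i++) {
            reused.value = i;   // producer overwrites the reused instance
            queue.add(reused);  // queue holds three references to one object
        }
        List<Integer> seen = new ArrayList<>();
        for (Counter c : queue) {
            seen.add(c.value);  // consumer reads the latest state only
        }
        return seen;
    }

    /** Serializes the record's current state into bytes on every emit. */
    static List<Integer> handOverSerialized() throws IOException {
        Queue<byte[]> queue = new ArrayDeque<>();
        Counter reused = new Counter();
        for (int i = 1; i <= 3; i++) {
            reused.value = i;
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeInt(reused.value);  // snapshot of the value at emit time
            }
            queue.add(bytes.toByteArray());
        }
        List<Integer> seen = new ArrayList<>();
        for (byte[] b : queue) {
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(b))) {
                seen.add(in.readInt());
            }
        }
        return seen;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(handOverByReference()); // prints [3, 3, 3]
        System.out.println(handOverSerialized());  // prints [1, 2, 3]
    }
}
```

With the by-reference hand-over the consumer sees only the last value, because later mutations by the producer leak into records that were already emitted. The serialized hand-over keeps each record as it was at emit time, at the cost of the serialization work.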

Best, Fabian

2017-01-16 14:21 GMT+01:00 Dmitry Golubets <dgolubets@gmail.com>:

> Hi Ufuk,
> Do you know what's the reason for serialization of data between different
> threads?
> Also, thanks for the link!
> Best regards,
> Dmitry
> On Mon, Jan 16, 2017 at 1:07 PM, Ufuk Celebi <uce@apache.org> wrote:
>> Hey Dmitry,
>> this is not possible if I'm understanding you correctly.
>> A task chain is executed by a single task thread and hence it is not
>> possible to continue processing before the record "leaves" the thread,
>> which only happens when the next task thread or the network stack
>> consumes it.
>> Handover between chained tasks happens without serialization. Only
>> data exchanged between different task threads is serialized.
>> Depending on your use case the newly introduced async I/O feature
>> might be worth a look (will be part of the upcoming 1.2 release):
>> https://github.com/apache/flink/pull/2629
