flink-issues mailing list archives

From "Nico Kruber (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-9913) Improve output serialization only once in RecordWriter
Date Mon, 27 Aug 2018 14:10:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Kruber updated FLINK-9913:
-------------------------------
    Description: 
Currently, the {{RecordWriter}} either emits output to multiple channels selected via
{{ChannelSelector}} or broadcasts output to all channels directly. Each channel has its own
{{RecordSerializer}}, which means each output record is serialized as many times as there are
selected channels.

Data serialization is a costly operation, so serializing each record only once should yield a
noticeable benefit.

I would suggest the following changes to achieve this:
 # Create only one {{RecordSerializer}} in {{RecordWriter}} for all the channels.
 # Serialize the output into the intermediate data buffer only once, regardless of how many
channels are selected.
 # Copy the intermediate serialization result into the {{BufferBuilder}} of each selected
channel.
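
The three steps above can be illustrated with a minimal, self-contained Java sketch. It is not
Flink's actual code: the class {{SerializeOnceSketch}}, its methods, and the length-prefixed
UTF-8 encoding are hypothetical names and choices made only for illustration, and a plain
{{ByteArrayOutputStream}} stands in for each channel's {{BufferBuilder}}.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a record writer; NOT Flink's RecordWriter.
class SerializeOnceSketch {
    // One target buffer per channel (assumed stand-in for a per-channel BufferBuilder).
    private final ByteArrayOutputStream[] channels;
    // Counts how often a record was actually serialized.
    private int serializations = 0;

    SerializeOnceSketch(int numChannels) {
        channels = new ByteArrayOutputStream[numChannels];
        for (int i = 0; i < numChannels; i++) {
            channels[i] = new ByteArrayOutputStream();
        }
    }

    // Step 1: a single serialization routine shared by all channels.
    // The format (4-byte length prefix + UTF-8 payload) is an arbitrary illustrative choice.
    private byte[] serialize(String record) {
        byte[] payload = record.getBytes(StandardCharsets.UTF_8);
        serializations++;
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    void emit(String record, int... selectedChannels) {
        // Step 2: serialize once into an intermediate byte array.
        byte[] serialized = serialize(record);
        // Step 3: copy the same bytes into the buffer of every selected channel.
        for (int channel : selectedChannels) {
            channels[channel].write(serialized, 0, serialized.length);
        }
    }

    int serializations() {
        return serializations;
    }

    byte[] channelBytes(int channel) {
        return channels[channel].toByteArray();
    }

    public static void main(String[] args) {
        SerializeOnceSketch writer = new SerializeOnceSketch(3);
        writer.emit("abc", 0, 2); // two target channels, one serialization
        System.out.println("serializations: " + writer.serializations());
    }
}
```

In this sketch the cost per additional channel is a byte copy rather than a second serialization
pass, which is the trade-off the proposal relies on.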

An additional benefit of using a single serializer for all channels is a potentially
significant reduction in heap space overhead, since there are fewer intermediate serialization
buffers (these buffers were only pruned back to 128B once they had grown beyond 5MiB!).

  was:
Currently, the {{RecordWriter}} either emits output to multiple channels selected via
{{ChannelSelector}} or broadcasts output to all channels directly. Each channel has its own
{{RecordSerializer}}, which means each output record is serialized as many times as there are
selected channels.

Data serialization is a costly operation, so serializing each record only once should yield a
noticeable benefit.

I would suggest the following changes to achieve this:
 # Create only one {{RecordSerializer}} in {{RecordWriter}} for all the channels.
 # Serialize the output into the intermediate data buffer only once, regardless of how many
channels are selected.
 # Copy the intermediate serialization result into the {{BufferBuilder}} of each selected
channel.


> Improve output serialization only once in RecordWriter
> ------------------------------------------------------
>
>                 Key: FLINK-9913
>                 URL: https://issues.apache.org/jira/browse/FLINK-9913
>             Project: Flink
>          Issue Type: Improvement
>          Components: Network
>    Affects Versions: 1.6.0
>            Reporter: zhijiang
>            Assignee: zhijiang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
