kafka-jira mailing list archives

From "Bhaskar Gollapudi (JIRA)" <j...@apache.org>
Subject [jira] [Created] (KAFKA-5761) Serializer API should support ByteBuffer
Date Tue, 22 Aug 2017 09:21:00 GMT
Bhaskar Gollapudi created KAFKA-5761:
----------------------------------------

             Summary: Serializer API should support ByteBuffer
                 Key: KAFKA-5761
                 URL: https://issues.apache.org/jira/browse/KAFKA-5761
             Project: Kafka
          Issue Type: Improvement
          Components: clients
    Affects Versions: 0.11.0.0
            Reporter: Bhaskar Gollapudi


Consider the Serializer interface. Its main method is:

byte[] serialize(String topic, T data);

Producer applications create an implementation that takes an instance
of T and converts it to a byte[]. This byte array is newly allocated for
each message. It is then handed over to the Kafka Producer API
internals, which write the bytes to a buffer or network socket. When the
next message arrives, the serializer should try to reuse the existing
byte[] for the new message instead of creating a new one. This requires
two things:

1. The process of handing off the bytes to the buffer/socket and reusing
the byte[] must happen on the same thread.

2. There should be a way of marking the end of the available bytes in the
byte[].

The first is reasonably simple to understand. If it does not hold, and no
other synchronization is in place, the byte[] gets corrupted, and so does
the message written to the buffer/socket. However, this requirement is easy
to meet for a producer application, because it controls the threads on
which the serializer is invoked.

The second is where the problem lies with the current API. It does not
allow a variable number of bytes to be read from a container: the number of
bytes is fixed by the byte[]'s length. This forces the producer to either

1. create a new byte[] for a message that is bigger than the previous
one,
OR
2. decide on a maximum size and pad shorter messages.

Both are cumbersome and error-prone, and may waste network bandwidth.
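
The two workarounds can be sketched roughly as follows. This is only an
illustration: the class ByteArrayWorkarounds, the constant MAX_BYTES, and the
choice of int[] as the message type are hypothetical and are not part of the
Kafka API. Note that with padding, a trailing zero in the payload cannot be
told apart from the padding itself.

```java
import java.util.Arrays;

// Hypothetical illustration of the two byte[] workarounds; not Kafka code.
class ByteArrayWorkarounds {
    // Assumed upper bound on the serialized size, chosen by the application.
    static final int MAX_BYTES = 16;

    // Shared across calls; only safe if the previous message has been fully
    // consumed on the same thread before the next call.
    private static final byte[] REUSED = new byte[MAX_BYTES];

    // Workaround 1: allocate an exact-size array for every message.
    static byte[] serializeFresh(int[] data) {
        byte[] out = new byte[data.length * Integer.BYTES];
        writeInts(data, out);
        return out;
    }

    // Workaround 2: reuse one maximum-size array and zero-pad the unused tail.
    static byte[] serializePadded(int[] data) {
        if (data.length * Integer.BYTES > MAX_BYTES) {
            throw new IllegalArgumentException("message exceeds MAX_BYTES");
        }
        int end = writeInts(data, REUSED);
        Arrays.fill(REUSED, end, MAX_BYTES, (byte) 0);
        return REUSED; // always MAX_BYTES long, whatever the payload size
    }

    // Writes each int in big-endian order; returns the number of bytes written.
    private static int writeInts(int[] data, byte[] out) {
        int pos = 0;
        for (int v : data) {
            out[pos++] = (byte) (v >>> 24);
            out[pos++] = (byte) (v >>> 16);
            out[pos++] = (byte) (v >>> 8);
            out[pos++] = (byte) v;
        }
        return pos;
    }

    public static void main(String[] args) {
        System.out.println(serializeFresh(new int[] {1, 2}).length);
        System.out.println(serializePadded(new int[] {1, 2}).length);
    }
}
```

In the padded variant the full MAX_BYTES are handed over for every message,
which is where the wasted bandwidth comes from.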

Instead, if there were a Serializer with this method:

ByteBuffer serialize(String topic, T data);

it would make it possible to implement a reusable bytes container, so that
clients can avoid an allocation for each message.
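
A minimal sketch of how such a serializer might look, assuming the proposed
method signature. The interface ByteBufferSerializer and the class
ReusableIntArraySerializer are hypothetical names used only for illustration;
they are not an existing Kafka API. The buffer's limit marks the end of the
valid bytes, so no padding is needed, and the buffer is replaced only when a
message is larger than its capacity.

```java
import java.nio.ByteBuffer;

// Hypothetical interface mirroring the method proposed in this issue.
interface ByteBufferSerializer<T> {
    ByteBuffer serialize(String topic, T data);
}

// Hypothetical implementation that reuses one buffer across messages.
// Assumes the caller consumes the returned buffer on the same thread
// before calling serialize() again.
class ReusableIntArraySerializer implements ByteBufferSerializer<int[]> {
    private ByteBuffer buffer = ByteBuffer.allocate(16);

    @Override
    public ByteBuffer serialize(String topic, int[] data) {
        int needed = data.length * Integer.BYTES;
        if (buffer.capacity() < needed) {
            // Grow only when a message does not fit in the current buffer.
            buffer = ByteBuffer.allocate(needed);
        }
        buffer.clear();              // position = 0, limit = capacity
        for (int v : data) {
            buffer.putInt(v);
        }
        buffer.flip();               // limit = end of valid bytes, position = 0
        return buffer;
    }

    public static void main(String[] args) {
        ReusableIntArraySerializer serializer = new ReusableIntArraySerializer();
        ByteBuffer first = serializer.serialize("topic", new int[] {1, 2, 3});
        System.out.println(first.remaining());
        ByteBuffer second = serializer.serialize("topic", new int[] {9});
        System.out.println(second.remaining());
    }
}
```

A consumer of the returned buffer reads exactly remaining() bytes, so a short
message following a long one does not carry stale or padded bytes.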



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
