kafka-dev mailing list archives

From Bhaskar Gollapudi <bhaskargollap...@gmail.com>
Subject Improvement for Kafka Serializer API
Date Mon, 21 Aug 2017 21:06:03 GMT
Hi,

I wanted to put forward an idea that occurred to me while using the Kafka
Producer API.

Consider the Serializer. Its main method is:

byte[] serialize(String topic, T data);

Producer applications create an implementation that takes an instance of T
and converts it to a byte[]. This byte array is newly allocated for each
message. It is then handed over to the Kafka Producer API internals, which
write the bytes to a buffer or network socket. When the next message
arrives, the serializer should try to reuse the existing byte[] for the new
message instead of creating a new one. This requires two things:

1. Handing off the bytes to the buffer/socket and reusing the byte[] must
happen on the same thread.

2. There must be a way to mark the end of the available bytes in the
byte[].

The first is reasonably simple to understand. If it does not hold, and
there is no other synchronization, the byte[] gets corrupted, and so does
the message written to the buffer/socket. However, this requirement is easy
for a producer application to meet, because it controls the threads on
which the serializer is invoked.

The second is where the problem lies with the current API. It does not
allow a variable number of bytes to be read from a container; the number of
bytes is fixed by the byte[]'s length. This forces the producer to either

1. Create a new byte[] for a message that is bigger than the previous one,
OR
2. Decide on a maximum size and use padding.

Both are cumbersome and error-prone, and may waste network bandwidth.
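
To illustrate the second option, here is a rough sketch of a serializer
that reuses one fixed-size byte[] and pads the unused tail. The interface
ByteArraySerializer is a local stand-in with the same method shape as
above, and PaddedIntArraySerializer, MAX_BYTES and the int[] message type
are names I made up for the example. It is only meant to show the problem,
not how anyone should do it.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Local stand-in with the same method shape as the byte[]-returning
// serializer discussed above (hypothetical, defined here so the sketch compiles alone).
interface ByteArraySerializer<T> {
    byte[] serialize(String topic, T data);
}

// Hypothetical serializer that reuses one array and pads short messages.
final class PaddedIntArraySerializer implements ByteArraySerializer<int[]> {
    static final int MAX_BYTES = 64; // assumed upper bound on message size

    private final byte[] reused = new byte[MAX_BYTES];
    private final ByteBuffer view = ByteBuffer.wrap(reused); // writes go into 'reused'

    @Override
    public byte[] serialize(String topic, int[] data) {
        int needed = data.length * Integer.BYTES;
        if (needed > MAX_BYTES) {
            throw new IllegalArgumentException("message exceeds MAX_BYTES");
        }
        view.clear();
        for (int v : data) {
            view.putInt(v);
        }
        // Zero the tail so bytes from the previous message do not leak through.
        Arrays.fill(reused, needed, MAX_BYTES, (byte) 0);
        // The caller can only see reused.length, so all 64 bytes are sent
        // even when the message needs only 4.
        return reused;
    }
}
```

A one-int message still goes out as 64 bytes, and the consumer needs some
extra convention to know where the real payload ends.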

Instead, consider a Serializer with this method:

ByteBuffer serialize(String topic, T data);

This would let clients implement a reusable byte container and avoid an
allocation for each message.
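
As a rough sketch of what such an implementation might look like (the
interface ByteBufferSerializer and the class ReusableIntArraySerializer are
hypothetical names, not part of the existing Kafka API, and the int[]
message type is chosen only to keep the example short): the buffer is
reused across calls, and its limit marks the end of the valid bytes.

```java
import java.nio.ByteBuffer;

// Hypothetical interface mirroring the proposed signature; not an existing Kafka type.
interface ByteBufferSerializer<T> {
    ByteBuffer serialize(String topic, T data);
}

// Reuses one backing buffer across messages. This is only safe if serialize()
// and the hand-off of the returned bytes happen on the same thread.
final class ReusableIntArraySerializer implements ByteBufferSerializer<int[]> {
    private ByteBuffer buffer = ByteBuffer.allocate(16);

    @Override
    public ByteBuffer serialize(String topic, int[] data) {
        int needed = data.length * Integer.BYTES;
        if (buffer.capacity() < needed) {
            // Grow only when a message does not fit; smaller messages reuse the buffer.
            buffer = ByteBuffer.allocate(Math.max(needed, buffer.capacity() * 2));
        }
        buffer.clear();          // position = 0, limit = capacity
        for (int v : data) {
            buffer.putInt(v);
        }
        buffer.flip();           // limit = end of written bytes, position = 0
        return buffer;
    }
}
```

The caller reads exactly buffer.remaining() bytes, so no padding is needed
and a new allocation happens only when a message outgrows the buffer.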

Regards
Bhaskar
