samza-dev mailing list archives

From Selina Tech <>
Subject Re: Avro vs Protocol buffer for Samza output
Date Thu, 19 Nov 2015 22:41:54 GMT
Hi, Yi:

      Thanks a lot for your reply with the information about the Avro schema.
      After your reply I studied the Avro messages on Kafka: each message
automatically has the layout [magic byte][schema id][actual message].
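The layout above can be sketched as follows. The field widths are an assumption, not something stated in this thread: the sketch follows the Confluent Schema Registry wire format (one magic byte 0x00, then a 4-byte big-endian schema ID, then the Avro binary payload). LinkedIn's internal format may differ, and `frame_message` / `split_message` are hypothetical helper names.

```python
import struct
from typing import Tuple

# Assumed (Confluent-style) wire format: 1 magic byte, 4-byte big-endian schema ID,
# then the Avro-encoded payload. Not confirmed by this thread.
MAGIC_BYTE = 0
HEADER = struct.Struct(">BI")  # unsigned byte + unsigned 32-bit int, big-endian, no padding


def frame_message(schema_id: int, payload: bytes) -> bytes:
    """Prefix an Avro-encoded payload with the magic byte and schema ID (hypothetical helper)."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + payload


def split_message(message: bytes) -> Tuple[int, bytes]:
    """Return (schema_id, payload) from a framed message, rejecting unknown formats."""
    if len(message) < HEADER.size:
        raise ValueError("message is shorter than the 5-byte header")
    magic, schema_id = HEADER.unpack_from(message, 0)
    if magic != MAGIC_BYTE:
        raise ValueError(f"unknown magic byte {magic}")
    return schema_id, message[HEADER.size:]


framed = frame_message(42, b"\x02\x06abc")
schema_id, payload = split_message(framed)
```

The consumer never needs the schema itself inside the message; it only needs the ID, which it can then use to fetch the writer's schema from wherever schemas are stored.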
      You mentioned "It is a specific way of maintaining compatibility
between producer and consumer in LinkedIn." I am wondering how this works.
Is there an "AvroSchemaRegistry" API for Samza, Kafka, or Avro? Do you know
of a link to this API or to a code example?
       In other words, suppose I send messages with Schema Id1 to topic
"temp", and later I add or delete a field so that the schema changes. I
then send messages with Schema Id2 to topic "temp". When I consume "temp",
how can I decode the messages? Do I need the schema Id, and if so, how can
I get it? Does Kafka, Samza, or Avro implement this?
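One way a consumer could handle Schema Id1 and Schema Id2 on the same topic, sketched under assumptions: the producer registers each writer schema and puts the returned ID in every message; the consumer reads that ID, looks up the writer schema, and resolves the record against its own reader schema using Avro-style resolution rules (a field missing from the writer takes the reader's default; a field the reader does not know is dropped). `InMemorySchemaRegistry`, `register`, `lookup`, and `resolve` are hypothetical names, schemas are simplified to a "field name -> default" mapping, and records are shown already decoded to dicts so that no Avro library is needed.

```python
from typing import Any, Dict, Optional

NO_DEFAULT = object()  # sentinel: the field has no default value

# Simplified "schema": field name -> default value (or NO_DEFAULT).
Schema = Dict[str, Any]


class InMemorySchemaRegistry:
    """Hypothetical stand-in for a schema registry service (not a real API)."""

    def __init__(self) -> None:
        self._by_id: Dict[int, Schema] = {}
        self._next_id = 1

    def register(self, schema: Schema, previous_id: Optional[int] = None) -> int:
        # Backward-compatibility check only: every field added relative to the previous
        # version needs a default, so the new schema can still read old data.
        if previous_id is not None:
            old = self._by_id[previous_id]
            for name, default in schema.items():
                if name not in old and default is NO_DEFAULT:
                    raise ValueError(f"new field {name!r} needs a default")
        schema_id = self._next_id
        self._by_id[schema_id] = dict(schema)
        self._next_id += 1
        return schema_id

    def lookup(self, schema_id: int) -> Schema:
        return self._by_id[schema_id]


def resolve(record: Dict[str, Any], writer: Schema, reader: Schema) -> Dict[str, Any]:
    """Project a record written with `writer` onto `reader` (simplified resolution)."""
    result = {}
    for name, default in reader.items():
        if name in writer:
            result[name] = record[name]
        elif default is not NO_DEFAULT:
            result[name] = default
        else:
            raise ValueError(f"field {name!r} is missing from the writer and has no default")
    return result  # fields that only the writer knows about are ignored


registry = InMemorySchemaRegistry()
v1 = {"id": NO_DEFAULT, "name": NO_DEFAULT}
id1 = registry.register(v1)
v2 = {"id": NO_DEFAULT, "email": ""}  # "name" deleted, "email" added with a default
id2 = registry.register(v2, previous_id=id1)

old_message = {"id": 7, "name": "temp"}  # written with schema id1
upgraded = resolve(old_message, registry.lookup(id1), v2)
```

A real registry also checks forward compatibility (old readers reading new data) and matches nested types, which this sketch leaves out.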


On Wed, Nov 18, 2015 at 5:29 PM, Yi Pan <> wrote:

> Hi, Selina,
> Samza's producer/consumer is highly tunable. You can configure it to use
> a ProtocolBufferSerde class if your messages in Kafka are in ProtoBuf
> format. The use of Avro in Kafka is LinkedIn's choice and does not
> necessarily fit others.
> As for why LinkedIn uses Avro, here is the biggest reason:
> LinkedIn uses Avro schema registry to ensure that producer/consumer are
> using compatible Avro schema versions. It is a specific way of maintaining
> compatibility between producer and consumer in LinkedIn. ProtoBuf does not
> seem to have the schema registry functionality and requires re-compilation
> to make sure producer and consumer are compatible on the wire-format of the
> message.
> If you have other ways to maintain the compatibility between producer and
> consumers using ProtoBuf, I don't see why you cannot use ProtoBuf in Samza.
> Best,
> -Yi
> On Wed, Nov 18, 2015 at 3:43 PM, Selina Tech <> wrote:
> > Dear All:
> >
> >       I need to generate some data with Samza, send it to Kafka, and then
> > write it to Parquet files. I was asked why I chose Avro instead of
> > Protocol Buffers as my Samza output format to Kafka, since all of our
> > data currently on Kafka is in Protocol Buffers.
> >       I explained that Avro-encoded messages are smaller, need no extra
> > code compilation, are easier to implement, are fast to
> > serialize/deserialize, and are supported in many languages. However, some
> > people believe that an encoded Avro message takes about as much space as a
> > Protocol Buffers message, and that with the schema included the size could
> > be much bigger.
> >
> >       I am wondering whether there are any other advantages that made you
> > choose Avro as your message type in Kafka.
> >
> > Sincerely,
> > Selina
> >
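To make the size question concrete, the sketch below hand-encodes one record in Avro's binary encoding (a long is zigzag-encoded as a variable-length integer; a string is its UTF-8 byte length, encoded as a long, followed by the bytes) and compares it with the size of the schema's JSON text. The record and schema are hypothetical examples, and the 5-byte ID header is an assumption borrowed from the Confluent wire format. The point it illustrates: the schema only inflates messages if it travels with every message; carrying a schema ID instead keeps the per-message overhead to a few bytes.

```python
import json


def encode_long(n: int) -> bytes:
    """Avro 'long': zigzag-map n to a non-negative int, then emit 7 bits per byte."""
    z = (n << 1) ^ (n >> 63)  # zigzag for 64-bit signed values
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro 'string': byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


# Hypothetical schema and record, for illustration only.
SCHEMA = {
    "type": "record",
    "name": "Temp",
    "fields": [{"name": "id", "type": "long"}, {"name": "name", "type": "string"}],
}


def encode_record(record: dict) -> bytes:
    # An Avro record is the concatenation of its field encodings, in schema order.
    return encode_long(record["id"]) + encode_string(record["name"])


payload = encode_record({"id": 42, "name": "temp"})
schema_bytes = json.dumps(SCHEMA, separators=(",", ":")).encode("utf-8")
ASSUMED_HEADER_SIZE = 5  # magic byte + 4-byte schema ID (Confluent-style assumption)

print("payload only:        ", len(payload))
print("schema + payload:    ", len(schema_bytes) + len(payload))
print("ID header + payload: ", ASSUMED_HEADER_SIZE + len(payload))
```

Avro binary data carries no field tags or names, which is why the payload stays compact. In an Avro object container file the schema is written once in the file header rather than per record, so the "much bigger" concern applies only when a full schema is attached to each individual Kafka message.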
