samza-dev mailing list archives

From Yi Pan <nickpa...@gmail.com>
Subject Re: Avro vs Protocol buffer for Samza output
Date Thu, 19 Nov 2015 01:57:08 GMT
Hi, Selina,


On Wed, Nov 18, 2015 at 5:43 PM, Selina Tech <swucareer99@gmail.com> wrote:

> Hi, Yi:
>      Thanks for your reply. Do you mean that Avro messages have no advantage
> over Protocol Buffer messages on Kafka other than the Avro schema registry?
>
>
Well, please be careful about interpreting my words that way. I have not done
a thorough survey comparing the two. I only said that LinkedIn chose Avro
because of the manageability of schema compatibility among producers and
consumers. There may be other pros and cons between the two serialization
formats, and you should not quote me as saying there is "no advantage of
Avro vs ProtoBuf".
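
To illustrate what "schema compatibility" between producers and consumers
means here, below is a minimal sketch in Python. It is a toy, not Avro's
actual schema resolution and not LinkedIn's registry: the dict-based schema
layout, `can_read`, and `ToyRegistry` are all hypothetical names made up for
this example. It only models one rule, loosely based on Avro's: a reader can
handle a field the writer did not send only if the reader's schema gives that
field a default.

```python
# Toy illustration only. Schemas are plain dicts of the form
#   {field_name: {"type": "<type name>", "default": <optional value>}}
# This layout is an assumption for the sketch, not a real Avro schema.

def can_read(reader_schema, writer_schema):
    """Return True if data written with writer_schema is readable with reader_schema."""
    for name, spec in reader_schema.items():
        if name in writer_schema:
            # Field present on both sides: types must match (no promotions here).
            if writer_schema[name]["type"] != spec["type"]:
                return False
        elif "default" not in spec:
            # Reader expects a field the writer never wrote and has no fallback.
            return False
    # Fields only the writer knows about are simply ignored by the reader.
    return True


class ToyRegistry:
    """Hypothetical in-memory registry that rejects incompatible schema versions."""

    def __init__(self):
        self.versions = []

    def register(self, schema):
        # Require compatibility in both directions with every earlier version,
        # so old consumers can read new data and new consumers can read old data.
        for old in self.versions:
            if not (can_read(schema, old) and can_read(old, schema)):
                raise ValueError("incompatible with a registered version")
        self.versions.append(schema)
        return len(self.versions)  # version number, starting at 1


registry = ToyRegistry()
v1 = {"id": {"type": "long"}}
v2 = {"id": {"type": "long"}, "email": {"type": "string", "default": ""}}
registry.register(v1)
registry.register(v2)  # accepted: the new field has a default
```

Registering a third version that adds a field without a default would raise
`ValueError` in this sketch, because a consumer on that version could not read
messages produced with the older schemas.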


>      BTW, do you know how Avro messages are implemented on Kafka? Does each
> Avro message include the schema or not? The size of the Avro message is a
> big concern for me now.
>
> Sincerely,
> Selina
>
>
>
> On Wed, Nov 18, 2015 at 5:29 PM, Yi Pan <nickpan47@gmail.com> wrote:
>
> > Hi, Selina,
> >
> > Samza's producer/consumer is highly tunable. You can configure it to use
> > a ProtocolBufferSerde class if your messages in Kafka are in ProtoBuf
> > format. The use of Avro in Kafka is LinkedIn's choice and does not
> > necessarily fit others.
> >
> > For the sake of "why LinkedIn uses Avro", here is the biggest reason:
> > LinkedIn uses an Avro schema registry to ensure that producers and
> > consumers use compatible Avro schema versions. It is a specific way of
> > maintaining compatibility between producers and consumers at LinkedIn.
> > ProtoBuf does not seem to have schema registry functionality and requires
> > recompilation to make sure producers and consumers are compatible on the
> > wire format of the message.
> >
> > If you have other ways to maintain compatibility between producers and
> > consumers using ProtoBuf, I don't see why you cannot use ProtoBuf in
> > Samza.
> >
> > Best,
> >
> > -Yi
> >
> > On Wed, Nov 18, 2015 at 3:43 PM, Selina Tech <swucareer99@gmail.com>
> > wrote:
> >
> > > Dear All:
> > >
> > >       I need to generate some data with Samza, send it to Kafka, and
> > > then write it to Parquet files. I was asked why I chose Avro rather than
> > > Protocol Buffers for my Samza output to Kafka, since all of our data on
> > > Kafka is currently in Protocol Buffers.
> > >       I explained that for Avro-encoded messages the encoded size is
> > > smaller, no extra code compilation is needed, implementation is easier,
> > > serialization and deserialization are fast, and many languages are
> > > supported. However, some people believe that an encoded Avro message
> > > takes as much space as a Protocol Buffers message, and that with the
> > > schema included it could be much bigger.
> > >
> > >       I am wondering whether there are any other advantages that made
> > > you choose Avro as your message type on Kafka?
> > >
> > > Sincerely,
> > > Selina
> > >
> >
>
