samza-dev mailing list archives

From Selina Tech <swucaree...@gmail.com>
Subject Re: Sample code or tutorial for writing/reading Avro type message in Samza
Date Fri, 20 Nov 2015 20:44:25 GMT
Hi, Luis:

       Thanks a lot for your reply!

Sincerely,
Selina

On Fri, Nov 20, 2015 at 12:09 PM, Luis Casillas <
luis.casillas@progressfin.com> wrote:

>
> We haven’t seriously considered Protocol Buffers.  In general the tools
> we’re interested in have better support for Avro than for protobuf; Avro
> was designed for storing data in big-data storage like HDFS, and many tools
> for analyzing such data have taken it up.  For example Hive comes with Avro
> support built in.
>
> More generally, we like the design choices that Avro has made:
>
> 1. Self-describing container files
> 2. Easy convertibility to/from JSON
> 3. Not tightly tied to code generation
>
>
>
> We’ve experienced these downsides, however:
>
> 1. We’ve been bitten hard by buggy Avro library versions.  You want to stick
> to the latest one.
> 2. Hadoop ships with one such older, buggy version of Avro, and it is a
> major pain to work around it.
> 3. Avro's “one definition file = one schema = one record type” assumption
> causes us some trouble.
>
>
> On 11/20/15, 2:47 AM, "Selina Tech" <swucareer99@gmail.com> wrote:
>
> >Hi, Luis:
> >        Thanks a lot for your detailed reply with your code and the link to
> >the Avro schema registry.
> >        May I ask a question: have you considered Protocol Buffers as your
> >message type?
> >
> >Sincerely,
> >Selina
> >
> >
> >On Thu, Nov 19, 2015 at 2:22 PM, Luis Casillas <
> >luis.casillas@progressfin.com> wrote:
> >
> >>
> >> I did a Samza proof of concept project recently and I ended up writing
> >> this code:
> >>
> >> https://gist.github.com/ldcasillas-progreso/871af3c1a1790be975fd
> >>
> >> In the end, however, I switched the project from Avro to JSON.  The issue
> >> is that Avro is designed to work with its self-describing container file
> >> format, which embeds the schema used to write the records in the file.
> >> Avro’s schema evolution features rely on this embedded schema; when the
> >> embedded schema and the reader’s schema are not equal, Avro uses its
> >> special rules to translate the old data to the new schema.
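
The resolution step described above can be illustrated with a toy model in plain Java. This is a sketch and not the Avro API: the class `SchemaResolutionSketch`, its methods, and the representation of a schema as an ordered map of field names to default values are all hypothetical. It covers only the record case, where fields are matched by name, fields missing from the writer schema take the reader's default, and fields the reader does not declare are dropped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of Avro-style record resolution; this is NOT the Avro API.
final class SchemaResolutionSketch {

    // A "schema" here is an ordered map of field name -> default value.
    // Values are written positionally, without field names, as in Avro binary.
    static List<Object> write(Map<String, Object> writerSchema, Map<String, Object> record) {
        List<Object> out = new ArrayList<>();
        for (String field : writerSchema.keySet()) {
            out.add(record.get(field));
        }
        return out;
    }

    // Decoding needs the writer schema to know what each position means,
    // then maps the values onto the reader schema by field name.
    static Map<String, Object> read(List<Object> data,
                                    Map<String, Object> writerSchema,
                                    Map<String, Object> readerSchema) {
        Map<String, Object> byName = new LinkedHashMap<>();
        int i = 0;
        for (String field : writerSchema.keySet()) {
            byName.put(field, data.get(i++));
        }
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : readerSchema.entrySet()) {
            String name = field.getKey();
            // Fields absent from the writer schema take the reader's default;
            // writer fields the reader does not declare are dropped.
            result.put(name, byName.containsKey(name) ? byName.get(name) : field.getValue());
        }
        return result;
    }
}
```

Without the writer schema, the positional values in `data` cannot be interpreted at all, which is the situation for a bare Avro-encoded Kafka message.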
> >>
> >> But when you’re working with Kafka/Samza, there is no container file, so
> >> the writer's schema does not travel with the messages and none of the
> >> schema evolution tools work.  If you change your Avro schema, you likely
> >> won’t be able to read any of the old messages again.
> >>
> >> There’s a Kafka Avro schema registry project that aims to fix this:
> >>
> >> https://github.com/confluentinc/schema-registry
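
One way a registry can stand in for the container file is to prefix each Kafka message with a small header that identifies the writer schema, so a consumer can look that schema up before decoding. The sketch below assumes a one-byte marker followed by a 4-byte big-endian schema id and then the Avro-encoded bytes. The class `SchemaIdFraming` and its method names are hypothetical, and the exact layout used by any particular registry release should be checked against its documentation.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch of schema-id framing; class and method names are illustrative only.
final class SchemaIdFraming {
    static final byte MAGIC = 0;          // assumed marker byte
    static final int HEADER_LENGTH = 5;   // 1 marker byte + 4-byte schema id

    // Prepend the marker and schema id to an already Avro-encoded payload.
    static byte[] frame(int schemaId, byte[] avroPayload) {
        return ByteBuffer.allocate(HEADER_LENGTH + avroPayload.length)
                .put(MAGIC)
                .putInt(schemaId)   // ByteBuffer is big-endian by default
                .put(avroPayload)
                .array();
    }

    // Read the schema id so the writer schema can be fetched before decoding.
    static int schemaId(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        if (buf.get() != MAGIC) {
            throw new IllegalArgumentException("unknown marker byte");
        }
        return buf.getInt();
    }

    // Return the Avro-encoded bytes that follow the header.
    static byte[] payload(byte[] message) {
        schemaId(message); // validates the header first
        return Arrays.copyOfRange(message, HEADER_LENGTH, message.length);
    }
}
```

A consumer would call `schemaId` first, fetch the matching writer schema from the registry, and then decode `payload` against it together with its own reader schema.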
> >>
> >> I tried it but the released version just was not mature enough, which is
> >> why I ended up using JSON.  But I did write a Serde that encodes/decodes
> >> the Avro objects in JSON:
> >>
> >> https://gist.github.com/ldcasillas-progreso/3611d40d2833aa62c1b3
> >>
> >> Hope this helps.
> >>
> >>
> >>
> >>
> >>
> >> On 11/17/15, 12:32 AM, "Selina Tech" <swucareer99@gmail.com> wrote:
> >>
> >> >Dear All:
> >> >     Do you know where I can find a tutorial or sample code for writing
> >> >Avro messages to Kafka and reading Avro messages from Kafka in
> >> >Samza?
> >> >      I am wondering how I should serialize a GenericRecord to bytes and
> >> >deserialize it.
> >> >     Your comments/suggestions are highly appreciated.
> >> >
> >> >Sincerely,
> >> >Selina
> >>
> >>
> >> -----------
> >> This message and any files or text attached to it are intended only for
> >> the recipients named above, and contain information that is confidential or
> >> privileged. If you are not an intended recipient, you must not read, copy,
> >> use or disclose this communication. Please also notify the sender by
> >> replying to this message, and then delete all copies of it from your system.
> >>
>
