avro-user mailing list archives

From Xu Yang <yang.jim...@gmail.com>
Subject Re: Difference between Avro message vs Avro Object Container files?
Date Wed, 24 May 2017 19:34:55 GMT
Hello Niels,

I guess the "Avro message" part of your answer refers to the Schema
Registry service from Kafka/Confluent?

The Schema Registry (SR) service defines its own wire format around Avro:
"| 1-byte magic byte | 4-byte *schema id* identifying the schema stored in
the SR service | actual Avro datum |"
http://docs.confluent.io/current/schema-registry/docs/serializer-formatter.html#wire-format
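
As a rough sketch of that layout in Python (this is not the Confluent
client code; the function names are made up for illustration, and the
big-endian schema id and the magic byte value 0 are assumptions based on
the linked page):

```python
import struct

# Assumption: the magic byte is 0, as on the linked wire-format page.
MAGIC_BYTE = 0
# 1 unsigned byte + 4-byte unsigned int, big-endian, no padding (5 bytes).
HEADER = struct.Struct(">BI")


def build_sr_message(schema_id: int, avro_datum: bytes) -> bytes:
    """Prefix an already-encoded Avro datum with the SR header."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + avro_datum


def split_sr_message(payload: bytes):
    """Return (schema_id, avro_datum) from an SR-framed payload."""
    if len(payload) < HEADER.size:
        raise ValueError("payload is shorter than the 5-byte header")
    magic, schema_id = HEADER.unpack_from(payload, 0)
    if magic != MAGIC_BYTE:
        raise ValueError("unexpected magic byte: %d" % magic)
    # Everything after the header is the raw Avro binary datum; decoding
    # it still requires looking up the schema for schema_id.
    return schema_id, payload[HEADER.size:]
```

Note that the schema itself never travels with the record here, only
its id.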

The Avro message described in the Avro specification, on the other hand,
is something like:

>
>    - a series of *buffers*, where each buffer consists of:
>       - a four-byte, big-endian *buffer length*, followed by
>       - that many bytes of *buffer data*.
>
> https://avro.apache.org/docs/1.8.1/spec.html#Message Framing

And it doesn't mention the schema at all.
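
To make the comparison concrete, here is a minimal sketch of that
buffer framing in Python. The function names are hypothetical, and the
zero-length terminating buffer comes from the same section of the spec
rather than from the excerpt quoted above:

```python
import struct


def frame_message(buffers):
    """Encode buffers as: 4-byte big-endian length + data, repeated."""
    out = bytearray()
    for buf in buffers:
        if not buf:
            continue  # a zero-length buffer is reserved as terminator
        out += struct.pack(">I", len(buf))
        out += buf
    out += struct.pack(">I", 0)  # zero-length buffer ends the message
    return bytes(out)


def unframe_message(data):
    """Decode one framed message back into its list of buffers."""
    buffers = []
    pos = 0
    while True:
        if pos + 4 > len(data):
            raise ValueError("truncated buffer length")
        (length,) = struct.unpack_from(">I", data, pos)
        pos += 4
        if length == 0:
            return buffers
        if pos + length > len(data):
            raise ValueError("truncated buffer data")
        buffers.append(data[pos:pos + length])
        pos += length
```

Nothing in these bytes identifies a schema; the framing only delimits
buffers.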

I am not convinced these two are the same thing, and I am not sure
whether the Avro community and the Kafka/Confluent community have reached
an agreement on this.

Can you or someone else explain more about this?

I would really appreciate it!
Thanks,
Yang


2017-05-16 11:02 GMT-04:00 Niels Basjes <Niels@basjes.nl>:

> Hi,
>
> A key thing with Avro is that in order to deserialize a record from the
> byte array back into a usable form you need the schema that was used to
> create the bytes in the first place.
>
> An Avro file is essentially a (large) set of records that all adhere to
> the same schema.
> In such a file you will find the complete schema and for each of the
> records the binary representation of that record.
> This is one possible way of storing records that can then be used for
> batch processing, and because the schema is part of the file you can
> always read all records in that file.
>
> The Avro message format was created for the streaming use case.
> If you want to stream records into Kafka (where they will persist until
> the TTL expires) then you need a way to know the schema ... for each record.
> Because a schema may change over time, we need to record the schema with
> EACH record.
> Because the schema can be quite big (several KiB is common) you do not
> want to store the same schema with every message.
> So for the Message format you will find the ID of the schema in
> conjunction with the actual record.
> Looking at the API, there is a mechanism included behind which you can
> build a database of all versions of all your schemas.
>
> Does this clarify it for you?
>
> Niels Basjes
>
>
> On Tue, May 16, 2017 at 8:30 AM, kant kodali <kanth909@gmail.com> wrote:
>
>> Hi All,
>>
>> I am new to Avro so I was wondering what is the difference between Avro
>> message vs Avro Object Container files? Are they related at all? What are
>> the use cases for each?
>>
>> Thanks!
>>
>
>
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>
