avro-user mailing list archives

From Josh <jof...@gmail.com>
Subject Re: Alternative to Avro container files for long-term Avro storage
Date Tue, 15 Nov 2016 20:22:10 GMT
Thanks for the replies,
Originally I wanted to have a Kafka topic with multiple schema types, but
Ken's approach sounds like it could work well so I will try out the single
schema approach with a big union type at the root of the schema.
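For illustration, here is a minimal sketch of what a root-level union could look like, assuming two made-up record types (`UserCreated` and `OrderPlaced` in a hypothetical `example.events` namespace; neither comes from this thread). In Avro's JSON schema syntax a union is written as a JSON array, and several records may share one union as long as their full names differ.

```python
import json

# Hypothetical record schemas, used only for illustration.
user_created = {
    "type": "record",
    "name": "UserCreated",
    "namespace": "example.events",
    "fields": [{"name": "user_id", "type": "string"}],
}
order_placed = {
    "type": "record",
    "name": "OrderPlaced",
    "namespace": "example.events",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "amount_cents", "type": "long"},
    ],
}


def root_union(*record_schemas):
    """Combine record schemas into a union (a JSON array) for the schema root.

    Rejects duplicate full names, since the branches of a union must be
    distinguishable by name.
    """
    full_names = [s.get("namespace", "") + "." + s["name"] for s in record_schemas]
    if len(set(full_names)) != len(full_names):
        raise ValueError("duplicate record name in union: %r" % full_names)
    return list(record_schemas)


# The resulting JSON text is the single writer schema for the whole file.
schema_json = json.dumps(root_union(user_created, order_placed), indent=2)
```

Every datum written with such a schema is tagged with the index of its union branch, so the original write order is kept within one file. Adding a new message type means changing the union itself.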


On Tue, Nov 15, 2016 at 4:44 PM, John McClean <jmcclean@gmail.com> wrote:

> One approach is to have separate Kafka topics per schema, which evolve
> with use of a schema registry:
> https://github.com/confluentinc/schema-registry. You'd write to the topic
> with the schema id in metadata. You'd write normal Avro storage files,
> knowing when to split them based on the changing schema id in the Kafka
> message.
> On Tue, Nov 15, 2016 at 2:24 AM, Josh <jofo90@gmail.com> wrote:
>> Hi all,
>> I am using a typical Avro->Kafka solution where data is serialized to
>> Avro before it gets written to Kafka and each message is prepended with a
>> schema ID which can be looked up in my schema repository.
>> Now, I want to store the data in long-term storage by writing data from
>> Kafka->S3.
>> I know that the usual way to store Avro in storage is using Avro
>> container files; however, a container file can only contain messages encoded
>> with a single Avro schema. In my case, the messages may be encoded with
>> different schemas, and I need to retain the order of the messages (so that
>> they can be replayed into Kafka, in order). Therefore, a single file in S3
>> needs to contain messages encoded with different schemas and so I can't use
>> Avro container files.
>> I was wondering what would be a good solution to this? What format could
>> I use to store my Avro data, such that a single data file can contain
>> messages encoded with different schemas? Should I store the messages with a
>> prepended schema ID, similar to what I do in Kafka? In that case, how could
>> I separate the messages in the file?
>> Thanks for any advice,
>> Josh
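
One possible answer to the question of how to separate the messages, sketched under assumptions: prefix each message with a fixed-size header carrying the schema ID and the payload length, so that a reader can split the file without decoding any Avro. The header layout (two 4-byte big-endian unsigned integers) and the function names below are hypothetical, not an Avro or Kafka convention.

```python
import io
import struct

# Hypothetical header: 4-byte schema ID, then 4-byte payload length,
# both big-endian unsigned integers.
HEADER = struct.Struct(">II")


def write_record(stream, schema_id, payload):
    """Append one already-serialized Avro message to a binary stream."""
    stream.write(HEADER.pack(schema_id, len(payload)))
    stream.write(payload)


def read_records(stream):
    """Yield (schema_id, payload) pairs in the order they were written."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of file
        if len(header) != HEADER.size:
            raise ValueError("truncated record header")
        schema_id, length = HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) != length:
            raise ValueError("truncated record payload")
        yield schema_id, payload
```

A round trip through an in-memory buffer might look like this:

```python
buf = io.BytesIO()
write_record(buf, 1, b"first message bytes")
write_record(buf, 2, b"second message bytes")
buf.seek(0)
for schema_id, payload in read_records(buf):
    print(schema_id, len(payload))
```

Unlike an Avro container file, this layout has no sync markers or embedded schema, so the file is only readable while the schema repository still resolves every ID.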
