avro-user mailing list archives

From John McClean <jmccl...@gmail.com>
Subject Re: Alternative to Avro container files for long-term Avro storage
Date Tue, 15 Nov 2016 16:44:50 GMT
One approach is to have a separate Kafka topic per schema, with each schema
evolving through a schema registry: https://github.com/confluentinc/schema-registry.
You'd write to the topic with the schema ID in the message metadata. You'd
then write normal Avro container files, starting a new file whenever the
schema ID in the Kafka messages changes.
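
As a rough illustration of the "split on schema ID change" idea, here is a minimal Python sketch. It assumes each message is framed as one zero magic byte followed by a 4-byte big-endian schema ID and then the Avro body; that layout is an assumption, so adjust `split_header` to whatever prefix you actually use. The function names are hypothetical, and the step that writes each run to a container file is left out.

```python
import struct
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def split_header(message: bytes) -> Tuple[int, bytes]:
    """Split one message into (schema_id, avro_body).

    Assumed framing (not confirmed by the thread): 1 magic byte equal to 0,
    then a 4-byte big-endian schema ID, then the Avro-encoded body.
    """
    if len(message) < 5 or message[0] != 0:
        raise ValueError("unexpected message framing")
    (schema_id,) = struct.unpack(">I", message[1:5])
    return schema_id, message[5:]


def runs_by_schema(messages: Iterable[bytes]) -> Iterator[Tuple[int, List[bytes]]]:
    """Group consecutive messages that share a schema ID, preserving order.

    Each yielded run could be written to its own container file using the
    schema looked up for that ID. A schema that reappears later starts a
    new run, so replaying the runs in sequence restores the original order.
    """
    parsed = (split_header(m) for m in messages)
    for schema_id, group in groupby(parsed, key=lambda pair: pair[0]):
        yield schema_id, [body for _, body in group]


def frame(schema_id: int, body: bytes) -> bytes:
    """Hypothetical helper that builds a message in the assumed framing."""
    return b"\x00" + struct.pack(">I", schema_id) + body


stream = [frame(1, b"a"), frame(1, b"b"), frame(2, b"c"), frame(1, b"d")]
for index, (schema_id, bodies) in enumerate(runs_by_schema(stream)):
    # A real consumer would open file number `index` with the schema for
    # `schema_id` and append each body as a record.
    print(index, schema_id, len(bodies))
```

Numbering the output files in the order the runs are produced keeps the overall message order recoverable, at the cost of many small files if the schema ID changes often.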

On Tue, Nov 15, 2016 at 2:24 AM, Josh <jofo90@gmail.com> wrote:

> Hi all,
> I am using a typical Avro->Kafka solution where data is serialized to Avro
> before it gets written to Kafka and each message is prepended with a schema
> ID which can be looked up in my schema repository.
> Now, I want to store the data in long-term storage by writing data from
> Kafka->S3.
> I know that the usual way to store Avro in storage is using Avro container
> files, however a container file can only contain messages encoded with a
> single Avro schema. In my case, the messages may be encoded with different
> schemas, and I need to retain the order of the messages (so that they can
> be replayed into Kafka, in order). Therefore, a single file in S3 needs to
> contain messages encoded with different schemas and so I can't use Avro
> container files.
> I was wondering what would be a good solution to this? What format could I
> use to store my Avro data, such that a single data file can contain
> messages encoded with different schemas? Should I store the messages with a
> prepended schema ID, similar to what I do in Kafka? In that case, how could
> I separate the messages in the file?
> Thanks for any advice,
> Josh
