avro-user mailing list archives

From Josh <jof...@gmail.com>
Subject Re: Alternative to Avro container files for long-term Avro storage
Date Tue, 15 Nov 2016 12:45:48 GMT
Hi Ken,

Thanks for the reply. That does sound like a good idea, but I don't think
it will work well for me, as I don't have a fixed number of message types.
In my case new message types could be added every day, and the union could
grow to contain hundreds of them. Managing the union also sounds tricky
when adding new message types (i.e. making sure readers' schemas are
updated first).

If there's a nice way to do it, I'd like to find a way that doesn't involve
Avro container files, so that I can maintain a separate Avro schema per
message type.

Josh


On Tue, Nov 15, 2016 at 12:21 PM, Jarrad, Ken <ken.jarrad@citi.com> wrote:

> Josh, I use method createUnion on class org.apache.avro.Schema.
>
>
>
> The mixed message types then have the union as their common type and are
> thus homogeneous.
>
>
>
> Yours sincerely,
>
> Ken Jarrad.
>
>
>
> *From:* Josh [mailto:jofo90@gmail.com]
> *Sent:* 15 November 2016 10:24
> *To:* user@avro.apache.org
> *Subject:* Alternative to Avro container files for long-term Avro storage
>
>
>
> Hi all,
>
>
>
> I am using a typical Avro->Kafka solution where data is serialized to Avro
> before it gets written to Kafka and each message is prepended with a schema
> ID which can be looked up in my schema repository.
>
>
>
> Now, I want to store the data in long-term storage by writing data from
> Kafka->S3.
>
>
>
> I know that the usual way to store Avro in storage is using Avro container
> files, however a container file can only contain messages encoded with a
> single Avro schema. In my case, the messages may be encoded with different
> schemas, and I need to retain the order of the messages (so that they can
> be replayed into Kafka, in order). Therefore, a single file in S3 needs to
> contain messages encoded with different schemas and so I can't use Avro
> container files.
>
>
>
> I was wondering what would be a good solution to this? What format could I
> use to store my Avro data, such that a single data file can contain
> messages encoded with different schemas? Should I store the messages with a
> prepended schema ID, similar to what I do in Kafka? In that case, how could
> I separate the messages in the file?
>
>
>
> Thanks for any advice,
>
> Josh
>
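
One possible answer to the question of how to separate messages in a single
file is length-prefixed framing: each record is written as a small header
followed by the already-serialized Avro bytes. The sketch below is only an
illustration under stated assumptions. The frame layout (4-byte schema ID,
4-byte payload length, both big-endian) and the names write_frame and
read_frames are hypothetical; they are not part of Avro or of any schema
repository API. The payload is treated as opaque bytes, so decoding each
record with the schema looked up by its ID is left to the reader's existing
Avro code.

```python
import io
import struct

# Hypothetical frame layout (an assumption, not an Avro standard):
#   4-byte big-endian schema ID | 4-byte big-endian payload length | payload
_HEADER = struct.Struct(">II")


def write_frame(out, schema_id, payload):
    """Append one record: header (schema ID, length) followed by the payload."""
    out.write(_HEADER.pack(schema_id, len(payload)))
    out.write(payload)


def read_frames(inp):
    """Yield (schema_id, payload) pairs in the order they were written."""
    while True:
        header = inp.read(_HEADER.size)
        if not header:
            return  # clean end of file, no partial frame
        if len(header) < _HEADER.size:
            raise EOFError("truncated frame header")
        schema_id, length = _HEADER.unpack(header)
        payload = inp.read(length)
        if len(payload) < length:
            raise EOFError("truncated frame payload")
        yield schema_id, payload


if __name__ == "__main__":
    # Placeholder bytes stand in for Avro-encoded messages.
    buf = io.BytesIO()
    write_frame(buf, 7, b"first message")
    write_frame(buf, 42, b"second message")
    buf.seek(0)
    for schema_id, payload in read_frames(buf):
        print(schema_id, payload)
```

Because the frames are read back sequentially, message order is preserved
for replay into Kafka, and each message keeps its own schema ID, so no union
schema has to be maintained. The trade-off is that such a file is not
readable by standard Avro container-file tooling.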
