avro-user mailing list archives

From Wai Yip Tung ...@tungwaiyip.info>
Subject Re: embed avro data in an envelop
Date Wed, 07 May 2014 22:20:00 GMT
Thank you. This should work.

I wonder if there is a way in the JSON/avsc to reference or compose a
schema from other files. That way the app developer would create
`my_app.avsc` and the infrastructure developer would create `meta.avsc`.
How do we embed one schema in another? I guess we can do it
programmatically: since we already have an infrastructure library that
injects the metadata, we can join the schemas at runtime. I wonder if
there are other built-in ways to do it as well.
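
A minimal sketch of the runtime join described above, in Python using only the standard library. The file names `meta.avsc` and `my_app.avsc` come from this message; the record names, field names, and the `compose_envelope` helper are hypothetical, and the sketch assumes each file holds a single named record schema.

```python
import json


def compose_envelope(meta_schema: dict, app_schema: dict, name: str = "Datum") -> dict:
    """Wrap two record schemas in one composite record (hypothetical helper)."""
    return {
        "type": "record",
        "name": name,
        "fields": [
            # Each named type is defined inline exactly once, as Avro requires.
            {"name": "header", "type": meta_schema},
            {"name": "body", "type": app_schema},
        ],
    }


# Stand-ins for the contents of meta.avsc and my_app.avsc (assumed shapes).
META_AVSC = """
{"type": "record", "name": "Header",
 "fields": [{"name": "timestamp", "type": "long"},
            {"name": "hostname", "type": "string"}]}
"""
APP_AVSC = """
{"type": "record", "name": "Body",
 "fields": [{"name": "item_id", "type": "long"},
            {"name": "metric", "type": "double"}]}
"""

datum_schema = compose_envelope(json.loads(META_AVSC), json.loads(APP_AVSC))
print(json.dumps(datum_schema, indent=2))
```

The resulting JSON can be handed to whichever Avro library is in use. As for built-in options, Avro IDL has an `import schema "meta.avsc";` statement, and the Java `Schema.Parser` remembers named types across successive `parse` calls, so a later schema can refer to `Header` by name; whether either fits depends on the toolchain.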

Wai Yip



> Eric Wasserman <mailto:ewasserman@247-inc.com>
> Thursday, May 01, 2014 2:41 PM
> We've been happy doing the second approach you mentioned. Our usage 
> looks like (in avro IDL):
>
> record Datum {
>   Header header;
>   Body body;
> }
>
> where Header contains the meta-data and Body is specific to the 
> particular application.
> Something like:
>
> record Body {
>   union { SpecificType1, SpecificType2, ...} body;
> }
>
> one of the nice side effects is that you can take data written with 
> the composite Datum schema
> and let Avro transform it into what you need by specifying a different 
> reader's schema (Note: you also still have to
> give Avro *exactly* the schema the data were originally written with, 
> the "writer's schema", for it to be able to parse the Datum records).
>
> So if all you care about is the application-specific part you use the 
> following reader's schema in your parser:
>
> record HeaderFreeDatum {
>   Body body;
> }
>
> Conversely, if you care about the header bits use this as the reader's 
> schema in your parser:
>
> record BodyFreeDatum {
>   Header header;
> }
>
> In our use we found significant speedup reading just the headers 
> (YMMV). You can also use Avro-generated classes for the BodyFreeDatum 
> that don't really ever change (as long as the Header doesn't change).
> This lets you revise the schemas for Header and the SpecificTypeX on 
> different schedules.
>
> One final piece of advice: think about how you will handle the 
> inevitable evolution the schemas will undergo.
>
> ________________________________________
> From: Wai Yip Tung <wy@tungwaiyip.info>
> Sent: Tuesday, April 29, 2014 6:14 PM
> To: user@avro.apache.org
> Subject: embed avro data in an envelop
>
> I am looking for some Avro usage advice. We have created various schemas
> for different applications, say to represent item id, name, metric,
> etc. On the other hand, our infrastructure group wants to include some
> metadata on all messages. This should include things like timestamp,
> hostname, etc. This metadata is the same for all application messages.
>
> One way to do it is to have a metadata schema that has timestamp,
> hostname, and a binary content field for the application data. This way
> each message needs to be decoded twice, using two schemas.
>
> Another way is to somehow have a composite schema that includes both the
> metadata and the application-specific data. So each message is just
> decoded once and it automatically includes the needed metadata. I wonder
> if this can be done and if it is a good idea. Have other people
> considered similar usage?
>
> Wai Yip
