avro-user mailing list archives
From Michael Pigott <mpigott.subscripti...@gmail.com>
Subject Re: Need help transforming Avro schemas
Date Tue, 19 Aug 2014 14:04:48 GMT
Hi Juan,
    That sounds really complex.  Would you instead be able to build or
retrieve the original Avro Schema objects, and then build a new Schema from
its definition?  For my work on transforming XML to Avro and back[1], I
wrote a comparison tool to confirm that two Avro Schemas are equivalent by
recursively descending through both schemas[2].  Perhaps you can use
something similar to build a transformed Avro schema in memory, by applying
your transformations on the fly?
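
As a rough illustration of that idea, here is a minimal sketch of one such transformation: a projection that builds a new record Schema in memory from an existing one. It assumes Avro 1.9 or later, for the Schema.Field copy constructor. The class name SchemaProjection, the method project, and the generated record name "User_proj_1" are made up for this example; they are not part of Avro or of the xml-to-avro code. It handles only a single record level. Nested records would need the same step applied recursively to each field's schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class SchemaProjection {

    /**
     * Hypothetical helper: returns a new record schema that keeps only the
     * fields of `source` whose names are in `keep`. The source schema is
     * left untouched, since Schema objects are logically immutable.
     */
    public static Schema project(Schema source, Set<String> keep, String newName) {
        if (source.getType() != Schema.Type.RECORD) {
            throw new IllegalArgumentException(
                "Expected a record schema, got " + source.getType());
        }
        List<Schema.Field> fields = new ArrayList<>();
        for (Schema.Field f : source.getFields()) {
            if (keep.contains(f.name())) {
                // A Field instance can belong to only one record schema,
                // so copy it instead of adding the original.
                fields.add(new Schema.Field(f, f.schema()));
            }
        }
        // Giving the result a fresh name avoids clashing with the source record's name.
        return Schema.createRecord(
            newName, source.getDoc(), source.getNamespace(), source.isError(), fields);
    }

    public static void main(String[] args) {
        // Example input schema, built only for this demonstration.
        Schema user = SchemaBuilder.record("User").namespace("example")
            .fields()
            .requiredString("name")
            .requiredInt("age")
            .optionalString("email")
            .endRecord();

        Set<String> keep = new HashSet<>(Arrays.asList("name", "email"));
        Schema projected = project(user, keep, "User_proj_1");

        System.out.println(projected.toString(true));
    }
}
```

Because the new schema is assembled from Schema and Schema.Field objects, there is no round trip through JSON text, and the name of each derived record is chosen explicitly at the point where it is created.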

Good luck!
Mike

[1] https://issues.apache.org/jira/browse/AVRO-457
[2] https://github.com/mikepigott/xml-to-avro/blob/master/avro-to-xml/src/test/java/org/apache/avro/xml/UtilsForTests.java


On Tue, Aug 19, 2014 at 2:23 AM, Juan Rodríguez Hortalá <
juan.rodriguez.hortala@gmail.com> wrote:

> Hi list,
>
> I'm working on a project in Java where we have a DSL working on
> GenericRecord objects, over which we define record transformation
> operations like projections, filters and so on. This implies that the Avro
> schema of the records evolves by adding and deleting record fields. As a
> result the avro schemas used are different in each program depending on the
> operations used. Hence I have to define avro schema transformations, and
> generate new schemas as modifications of other schemas. For that, the Avro
> schema builder classes are only useful for the starting schema, and the same
> goes for a POJO-to-schema mapping like avro-jackson. The main problem I face
> is that in Avro, by design, "schema objects are logically immutable", as
> stated in the documentation. So far I've taken the approach of converting the
> schema to a string, parsing it with Jackson, manipulating its representation
> as a JsonNode, and then parsing it back to Avro. In that last step I sometimes
> have problems because Avro records are named, and anonymous records are not
> always legal in complete schemas; or because the same record name cannot be
> used twice in two child fields of a parent record. I was then thinking of
> using generated schema names, with an increasing ID or a random UUID.
> Anyway, my question is: is the approach I'm describing correct? Are you
> aware of a library for creating new Avro schemas by manipulating an
> input schema? Maybe those capabilities are already present in Avro's Java
> API but I haven't noticed them.
>
> Any help will be welcome. Thanks a lot in advance.
>
> Greetings,
>
> Juan Rodríguez Hortalá
>
