avro-dev mailing list archives

From Anirudha Jadhav <aniru...@nyu.edu>
Subject Avro Transformation Language / Avro Graph
Date Mon, 31 Aug 2015 19:46:20 GMT
Hello All,

I would like to introduce a project that we have been working on using
Avro, and to get some feedback on it.

1. AvroGraph
We have created an Avro-to-GraphML serializer / deserializer. This allows
us to visualize Avro schemas as a graph and understand the relationships
between all the data points. We expect this to lead later to the creation
of lineage graphs, among other things.
- Implementation
  o Similar to the JSON serializer / deserializer
  o Apache TinkerPop is used as the graph library, so the graph can be
persisted to a variety of graph stores.
  o Support for schema evolution between multiple versions of an Avro
schema
  o Many unit tests, plus documentation
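
To make the idea concrete, here is a minimal sketch of turning an Avro
record schema into GraphML. It is not the project's implementation: it
uses only the Python standard library instead of TinkerPop, and the
example schema, the function names (schema_to_graph, to_graphml) and the
dotted node ids are all assumptions made for illustration. It only follows
directly nested records, not unions, arrays or maps.

```python
import json
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

# Hypothetical example schema, not taken from the project.
EXAMPLE_SCHEMA = """
{"type": "record", "name": "User", "fields": [
  {"name": "name", "type": "string"},
  {"name": "address", "type": {"type": "record", "name": "Address", "fields": [
    {"name": "city", "type": "string"}
  ]}}
]}
"""


def schema_to_graph(schema):
    """Return (nodes, edges): nodes maps a dotted path to a type label,
    edges is a list of (parent_path, child_path) pairs."""
    nodes, edges = {}, []

    def walk(record, path):
        nodes[path] = "record"
        for field in record.get("fields", []):
            child = f"{path}.{field['name']}"
            edges.append((path, child))
            ftype = field["type"]
            if isinstance(ftype, dict) and ftype.get("type") == "record":
                walk(ftype, child)  # nested record: recurse into its fields
            else:
                # Leaf: keep primitive names as-is, serialize anything else.
                nodes[child] = ftype if isinstance(ftype, str) else json.dumps(ftype)

    walk(schema, schema["name"])
    return nodes, edges


def to_graphml(nodes, edges):
    """Serialize the node and edge collections as a GraphML string."""
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    ET.SubElement(root, "key", {"id": "type", "for": "node",
                                "attr.name": "type", "attr.type": "string"})
    graph = ET.SubElement(root, "graph", id="G", edgedefault="directed")
    for node_id, type_label in nodes.items():
        node = ET.SubElement(graph, "node", id=node_id)
        ET.SubElement(node, "data", key="type").text = type_label
    for source, target in edges:
        ET.SubElement(graph, "edge", source=source, target=target)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    nodes, edges = schema_to_graph(json.loads(EXAMPLE_SCHEMA))
    print(to_graphml(nodes, edges))
```

The resulting file can be opened in any GraphML-aware viewer. Keeping the
graph-building step separate from the serialization step mirrors the
point above that the same graph could be persisted to different stores.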

2. Avro Transformation Language
This is a YAML-based specification that transforms data in a source
schema to a target schema. For this we introduce a "transform node" that
joins the two schemas.
 - The following operations can be performed during the source-to-target
data transformation:
   o Copy source leaves to target leaves
   o Copy source parent nodes to target parent nodes, but only if the
subgraphs have the same structure
   o Concatenate source nodes and copy the result to a target node
   o User-defined operations on the transforms
   o Extract certain leaves from the source and call an external endpoint
for data manipulation, e.g. Spark / HTTP
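
As a rough illustration of these operations, here is a minimal sketch in
Python. It is not the actual specification: the rule keys ("op", "from",
"to", "separator", "function"), the operation names and the example
record are all hypothetical, and a plain dict stands in for the parsed
YAML. An external endpoint is represented by an ordinary callable, and
the sketch does not check that copied subgraphs have the same structure
in both schemas.

```python
import copy


def get_path(record, path):
    """Read a value from a nested dict using a dotted path."""
    node = record
    for key in path.split("."):
        node = node[key]
    return node


def set_path(record, path, value):
    """Write a value into a nested dict, creating parents as needed."""
    keys = path.split(".")
    node = record
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def apply_transform(source, spec, functions=None):
    """Build a target record from a source record and a list of rules."""
    functions = functions or {}
    target = {}
    for rule in spec["rules"]:
        op = rule["op"]
        if op == "copy":
            # Works for leaves and for whole subtrees (parent nodes).
            value = copy.deepcopy(get_path(source, rule["from"]))
        elif op == "concat":
            separator = rule.get("separator", "")
            value = separator.join(str(get_path(source, p)) for p in rule["from"])
        elif op == "call":
            # User-defined operation, or a stand-in for an external endpoint.
            args = [get_path(source, p) for p in rule["from"]]
            value = functions[rule["function"]](*args)
        else:
            raise ValueError(f"unknown operation: {op}")
        set_path(target, rule["to"], value)
    return target


# Hypothetical source record and transform spec.
SOURCE = {"first": "Ada", "last": "Lovelace", "age": 36,
          "address": {"city": "London", "zip": "N1"}}
SPEC = {"rules": [
    {"op": "copy", "from": "age", "to": "years"},
    {"op": "copy", "from": "address", "to": "location"},
    {"op": "concat", "from": ["first", "last"], "separator": " ",
     "to": "full_name"},
    {"op": "call", "function": "upper", "from": ["address.city"],
     "to": "city_upper"},
]}

if __name__ == "__main__":
    print(apply_transform(SOURCE, SPEC, {"upper": str.upper}))
```

Passing the callables in a separate mapping keeps the spec itself purely
declarative, which is what a YAML file would need.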

Let me know how/if these components would benefit the Apache Avro project.
If they would, we would like to contribute them to the project.

