avro-dev mailing list archives

From "Bram Biesbrouck (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-457) add tools that read/write xml records from/to avro data files
Date Wed, 20 Jan 2016 09:47:40 GMT

    [ https://issues.apache.org/jira/browse/AVRO-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108297#comment-15108297 ]

Bram Biesbrouck commented on AVRO-457:

Hi [~rdblue] and [~mpigott],

I think I might have found a better approach to this.
To work with XSD schemas, the vast majority of Java users run [XJC|https://jaxb.java.net/2.2.4/docs/xjc.html]
to convert an XSD to POJOs. The tool is mature and its output is very good.
Because it makes sense to reuse a common POJO codebase to (de)serialize to JSON/XML/AVRO,
this might be a better starting point for a robust XSD->AVRO parser, especially since
parsing and interpreting raw XSD directly is quite error prone.

Fortunately, a lot of work has been done already. Take a look at [this project|https://github.com/fge/json-schema-core].
It generates a [JSON schema|http://json-schema.org/] from a POJO class (and, recursively,
all of its members).
Now the best part: the same developers also wrote [this project|https://github.com/fge/json-schema-avro]
that converts a JSON schema to an AVRO schema. The json->avro converter is not
production ready yet, but its codebase is a very nice starting point. [This class|https://github.com/fge/json-schema-avro/blob/master/src/main/java/com/github/fge/jsonschema2avro/AvroWriterProcessor.java]
is a good entry point to its inner workings.
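
To make the POJO -> JSON schema step concrete, here is a minimal sketch using plain Java
reflection. It is not how the linked projects work and it uses none of their APIs; the
class PojoToJsonSchemaSketch and the Programme/Contributor beans are hypothetical stand-ins
for what XJC might generate. It assumes only strings, primitives, arrays and nested beans,
so collections, enums, inheritance and self-referencing types are left out.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins for classes that a tool like XJC might generate. */
final class Contributor {
    String name;
}

final class Programme {
    String title;
    int year;
    Contributor[] contributors;
}

/** Hypothetical sketch: walks a POJO class recursively and emits JSON Schema text. */
final class PojoToJsonSchemaSketch {

    static String schemaFor(Class<?> type) {
        if (type == String.class) {
            return "{\"type\":\"string\"}";
        }
        if (type == boolean.class || type == Boolean.class) {
            return "{\"type\":\"boolean\"}";
        }
        if (type == int.class || type == Integer.class
                || type == long.class || type == Long.class) {
            return "{\"type\":\"integer\"}";
        }
        if (type == double.class || type == Double.class
                || type == float.class || type == Float.class) {
            return "{\"type\":\"number\"}";
        }
        if (type.isArray()) {
            return "{\"type\":\"array\",\"items\":" + schemaFor(type.getComponentType()) + "}";
        }
        // Assumption: anything else is a bean. A self-referencing type would
        // recurse forever here; a real tool needs cycle detection.
        List<String> properties = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue; // skip constants and compiler-generated fields
            }
            properties.add("\"" + field.getName() + "\":" + schemaFor(field.getType()));
        }
        return "{\"type\":\"object\",\"properties\":{" + String.join(",", properties) + "}}";
    }
}
```

Calling PojoToJsonSchemaSketch.schemaFor(Programme.class) yields an object schema whose
"contributors" property is an array of the nested Contributor object schema. The order of
the properties follows whatever order reflection returns the fields in.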

I'm currently trying to find some time to work on it, but progress is slow. I did manage
to convert the EBUCore XSD schema to a JSON schema, though. The next step (JSON->AVRO) is
more difficult, I'm afraid. Hence: do the AVRO developers have any experience with converting
JSON schemas into the (narrower) AVRO schema structure? It would be interesting to investigate
in general, because JSON validation is becoming more and more relevant these days.
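
To show why the AVRO side is narrower, here is a minimal sketch of a JSON->AVRO mapping for
a small subset of JSON Schema. It is not the json-schema-avro implementation; the class
JsonSchemaToAvroSketch is hypothetical, the JSON schema is assumed to be already parsed into
nested Maps, and the choices of "long" for "integer" and of a nullable union for non-required
properties are my own assumptions. Note that every AVRO record needs a name, which JSON
Schema does not provide, so the sketch derives one from the property path.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: maps a small subset of JSON Schema onto Avro schema JSON. */
final class JsonSchemaToAvroSketch {

    /** Converts one JSON Schema node (nested Maps) to Avro schema JSON text. */
    @SuppressWarnings("unchecked")
    static String toAvro(String name, Map<String, Object> node) {
        String type = (String) node.get("type");
        if (type == null) {
            // JSON Schema allows untyped nodes; Avro has no "anything" type.
            throw new IllegalArgumentException("untyped schema at " + name);
        }
        switch (type) {
            case "string":
                return "\"string\"";
            case "boolean":
                return "\"boolean\"";
            case "integer":
                return "\"long\""; // assumption: JSON integers are unbounded
            case "number":
                return "\"double\"";
            case "array":
                Object items = node.get("items");
                if (!(items instanceof Map)) {
                    throw new IllegalArgumentException("array without single item schema at " + name);
                }
                return "{\"type\":\"array\",\"items\":"
                        + toAvro(name + "Item", (Map<String, Object>) items) + "}";
            case "object":
                return record(name, node);
            default:
                throw new IllegalArgumentException("unsupported JSON Schema type: " + type);
        }
    }

    @SuppressWarnings("unchecked")
    private static String record(String name, Map<String, Object> node) {
        Map<String, Object> properties =
                (Map<String, Object>) node.getOrDefault("properties", Collections.emptyMap());
        List<String> required =
                (List<String>) node.getOrDefault("required", Collections.emptyList());
        List<String> fields = new ArrayList<>();
        for (Map.Entry<String, Object> entry : properties.entrySet()) {
            String key = entry.getKey(); // assumed to already be a valid Avro name
            String fieldType = toAvro(name + "_" + key, (Map<String, Object>) entry.getValue());
            if (required.contains(key)) {
                fields.add("{\"name\":\"" + key + "\",\"type\":" + fieldType + "}");
            } else {
                // Optional property: nullable union with a null default.
                fields.add("{\"name\":\"" + key + "\",\"type\":[\"null\"," + fieldType
                        + "],\"default\":null}");
            }
        }
        return "{\"type\":\"record\",\"name\":\"" + name + "\",\"fields\":["
                + String.join(",", fields) + "]}";
    }
}
```

For an object schema with a required string "title" and an optional integer "year", calling
toAvro("Programme", schema) gives a record with a plain "string" field and a ["null","long"]
field. Anything JSON Schema can express beyond this subset (oneOf, patternProperties,
untyped nodes, and so on) is rejected, which is exactly where the hard part of the
conversion lies.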


> add tools that read/write xml records from/to avro data files
> -------------------------------------------------------------
>                 Key: AVRO-457
>                 URL: https://issues.apache.org/jira/browse/AVRO-457
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>    Affects Versions: 1.7.8
>            Reporter: Doug Cutting
>              Labels: gsoc
>         Attachments: AVRO-457.patch, AVRO-457.patch, AVRO-457.patch, AVRO-457.patch,
> It might be useful to have command-line tools that can read & write arbitrary XML
> data from & to Avro data files.

This message was sent by Atlassian JIRA