Date: Wed, 20 Jan 2016 09:47:40 +0000 (UTC)
From: "Bram Biesbrouck (JIRA)"
To: dev@avro.apache.org
Subject: [jira] [Commented] (AVRO-457) add tools that read/write xml records from/to avro data files

    [ https://issues.apache.org/jira/browse/AVRO-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108297#comment-15108297 ]

Bram Biesbrouck commented on AVRO-457:
--------------------------------------

Hi [~rdblue] and [~mpigott],

I think I might have found a better approach to this...

To parse XSD schemas, 99% of Java users use [XJC|https://jaxb.java.net/2.2.4/docs/xjc.html] to convert an XSD to POJOs. The results of this tool are very good, since it's a mature tool.
Because it makes sense to reuse a common POJO codebase to (de)serialize to JSON/XML/Avro, this might be a better starting point for investigating a robust XSD->Avro parser, also because raw XSD parsing/understanding is quite error-prone.

Fortunately, a lot of work has been done already. Take a look at [this project|https://github.com/fge/json-schema-core]. It generates a JSON Schema from a POJO class (and, recursively, all its members). The result is a [JSON schema|http://json-schema.org/].

Now the best part: the same developers also wrote [this project|https://github.com/fge/json-schema-avro], which converts a JSON schema to an Avro schema. However, the JSON->Avro converter is not production-ready yet, but it has a very nice codebase to start with. [This class|https://github.com/fge/json-schema-avro/blob/master/src/main/java/com/github/fge/jsonschema2avro/AvroWriterProcessor.java] is a good entry point to its inner workings.

I'm currently trying to find some time to work on it, but it's slow. I did manage to convert the EBUCore XSD schema to a JSON schema, though. The next step (JSON->Avro) is more difficult, I'm afraid.

Hence: do the Avro developers have any experience with converting JSON schemas into the (narrower) Avro schema structure? It would be interesting to investigate in general, because JSON validation is becoming more and more relevant these days.

b.

> add tools that read/write xml records from/to avro data files
> -------------------------------------------------------------
>
>                 Key: AVRO-457
>                 URL: https://issues.apache.org/jira/browse/AVRO-457
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>    Affects Versions: 1.7.8
>            Reporter: Doug Cutting
>              Labels: gsoc
>         Attachments: AVRO-457.patch, AVRO-457.patch, AVRO-457.patch, AVRO-457.patch, ebucore.json
>
>
> It might be useful to have command-line tools that can read & write arbitrary XML data from & to Avro data files.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)