avro-dev mailing list archives

From "Philip Zeyliger (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-672) Convert JSON Text Input to Avro Tool
Date Wed, 06 Oct 2010 00:02:33 GMT

    [ https://issues.apache.org/jira/browse/AVRO-672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918341#action_12918341 ]

Philip Zeyliger commented on AVRO-672:

I like the idea of having tools that convert "traditional" data formats into Avro records,
including guessing at the schema.  CSV, TSV, and one-JSON-per-line are obvious candidates.
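
To make "guessing at the schema" concrete, here is a minimal sketch in Python for the
one-JSON-per-line case. It is not part of the attached patch. The function names, the
default record name, and the mapping of JSON integers to "long" are all assumptions made
for illustration, and it only handles flat records of primitive values.

```python
import json

def avro_type(value):
    """Map a parsed JSON value to an Avro primitive type name (assumed mapping)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise ValueError("nested values are not handled in this sketch: %r" % (value,))

def guess_schema(lines, name="GuessedRecord"):
    """Guess a flat Avro record schema from an iterable of JSON text lines."""
    records = [json.loads(line) for line in lines if line.strip()]
    seen = {}  # field name -> Avro types observed, in first-seen order
    for record in records:
        for key, value in record.items():
            types = seen.setdefault(key, [])
            type_name = avro_type(value)
            if type_name not in types:
                types.append(type_name)
    fields = []
    for key, types in seen.items():
        # A field absent from some record is made nullable.
        if any(key not in record for record in records) and "null" not in types:
            types = ["null"] + types
        fields.append({"name": key, "type": types[0] if len(types) == 1 else types})
    return {"type": "record", "name": name, "fields": fields}

if __name__ == "__main__":
    sample = ['{"intval":12}', '{"intval":-73,"strval":"hello, there!!"}']
    print(json.dumps(guess_schema(sample), indent=2))
```

A real tool would also need to decide how to handle nested objects, arrays, and field
names that are not valid Avro identifiers.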

> Convert JSON Text Input to Avro Tool
> ------------------------------------
>                 Key: AVRO-672
>                 URL: https://issues.apache.org/jira/browse/AVRO-672
>             Project: Avro
>          Issue Type: New Feature
>            Reporter: Ron Bodkin
>         Attachments: AVRO-672.patch, AVRO-672.patch
> The attached patch reads a JSON-formatted text file and converts it to a conforming
> Avro text file, emitting one record per line. For example, it can read this input file:
> {"intval":12}
> {"intval":-73,"strval":"hello, there!!"}
> with this schema:
> { "type":"record", "name":"TestRecord", "fields": [ {"name":"intval","type":"int"}, {"name":"strval","type":["string",
> returning valid Avro. This differs from the DataFileWriteTool, which would read
> in the following internal encoding:
> {"intval":12,"strval":null}
> {"intval":-73,"strval":{"string":"hello, there!!"}}
> In general, the internal encodings used by Avro aren't natural when reading in JSON text
> that appears in the wild. Likewise, this utility allows changing invalid Avro identifier characters
> into an underscore, again to tolerate JSON that wasn't designed to be readable by Avro.
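
The two behaviours described in the issue can be sketched in a few lines of Python. This
is an illustration only, not the code in AVRO-672.patch. The schema in the message is cut
off, so the `SCHEMA` below (in particular the `"null"` branch of the union) is an
assumption, and `sanitize_name`, `branch_name`, and `to_avro_json` are hypothetical names.
The sketch handles only flat records with primitive fields.

```python
import json
import re

# Assumed schema: the one in the message is truncated after "string",
# so the "null" branch here is a guess made for illustration.
SCHEMA = {
    "type": "record",
    "name": "TestRecord",
    "fields": [
        {"name": "intval", "type": "int"},
        {"name": "strval", "type": ["string", "null"]},
    ],
}

def sanitize_name(name):
    """Replace characters that are not valid in an Avro name with '_'."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # Avro names must start with a letter or underscore; prefixing is an assumption.
    if not cleaned or not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "_" + cleaned
    return cleaned

def branch_name(value):
    """Pick the Avro primitive type name for a non-null JSON value."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise ValueError("unsupported value in this sketch: %r" % (value,))

def to_avro_json(record, schema):
    """Convert a 'natural' JSON record into Avro's JSON encoding for the schema."""
    source = {sanitize_name(key): value for key, value in record.items()}
    out = {}
    for field in schema["fields"]:
        name, ftype = field["name"], field["type"]
        value = source.get(name)
        if isinstance(ftype, list):  # a union
            if value is None:
                if "null" not in ftype:
                    raise ValueError("field %r is missing and not nullable" % name)
                out[name] = None  # the null branch is encoded as a bare null
            else:
                branch = branch_name(value)
                if branch not in ftype:
                    raise ValueError("no %r branch in union for field %r" % (branch, name))
                out[name] = {branch: value}  # other branches are wrapped by type name
        else:
            if value is None:
                raise ValueError("required field %r is missing" % name)
            out[name] = value
    return out

if __name__ == "__main__":
    for line in ['{"intval":12}', '{"intval":-73,"strval":"hello, there!!"}']:
        print(json.dumps(to_avro_json(json.loads(line), SCHEMA), separators=(",", ":")))
```

Run on the two input lines from the issue, this prints the same two lines shown above as
the internal encoding, which is the gap the proposed tool is meant to close.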

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
