hadoop-common-issues mailing list archives

From "Ivan Vladimirov Ivanov (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-8597) FsShell's Text command should be able to read avro data files
Date Mon, 03 Sep 2012 10:32:08 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-8597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Vladimirov Ivanov updated HADOOP-8597:
-------------------------------------------

    Attachment: HADOOP-8597.patch

The proposed patch adds the logic to output the content of Avro data files in JSON format.

The implementation does not use the DataFileReadTool class because the org.apache.avro.tool
package is not currently among the project's dependencies. As a side effect, this allowed
a more memory-efficient implementation that keeps only a constant number of Avro records
in memory.
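
For readers without the attachment at hand, here is a minimal sketch of the general
technique, not the code from HADOOP-8597.patch. It assumes Avro's generic API
(DataFileStream, GenericDatumReader, GenericDatumWriter, JsonEncoder); the class and
method names AvroTextSketch and dumpAsJson are hypothetical. Records are decoded and
written one at a time, and the record object is reused, so the number of records held
in memory does not grow with the file size.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;

// Hypothetical class name; this is an illustration, not the attached patch.
class AvroTextSketch {

  /** Streams every record of an Avro data file to out, encoded as JSON. */
  static void dumpAsJson(InputStream in, OutputStream out) throws IOException {
    // DataFileStream reads the writer schema from the file header,
    // so no schema has to be supplied by the caller.
    try (DataFileStream<GenericRecord> records =
             new DataFileStream<>(in, new GenericDatumReader<GenericRecord>())) {
      Schema schema = records.getSchema();
      GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
      JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out);

      GenericRecord reuse = null;
      while (records.hasNext()) {
        // Passing the previous record back lets Avro reuse the object
        // instead of allocating a new one per record.
        reuse = records.next(reuse);
        writer.write(reuse, encoder);
      }
      // The encoder buffers output; flush once after the last record.
      encoder.flush();
    }
  }

  public static void main(String[] args) throws IOException {
    // Usage (local file only): java AvroTextSketch /path/to/file.avro
    try (InputStream in = new FileInputStream(args[0])) {
      dumpAsJson(in, System.out);
    }
  }
}
```

In FsShell the input stream would presumably come from the Hadoop FileSystem API rather
than from FileInputStream; that wiring is left out here.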
                
> FsShell's Text command should be able to read avro data files
> -------------------------------------------------------------
>
>                 Key: HADOOP-8597
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8597
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>    Affects Versions: 2.0.0-alpha
>            Reporter: Harsh J
>              Labels: newbie
>         Attachments: HADOOP-8597.patch
>
>
> Similar to SequenceFiles are Apache Avro's DataFiles. Since these are getting popular
> as a data format, perhaps it would be useful if {{fs -text}} were to add some support for
> reading them, like it reads SequenceFiles. Should be easy since Avro is already a dependency
> and provides the required classes.
> Open for discussion is the output we ought to emit. Avro DataFiles aren't as simple as text, nor
> do they have the single key-value pair structure of SequenceFiles. They usually contain a set
> of fields defined as a record, and the usual text emit, as available from avro-tools via
> http://avro.apache.org/docs/current/api/java/org/apache/avro/tool/DataFileReadTool.html,
> is in proper JSON format.
> I think we should use the JSON format as the output, rather than a delimited form, for
> there are many complex structures in Avro and JSON is the easiest and least-work-to-do way
> to display them (Avro supports JSON dumping by itself).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
