avro-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Updated: (AVRO-662) Java: Add InputFormat for SequenceFiles using Reflect API
Date Mon, 13 Sep 2010 18:42:39 GMT

     [ https://issues.apache.org/jira/browse/AVRO-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-662:

    Attachment: AVRO-662.patch

Here's a patch that adds this feature.  A SequenceFileInputFormat is added that presents sequence
file data in a form compatible with Avro's MapReduce API.  In particular, primitive Writable
types (LongWritable, Text, etc.) are converted to corresponding Avro types (Long, CharSequence,
etc.), while reflection is used to infer a schema for complex Writables.  The Writable implementation
must be available at runtime, of course.
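
As a rough illustration of the conversion described above (this is not the code
in the patch), the sketch below maps primitive Writable types to Avro type names
through a lookup table and falls back to Java reflection for anything else.
Everything in it is an assumption made for illustration: the class
WritableSchemaSketch, the stand-in LongWritable and Text classes (placeholders
for Hadoop's), the example PageView type, and the hand-built JSON output.  Real
code would rely on Avro's reflect API rather than building schema text by hand.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;
import java.util.StringJoiner;

class WritableSchemaSketch {
  // Stand-ins for Hadoop's primitive Writables, so the sketch has no dependencies.
  static class LongWritable { long value; }
  static class Text { String value; }

  // Hypothetical "complex Writable" whose schema is inferred from its fields.
  static class PageView { long userId; String url; }

  // Lookup table keyed by simple class name (an assumption of this sketch).
  static final Map<String, String> PRIMITIVES = new HashMap<>();
  static {
    PRIMITIVES.put("LongWritable", "long");   // LongWritable -> Avro long
    PRIMITIVES.put("Text", "string");         // Text -> Avro string (CharSequence)
    PRIMITIVES.put("long", "long");           // plain Java field types
    PRIMITIVES.put("String", "string");
  }

  /** Returns schema text: a type name for primitives, else a reflected record. */
  static String schemaFor(Class<?> c) {
    String primitive = PRIMITIVES.get(c.getSimpleName());
    if (primitive != null) {
      return "\"" + primitive + "\"";
    }
    // Fallback: one record field per instance field.
    // Note: no cycle detection, so self-referential types would recurse forever.
    StringJoiner fields = new StringJoiner(",");
    for (Field f : c.getDeclaredFields()) {
      int m = f.getModifiers();
      if (Modifier.isStatic(m) || Modifier.isTransient(m) || f.isSynthetic()) {
        continue;
      }
      fields.add("{\"name\":\"" + f.getName() + "\",\"type\":" + schemaFor(f.getType()) + "}");
    }
    return "{\"type\":\"record\",\"name\":\"" + c.getSimpleName()
        + "\",\"fields\":[" + fields + "]}";
  }

  public static void main(String[] args) {
    System.out.println(schemaFor(LongWritable.class));
    System.out.println(schemaFor(Text.class));
    System.out.println(schemaFor(PageView.class));
  }
}
```

The table-then-reflection order mirrors the description: known primitive
wrappers are unwrapped to plain Avro types, and only unrecognized classes are
treated as records.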

I also abstracted a FileReader interface and added a SequenceFileReader implementation.  This
permits easier integration of SequenceFile and other formats into Avro tools.  For example,
it would now be a simple matter to extend Avro's 'tojson' command to also dump SequenceFile
data as JSON.
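
To make the benefit of a reader abstraction concrete, here is a minimal sketch
of a format-neutral reader interface and a JSON dump routine written against it.
It does not reproduce the interface in the patch: RecordSource,
InMemoryPairSource (an in-memory stand-in for a SequenceFile-backed reader),
quote and dumpAsJson are all hypothetical names, and the JSON escaping covers
only quotes and backslashes.

```java
import java.io.Closeable;
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class FileReaderSketch {
  /** Hypothetical format-neutral reader: iterate over records, then close. */
  interface RecordSource<D> extends Iterator<D>, Closeable {}

  /** In-memory stand-in for a reader backed by a SequenceFile of key/value pairs. */
  static class InMemoryPairSource implements RecordSource<Map.Entry<Long, String>> {
    private final Iterator<Map.Entry<Long, String>> it;

    InMemoryPairSource(List<Map.Entry<Long, String>> pairs) {
      this.it = pairs.iterator();
    }

    @Override public boolean hasNext() { return it.hasNext(); }
    @Override public Map.Entry<Long, String> next() { return it.next(); }
    @Override public void close() { /* nothing to release in memory */ }
  }

  /** Minimal JSON string quoting: escapes backslashes and double quotes only. */
  static String quote(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  /** Writes one JSON object per record; works with any RecordSource of pairs. */
  static String dumpAsJson(RecordSource<Map.Entry<Long, String>> source) {
    StringBuilder out = new StringBuilder();
    while (source.hasNext()) {
      Map.Entry<Long, String> r = source.next();
      out.append("{\"key\":").append(r.getKey())
         .append(",\"value\":").append(quote(r.getValue()))
         .append("}\n");
    }
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    List<Map.Entry<Long, String>> pairs = Arrays.asList(
        new SimpleEntry<>(1L, "first"),
        new SimpleEntry<>(2L, "second"));
    try (InMemoryPairSource source = new InMemoryPairSource(pairs)) {
      System.out.print(dumpAsJson(source));
    }
  }
}
```

Because dumpAsJson depends only on the interface, a tool written this way would
not need to change when a reader for another file format is added.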

> Java: Add InputFormat for SequenceFiles using Reflect API
> ---------------------------------------------------------
>                 Key: AVRO-662
>                 URL: https://issues.apache.org/jira/browse/AVRO-662
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>             Fix For: 1.4.1
>         Attachments: AVRO-662.patch
> It would be useful to be able to read SequenceFile-based data into an Avro-based Java
> MapReduce program.  Once the reflect, specific and generic representations are fully compatible
> (AVRO-638), a RecordReader for SequenceFiles could be added that uses Avro's reflect representation.
> AvroOutputFormat could also be changed to accept such reflected data.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
