avro-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Updated: (AVRO-662) Java: Add InputFormat for SequenceFiles using Reflect API
Date Mon, 13 Sep 2010 18:42:39 GMT

     [ https://issues.apache.org/jira/browse/AVRO-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-662:
------------------------------

    Attachment: AVRO-662.patch

Here's a patch that adds this feature.  A SequenceFileInputFormat is added that presents sequence
file data in a form compatible with Avro's MapReduce API.  In particular, primitive Writable
types (LongWritable, Text, etc.) are converted to corresponding Avro types (Long, CharSequence,
etc.), while reflection is used to infer a schema for complex Writables.  The Writable implementation
must be available at runtime, of course.
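
To illustrate the primitive conversion described above, here is a minimal sketch, not the code from the attached patch. The stand-in classes only mimic the `get()` and `toString()` accessors of Hadoop's LongWritable and Text so that the example compiles without Hadoop on the classpath, and `toAvroDatum` is a hypothetical helper name.

```java
// Stand-in for org.apache.hadoop.io.LongWritable (assumption: only get() is mimicked).
final class LongWritableStandIn {
    private final long value;
    LongWritableStandIn(long value) { this.value = value; }
    long get() { return value; }
}

// Stand-in for org.apache.hadoop.io.Text (assumption: only toString() is mimicked).
final class TextStandIn {
    private final String value;
    TextStandIn(String value) { this.value = value; }
    @Override public String toString() { return value; }
}

final class WritableToAvroSketch {
    /**
     * Hypothetical helper: maps a primitive Writable to the matching Avro
     * Java type (LongWritable to Long, Text to CharSequence).
     */
    static Object toAvroDatum(Object writable) {
        if (writable instanceof LongWritableStandIn) {
            return Long.valueOf(((LongWritableStandIn) writable).get());
        }
        if (writable instanceof TextStandIn) {
            // String implements CharSequence, which Avro accepts for strings.
            return writable.toString();
        }
        // A complex Writable is passed through unchanged in this sketch; the
        // message says its schema would be inferred by reflection on its class.
        return writable;
    }
}
```

With the real Hadoop classes the branches would test for LongWritable and Text directly; the remaining primitive Writables would presumably follow the same pattern.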

I also abstracted a FileReader interface and added a SequenceFileReader implementation.  This
permits easier integration of SequenceFile and other formats into Avro tools.  For example,
it would now be a simple matter to extend Avro's 'tojson' command to also dump SequenceFile
data as JSON.
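
The benefit of abstracting a reader interface can be sketched as follows. This is a simplified illustration under stated assumptions, not the interface from the patch: `SimpleFileReader`, `InMemoryReader` and `DumpTool` are hypothetical names, and the dump writes each record's `toString()` on its own line rather than real JSON.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified reader abstraction: iterate over records, then close.
interface SimpleFileReader<D> extends Iterator<D>, Closeable {}

// Hypothetical implementation backed by a list; a SequenceFile-backed or
// Avro-data-file-backed reader would implement the same interface.
final class InMemoryReader<D> implements SimpleFileReader<D> {
    private final Iterator<D> it;
    boolean closed = false;
    InMemoryReader(List<D> records) { this.it = records.iterator(); }
    @Override public boolean hasNext() { return it.hasNext(); }
    @Override public D next() { return it.next(); }
    @Override public void close() { closed = true; }
}

final class DumpTool {
    // The tool depends only on the interface, so it works for any format
    // that supplies a reader implementation.
    static <D> String dump(SimpleFileReader<D> reader) {
        StringBuilder out = new StringBuilder();
        try {
            while (reader.hasNext()) {
                out.append(reader.next()).append('\n');
            }
        } finally {
            try {
                reader.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return out.toString();
    }
}
```

Because `DumpTool.dump` never names a concrete file format, supporting a new format only requires a new reader implementation, which is the kind of extension the 'tojson' example points at.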

> Java: Add InputFormat for SequenceFiles using Reflect API
> ---------------------------------------------------------
>
>                 Key: AVRO-662
>                 URL: https://issues.apache.org/jira/browse/AVRO-662
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>             Fix For: 1.4.1
>
>         Attachments: AVRO-662.patch
>
>
> It would be useful to be able to read SequenceFile-based data into an Avro-based Java
> mapreduce program.  Once the reflect, specific and generic representations are fully compatible
> (AVRO-638) then a RecordReader for SequenceFiles could be added that uses Avro's reflect representation.
> AvroOutputFormat could also be changed to accept such reflected data.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

