hadoop-mapreduce-issues mailing list archives

From "Aaron Kimball (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-815) Add AvroInputFormat and AvroOutputFormat so that hadoop can use Avro Serialization
Date Thu, 14 Jan 2010 04:39:55 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-815:

    Attachment: MAPREDUCE-815.patch

Attaching a patch that provides AvroInputFormat/AvroOutputFormat.

AvroInputFormat lets you set its input schema in the job configuration through static methods.
Depending on the input serialization metadata, it can deserialize records to generic, reflect,
or specific-based classes.
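The static-setter pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the patch's code: the job configuration is modeled as a plain Map<String, String> instead of Hadoop's Configuration, and the class InputSchemaSketch, its methods, and both configuration keys are hypothetical. Falling back to generic deserialization when no kind is configured is also an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: stores an input schema and a datum kind in a job
 * configuration modeled as a plain string map (not Hadoop's Configuration).
 */
final class InputSchemaSketch {
    // Hypothetical configuration keys; not the names used by the actual patch.
    static final String SCHEMA_KEY = "sketch.avro.input.schema";
    static final String KIND_KEY = "sketch.avro.input.datum.kind";

    /** The three Avro data representations mentioned in the patch notes. */
    enum DatumKind { GENERIC, REFLECT, SPECIFIC }

    private InputSchemaSketch() {}

    /** Creates an empty stand-in for a job configuration. */
    static Map<String, String> newJobConf() {
        return new HashMap<>();
    }

    /** Static setter: records the input schema (as JSON) in the configuration. */
    static void setInputSchema(Map<String, String> conf, String schemaJson) {
        conf.put(SCHEMA_KEY, schemaJson);
    }

    /** Reads the input schema back, failing if none was configured. */
    static String getInputSchema(Map<String, String> conf) {
        String schema = conf.get(SCHEMA_KEY);
        if (schema == null) {
            throw new IllegalStateException("no input schema configured");
        }
        return schema;
    }

    /** Records which Avro representation records should be deserialized to. */
    static void setDatumKind(Map<String, String> conf, DatumKind kind) {
        conf.put(KIND_KEY, kind.name());
    }

    /** Chooses the deserialization style from the metadata; assumed default is GENERIC. */
    static DatumKind getDatumKind(Map<String, String> conf) {
        String kind = conf.get(KIND_KEY);
        return kind == null ? DatumKind.GENERIC : DatumKind.valueOf(kind);
    }
}
```

Keeping the accessors static means a job driver can configure the input format before any InputFormat instance exists, which is the usual Hadoop convention for per-job settings.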

This patch includes unit tests for both of these classes.

I have also extended the jobdata API so that you can set output serialization metadata (rather
than class-name-only metadata) in the same way that MAPREDUCE-1126 lets you set intermediate
serialization metadata. This deprecates older methods such as {{JobConf.setOutputKeyClass()}}.
Note that PipesMapRunner/PipesReducer, MapFileOutputFormat, and SequenceFileOutputFormat still
rely on these deprecated APIs. MAPREDUCE-1360 will require a Hadoop-core project JIRA that
allows SequenceFile to handle non-class-based serialization; that change will update at least
the SequenceFile InputFormat/OutputFormat APIs. Handling Pipes is a separate issue.
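One way to picture output serialization metadata replacing class-name-only metadata is to treat the old setter as a special case of the new one. The sketch below is illustrative only: it models the job configuration as a plain Map<String, String> instead of JobConf, and OutputMetadataSketch, its key prefix, the "class" metadata entry, and the method names are hypothetical, not the API added by the patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: output key metadata stored under a key prefix. */
final class OutputMetadataSketch {
    // Hypothetical key prefix; the real patch's configuration keys may differ.
    static final String PREFIX = "sketch.output.key.metadata.";
    // Hypothetical metadata entry holding a fully qualified class name.
    static final String CLASS_ENTRY = "class";

    private OutputMetadataSketch() {}

    /** New-style setter: stores an arbitrary metadata map (e.g. an Avro schema). */
    static void setOutputKeySerializationMetadata(Map<String, String> conf,
                                                  Map<String, String> metadata) {
        // Drop previously stored entries so stale metadata does not survive.
        conf.keySet().removeIf(k -> k.startsWith(PREFIX));
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            conf.put(PREFIX + e.getKey(), e.getValue());
        }
    }

    /** Collects every metadata entry stored under the prefix, sorted by key. */
    static Map<String, String> getOutputKeySerializationMetadata(Map<String, String> conf) {
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                out.put(e.getKey().substring(PREFIX.length()), e.getValue());
            }
        }
        return out;
    }

    /** Old-style setter: class-name-only metadata expressed through the new API. */
    @Deprecated
    static void setOutputKeyClass(Map<String, String> conf, Class<?> keyClass) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(CLASS_ENTRY, keyClass.getName());
        setOutputKeySerializationMetadata(conf, metadata);
    }
}
```

Under this design, existing callers of the deprecated setter keep working, while Avro users can store entries, such as a schema, that a single class name cannot express.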

This cannot be submitted to the patch queue until a small change is made to the Hadoop-core
API (the issue is linked) and Hadoop is upgraded across the board to Avro 1.3. I'll mark this
patch as Patch Available when that happens.

> Add AvroInputFormat and AvroOutputFormat so that hadoop can use Avro Serialization
> ----------------------------------------------------------------------------------
>                 Key: MAPREDUCE-815
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-815
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Ravi Gummadi
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-815.patch
> MapReduce needs AvroInputFormat similar to other InputFormats like TextInputFormat to be able to use avro serialization in hadoop. Similarly AvroOutputFormat is needed.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
