avro-dev mailing list archives

From "Harsh J (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AVRO-1439) MultipleInputs equivalent for Avro MR
Date Wed, 15 Jan 2014 11:56:20 GMT

     [ https://issues.apache.org/jira/browse/AVRO-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated AVRO-1439:

    Attachment: AVRO-1439.patch

Here is a functional patch for the {{mapred}} (old) APIs, with a Reflect-based test case that
illustrates a sample join operation.
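The join use case relies on giving each input path its own mapper. A minimal, Hadoop-free sketch of that reduce-side join pattern follows, assuming two hypothetical inputs (users and orders) keyed by user id. {{ReduceSideJoin}}, {{Tagged}} and the record shapes are illustrative only; they are not taken from the patch or from the Avro API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only (hypothetical names, not the patch's API):
// each input gets its own "mapper" that tags values with their source,
// and the "reducer" joins values that share a key.
final class ReduceSideJoin {

  /** A mapper output value tagged with the input it came from. */
  record Tagged(String source, String value) {}

  /** Mapper for the users input: rows are {userId, name}; key by user id. */
  static void mapUsers(List<String[]> users, Map<String, List<Tagged>> shuffle) {
    for (String[] u : users) {
      shuffle.computeIfAbsent(u[0], k -> new ArrayList<>()).add(new Tagged("user", u[1]));
    }
  }

  /** Mapper for the orders input: rows are {userId, item}; key by user id. */
  static void mapOrders(List<String[]> orders, Map<String, List<Tagged>> shuffle) {
    for (String[] o : orders) {
      shuffle.computeIfAbsent(o[0], k -> new ArrayList<>()).add(new Tagged("order", o[1]));
    }
  }

  /** Reducer: per key, pair every user value with every order value (inner join). */
  static List<String> reduce(Map<String, List<Tagged>> shuffle) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, List<Tagged>> e : shuffle.entrySet()) {
      List<String> names = new ArrayList<>();
      List<String> items = new ArrayList<>();
      for (Tagged t : e.getValue()) {
        if (t.source().equals("user")) {
          names.add(t.value());
        } else {
          items.add(t.value());
        }
      }
      for (String n : names) {
        for (String i : items) {
          out.add(e.getKey() + "," + n + "," + i);
        }
      }
    }
    return out;
  }
}
```

Passing a sorted map (for example a {{java.util.TreeMap}}) as the shuffle stands in for the framework's sort-by-key; keys present in only one input produce no output, which is the usual inner-join behaviour.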

I've not yet delved into the {{mapreduce}} (new) APIs, but the equivalent there would be implemented in nearly
the same way.

Any comments on the approach before I begin work on the {{mapreduce}} equivalent?

Here are some implementation points:
- Only works for Specific- and Reflect-based MR jobs that use {{mapred.AvroInputFormat}} and {{mapred.AvroMapper}}/{{mapred.AvroReducer}}
-- Only the schema and the mapper class can be configured per path.
-- Unlike its Apache Hadoop equivalent, no input format class can be configured per path.
- Passing a schema when adding an input path is mandatory.
- Passing a mapper class when adding an input path is also mandatory.
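The per-path constraints above can be modelled as a small registry. This is a hedged sketch, assuming a hypothetical {{PerPathInputs}} class with string paths and schemas held as JSON strings; it is not the patch's actual API, and {{UserMapper}}/{{OrderMapper}} are hypothetical stand-ins for {{mapred.AvroMapper}} subclasses.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-ins for per-path mapper classes; in a real job these
// would be mapred.AvroMapper subclasses.
class UserMapper {}
class OrderMapper {}

// Illustrative sketch only: PerPathInputs and InputSpec are NOT the patch's API.
// Per path, only a schema and a mapper class can be set, and both are mandatory.
final class PerPathInputs {

  /** Everything configurable for one input path: a schema and a mapper class. */
  record InputSpec(String schemaJson, String mapperClassName) {}

  // Insertion-ordered so inputs are reported in the order they were added.
  private final Map<String, InputSpec> byPath = new LinkedHashMap<>();

  /** Registers an input path; a null schema or mapper class is rejected. */
  void addInputPath(String path, String schemaJson, Class<?> mapperClass) {
    Objects.requireNonNull(path, "path");
    Objects.requireNonNull(schemaJson, "a schema is mandatory");
    Objects.requireNonNull(mapperClass, "a mapper class is mandatory");
    byPath.put(path, new InputSpec(schemaJson, mapperClass.getName()));
  }

  /** Returns the settings for the path a map task is reading from. */
  InputSpec specFor(String path) {
    InputSpec spec = byPath.get(path);
    if (spec == null) {
      throw new IllegalArgumentException("no input registered for " + path);
    }
    return spec;
  }

  int size() {
    return byPath.size();
  }
}
```

There is deliberately no slot for an input format class, matching the point above that {{mapred.AvroInputFormat}} is always the reader.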

> MultipleInputs equivalent for Avro MR
> -------------------------------------
>                 Key: AVRO-1439
>                 URL: https://issues.apache.org/jira/browse/AVRO-1439
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>    Affects Versions: 1.8.0
>            Reporter: Harsh J
>            Assignee: Harsh J
>            Priority: Minor
>         Attachments: AVRO-1439.patch
> We have MultipleOutputs-like functionality for Avro today, but lack a MultipleInputs equivalent,
> which would make pure-MR joins possible with Specific/Reflect Avro MR.

This message was sent by Atlassian JIRA
