avro-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-534) AvroRecordReader (org.apache.avro.mapred) should support a JobConf-given schema
Date Fri, 20 Aug 2010 00:46:17 GMT

    [ https://issues.apache.org/jira/browse/AVRO-534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900541#action_12900541 ]

Doug Cutting commented on AVRO-534:
-----------------------------------

Harsh, will you have a chance to work on a test for this?  Please tell me if you'd like me
to help.

> AvroRecordReader (org.apache.avro.mapred) should support a JobConf-given schema
> -------------------------------------------------------------------------------
>
>                 Key: AVRO-534
>                 URL: https://issues.apache.org/jira/browse/AVRO-534
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>    Affects Versions: 1.4.0
>         Environment: ArchLinux, Java 1.6, Apache Hadoop (0.20.2), Apache Avro (trunk -- 1.4.0 SNAPSHOT), Using Avro Generic API (Java)
>            Reporter: Harsh J Chouraria
>            Priority: Trivial
>             Fix For: 1.4.0
>
>         Attachments: avro.mapreduce.r1.diff
>
>
> Consider an Avro file holding a single record type with about 70 fields, in the order (str,
> str, str, long, str, double, ...); only the first six matter here.
> To pass this file into a simple MapReduce job I call AvroInputFormat.addInputPath(...), and
> it works well with an IdentityMapper.
> Now I'd like to read only three fields, say fields 0, 1 and 3, so I supply a reduced schema
> containing just those fields, (str (0), str (1), long (2)), via
> AvroJob.setInputGeneric(..., mySchema).
> The MapReduce job then fails: the Avro record reader reads the file using its entire 70-field
> schema and tries to convert my 'long' field to 'str', which is the type at index 2 of the
> actual schema. In other words, it is using the schema embedded in the file, not the one I
> supplied.
> The AvroRecordReader must support reading with the schema that the user specifies through
> AvroJob.setInputGeneric.
> I've written a patch that does this, but I am not sure whether it is actually the right
> solution (MAP_OUTPUT_SCHEMA use?)
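
The failure described in the issue can be modelled without Avro at all. What follows is a minimal plain-Java sketch, not the attached patch and not the Avro API: the class ProjectionSketch, the Field type, the field names f0..f5 and the sample row are all hypothetical. It contrasts pairing the reduced schema with the stored data by position, which reproduces the long-versus-str mismatch at index 2, with looking each requested field up by name in the file's schema first, which is how Avro's schema resolution matches record fields.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the problem; it deliberately does NOT use the Avro API.
final class ProjectionSketch {

    // Hypothetical stand-in for a schema field: a name plus a Java value type.
    static final class Field {
        final String name;
        final Class<?> type;

        Field(String name, Class<?> type) {
            this.name = name;
            this.type = type;
        }
    }

    // First six fields of the file's (writer) schema; the names f0..f5 are assumed.
    static List<Field> sampleWriter() {
        return Arrays.asList(
            new Field("f0", String.class), new Field("f1", String.class),
            new Field("f2", String.class), new Field("f3", Long.class),
            new Field("f4", String.class), new Field("f5", Double.class));
    }

    // The reduced (reader) schema: fields 0, 1 and 3 of the writer schema.
    static List<Field> sampleReader() {
        return Arrays.asList(
            new Field("f0", String.class), new Field("f1", String.class),
            new Field("f3", Long.class));
    }

    // One stored row, laid out in writer-schema order.
    static List<Object> sampleRow() {
        return Arrays.<Object>asList("a", "b", "c", 42L, "d", 1.5);
    }

    // Positional read: reader field i is paired with stored value i.
    static Map<String, Object> readByPosition(List<Field> reader, List<Object> row) {
        Map<String, Object> record = new LinkedHashMap<>();
        for (int i = 0; i < reader.size(); i++) {
            record.put(reader.get(i).name, checked(reader.get(i), row.get(i)));
        }
        return record;
    }

    // Name-based read: each reader field is looked up in the writer schema first.
    static Map<String, Object> readByName(
            List<Field> writer, List<Field> reader, List<Object> row) {
        Map<String, Integer> positions = new HashMap<>();
        for (int i = 0; i < writer.size(); i++) {
            positions.put(writer.get(i).name, i);
        }
        Map<String, Object> record = new LinkedHashMap<>();
        for (Field f : reader) {
            Integer pos = positions.get(f.name);
            if (pos == null) {
                throw new IllegalArgumentException("no such field in file: " + f.name);
            }
            record.put(f.name, checked(f, row.get(pos)));
        }
        return record;
    }

    // Rejects a value whose runtime type does not match the field's declared type.
    private static Object checked(Field f, Object value) {
        if (!f.type.isInstance(value)) {
            throw new IllegalStateException(
                "field " + f.name + " expects " + f.type.getSimpleName()
                    + " but found " + value.getClass().getSimpleName());
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(readByName(sampleWriter(), sampleReader(), sampleRow()));
        try {
            readByPosition(sampleReader(), sampleRow());
        } catch (IllegalStateException e) {
            System.out.println("positional read failed: " + e.getMessage());
        }
    }
}
```

Running main prints the three projected fields for the name-based read, then reports the type mismatch for the positional read. The sketch only illustrates why the record reader needs both the file's schema and the user-supplied one; how AvroRecordReader should obtain the latter from the JobConf is the open question of this issue.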

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

