crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Injecting alternate PType Converter implementations
Date Wed, 24 Apr 2013 22:29:01 GMT
Hey Micah,

It seems like having AvroKeyConverter return AvroKey instead of AvroWrapper
is the easiest way to solve this, since AvroKey is a subclass of
AvroWrapper. That said, I agree that it's a thorny problem. We're just
getting ready for the 0.6.0 release, but I'd be happy to get the switch in
there if that would solve this problem for you.
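
To illustrate why the switch should be safe, here is a minimal sketch. It is not the actual Crunch or Avro code: Wrapper and Key are hypothetical stand-ins for AvroWrapper and AvroKey so it runs without Avro on the classpath, and convertAsWrapper, convertAsKey, and writeKey are made-up names standing in for the converter and the Trevni record writer. The only thing it relies on is that AvroKey extends AvroWrapper.

```java
// Hypothetical stand-ins for org.apache.avro.mapred.AvroWrapper and AvroKey.
class ConverterSketch {
    static class Wrapper<T> {
        private final T datum;
        Wrapper(T datum) { this.datum = datum; }
        T datum() { return datum; }
    }

    // Mirrors the real relationship: AvroKey is a subclass of AvroWrapper.
    static class Key<T> extends Wrapper<T> {
        Key(T datum) { super(datum); }
    }

    // Behaviour described in the thread: the converter hands back the base wrapper.
    static <T> Wrapper<T> convertAsWrapper(T value) {
        return new Wrapper<>(value);
    }

    // Proposed change: build the subclass. Callers that only expect a
    // Wrapper still compile and work, because a Key is a Wrapper.
    static <T> Wrapper<T> convertAsKey(T value) {
        return new Key<>(value);
    }

    // Stand-in for a record writer that insists on the subclass,
    // as the Trevni writer in the stack trace appears to.
    static Object writeKey(Object key) {
        Key<?> k = (Key<?>) key; // ClassCastException for a plain Wrapper
        return k.datum();
    }

    public static void main(String[] args) {
        // The subclass passes through the strict writer.
        System.out.println(writeKey(convertAsKey("record")));

        // The base wrapper reproduces the failure from the thread.
        try {
            writeKey(convertAsWrapper("record"));
        } catch (ClassCastException e) {
            System.out.println("plain wrapper rejected: ClassCastException");
        }
    }
}
```

Output formats that only ask for the base wrapper type would be unaffected, which is why changing the converter looks cheaper than adding a separate PType.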

J


On Wed, Apr 24, 2013 at 3:23 PM, Micah Whitacre <mkwhitacre@gmail.com> wrote:

> As an alternative to the standard AvroInput/OutputFormat, I've been
> playing around with how to support alternate Avro file types like
> Trevni[1], which give benefits when we want to only retrieve a subset of
> the Avro object.
>
> Picking one of the implementations
> (AvroTrevniKeyInputFormat/AvroTrevniKeyOutputFormat)[2], I implemented the
> various Source/Target/SourceTarget implementations.  When I started trying
> to test it out (to see if I did any of it right), I hit the issue that the
> AvroKeyConverter only produces AvroWrapper objects and the output format
> requires AvroKey.  So I get a ClassCastException in the
> CrunchOutputs.write(...) method.
>
> Caused by: java.lang.ClassCastException:
> org.apache.avro.mapred.AvroWrapper cannot be cast to
> org.apache.avro.mapred.AvroKey
> at org.apache.trevni.avro.mapreduce.AvroTrevniKeyRecordWriter.write(AvroTrevniKeyRecordWriter.java:34)
> at org.apache.crunch.io.CrunchOutputs.write(CrunchOutputs.java:129)
>
> I was hoping that the target would be able to take any
> PCollection<? extends AvroType> but it looks like I'd need to implement my
> own PType and force consumers to use that just to change the converter to
> produce AvroKey instead.
>
> Is implementing a custom PType the only way to inject an alternate
> converter?  That seems like a high cost on the implementation side, and it
> forces a restriction onto others in the pipeline who are generally happy
> with the standard AvroType and shouldn't be burdened with how the data
> might be stored later on in the processing.
>
> Thoughts?
>
> [1] - http://avro.apache.org/docs/current/trevni/spec.html
> [2] - http://avro.apache.org/docs/current/api/java/org/apache/trevni/avro/mapreduce/AvroTrevniKeyOutputFormat.html
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
