crunch-user mailing list archives

From Micah Whitacre <mkwhita...@gmail.com>
Subject Injecting alternate PType Converter implementations
Date Wed, 24 Apr 2013 22:23:51 GMT
As an alternative to the standard AvroInput/OutputFormat, I've been playing
around with how to support alternative Avro file formats like Trevni[1], a
columnar format that helps when we only want to retrieve a subset of the
fields of an Avro object.

Picking one of the implementations
(AvroTrevniKeyInputFormat/AvroTrevniKeyOutputFormat)[2], I implemented the
various Source/Target/SourceTarget implementations.  When I started trying
to test it out (to see if I did any of it right), I hit the issue that the
AvroKeyConverter only produces AvroWrapper objects and the output format
requires AvroKey.  As a result I get a ClassCastException from the
CrunchOutputs.write(...) call:

Caused by: java.lang.ClassCastException: org.apache.avro.mapred.AvroWrapper cannot be cast to org.apache.avro.mapred.AvroKey
    at org.apache.trevni.avro.mapreduce.AvroTrevniKeyRecordWriter.write(AvroTrevniKeyRecordWriter.java:34)
    at org.apache.crunch.io.CrunchOutputs.write(CrunchOutputs.java:129)
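
For anyone who wants to see the failure in isolation, here is a minimal
sketch that reproduces the same kind of cast error. It assumes only that
AvroKey is a subclass of AvroWrapper; the Wrapper, Key, and write names
below are hypothetical stand-ins, not the real Avro or Trevni classes.

```java
// Stand-in types that mirror the AvroWrapper/AvroKey relationship.
// These are hypothetical and are NOT the real Avro classes.
class WrapperCastDemo {
    static class Wrapper<T> {
        private final T datum;
        Wrapper(T datum) { this.datum = datum; }
        T datum() { return datum; }
    }

    // The subclass the writer actually requires.
    static class Key<T> extends Wrapper<T> {
        Key(T datum) { super(datum); }
    }

    // Hypothetical writer: like the record writer in the trace, it
    // downcasts whatever it is handed to the Key subclass.
    static String write(Object key) {
        Key<?> k = (Key<?>) key; // fails for a plain Wrapper
        return String.valueOf(k.datum());
    }

    public static void main(String[] args) {
        // A Key passes through the writer without trouble.
        System.out.println(write(new Key<>("record")));
        try {
            // A converter that emits the base type triggers the cast failure.
            write(new Wrapper<>("record"));
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: base type handed to writer");
        }
    }
}
```

The data is the same in both cases; only the wrapper class differs, which
is why the problem looks like it belongs to the converter and not to the
PType.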

I was hoping that the target would be able to take any PCollection<?
extends AvroType> but it looks like I'd need to implement my own PType and
force consumers to use that just to change the converter to produce AvroKey
instead.
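
To make concrete what I mean by changing only the converter, here is a
rough sketch using stand-in types. OutputConverter, wrapperConverter,
keyConverter, and writerAccepts are all hypothetical names made up for
illustration; none of this is the actual Crunch or Avro API.

```java
// Sketch only: stand-in types, not the real Crunch or Avro classes.
class ConverterSwapDemo {
    static class Wrapper<T> {
        final T datum;
        Wrapper(T datum) { this.datum = datum; }
    }

    static class Key<T> extends Wrapper<T> {
        Key(T datum) { super(datum); }
    }

    // Hypothetical converter contract: wrap a value for the output format.
    interface OutputConverter<T> {
        Wrapper<T> convert(T value);
    }

    // Mirrors the current behaviour: always produces the base wrapper.
    static <T> OutputConverter<T> wrapperConverter() {
        return value -> new Wrapper<>(value);
    }

    // The alternate converter a Key-based output format would need.
    static <T> OutputConverter<T> keyConverter() {
        return value -> new Key<>(value);
    }

    // Hypothetical check standing in for a writer that requires Key.
    static <T> boolean writerAccepts(OutputConverter<T> converter, T value) {
        return converter.convert(value) instanceof Key;
    }

    public static void main(String[] args) {
        System.out.println(writerAccepts(wrapperConverter(), "record"));
        System.out.println(writerAccepts(keyConverter(), "record"));
    }
}
```

In this sketch the element type T never changes; only the converter
chosen for the output does, which is the kind of separation I was hoping
the target could provide.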

Is implementing a custom PType the only way to inject an alternate
converter?  That seems like a high cost on the implementation side, and it
forces a restriction onto others in the pipeline who are generally happy
with the standard AvroType and shouldn't be burdened with how the data
might be stored later on in the processing.

Thoughts?

[1] - http://avro.apache.org/docs/current/trevni/spec.html
[2] - http://avro.apache.org/docs/current/api/java/org/apache/trevni/avro/mapreduce/AvroTrevniKeyOutputFormat.html
