avro-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: HUG talk on PTD/Avro
Date Fri, 23 Apr 2010 20:31:43 GMT
Ken Krugler wrote:
> 3. It would be great to get feedback on both the Avro Cascading scheme 
> (http://github.com/bixolabs/cascading.avro) and the content we're 
> currently saving in the Avro file.

Overall it looks fine to me.

What do you think of https://issues.apache.org/jira/browse/AVRO-513? 
Would that make your life much easier?

It might be more efficient, instead of reading Avro generic data and 
converting it to your desired representation, to subclass 
GenericDatumReader and override #readString(), #readBytes(), #readMap(), 
and #readArray().  Similarly for DatumWriter.  But we'd then also need 
to permit one to configure AvroRecordReader to use a different 
DatumReader implementation.  We might, e.g., add a 
DataRepresentation interface:

interface DataRepresentation<T> {
   DatumReader<T> createDatumReader();
   DatumWriter<T> createDatumWriter();
}

Then we could replace AvroJob#setInputSpecific() and #setInputGeneric() 
with #setInputRepresentation(Class<DataRepresentation> rep, Schema s). 
You could subclass GenericDatumReader & Writer and implement a 
DataRepresentation that returns these.
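
To make the shape of this concrete, here is a rough sketch of how such a 
representation might be implemented and then instantiated from a configured 
class.  It is only an illustration under stated assumptions: the 
DatumReader and DatumWriter types below are simplified one-method stand-ins 
(not Avro's real interfaces, which work against a Decoder/Encoder and a 
schema), and ListRepresentation and Representations.newInstance() are 
hypothetical names that exist in neither Avro nor cascading.avro.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins so the sketch compiles without Avro on the classpath.
// ASSUMPTION: the real Avro interfaces have different method signatures.
interface DatumReader<T> { T read(String encoded); }
interface DatumWriter<T> { String write(T datum); }

// The interface proposed in this message.
interface DataRepresentation<T> {
    DatumReader<T> createDatumReader();
    DatumWriter<T> createDatumWriter();
}

// Hypothetical representation: the reader builds the application's own type
// (here a List<String>) directly, with no intermediate generic object.
final class ListRepresentation implements DataRepresentation<List<String>> {
    @Override
    public DatumReader<List<String>> createDatumReader() {
        return encoded -> new ArrayList<>(Arrays.asList(encoded.split(",")));
    }

    @Override
    public DatumWriter<List<String>> createDatumWriter() {
        return datum -> String.join(",", datum);
    }
}

// Hypothetical helper showing how a job could turn a configured class into a
// representation instance via its no-argument constructor.
final class Representations {
    static <T> DataRepresentation<T> newInstance(
            Class<? extends DataRepresentation<T>> cls) {
        try {
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + cls, e);
        }
    }
}
```

A record reader could then call Representations.newInstance(rep) once and 
ask the result for its reader, which is roughly what passing a class to 
#setInputRepresentation() would imply.  Note that the sketch takes a 
Class<? extends DataRepresentation<T>> so that a concrete subclass can be 
passed; that bound is my choice here, not part of the signature above.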

Worth it?

Doug
