avro-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Questions re integrating Avro into Cascading process
Date Wed, 21 Apr 2010 22:22:50 GMT
Ken Krugler wrote:
> One open issue - it would be great to be able to set metadata in the 
> headers of the resulting Avro files. But it wasn't obvious how to do 
> that, given our (intentionally) arms-length approach via the use of the 
> Avro mapred code.
> 
> One idea would be to have job conf values using keys prefixed with 
> avro.metadata.xxx, and the Avro mapred support could automagically use 
> that when creating the file. But this would break our goal of using 
> unmodified Avro source, so I'm curious whether support for setting the 
> file metadata would also be useful for the standard (Hadoop) use of Avro 
> for an output format, and if so, whether there was a better approach.

Embedding the metadata in the configuration seems like a good approach.
Please file a Jira issue for this and attach a patch.

AvroOutputFormat can copy properties named avro.mapred.output.metadata.*
into the output file's metadata. We'll have to enumerate all properties
in the job and test for this prefix, since Configuration is a flat
key/value map, but the alternative of encoding the metadata map in a
single configuration value seems no more attractive.
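
A rough sketch of that enumeration, not the eventual patch: the class and
method names below are made up for illustration, and a plain
Iterable<Map.Entry<String, String>> stands in for the job configuration
(Hadoop's Configuration can be iterated the same way). It collects every
property under the prefix and strips the prefix from the key.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Avro or Hadoop.
final class AvroMetadataFromConf {
    // Prefix proposed in this thread; the trailing dot is included so that
    // "avro.mapred.output.metadata.owner" yields the metadata key "owner".
    static final String PREFIX = "avro.mapred.output.metadata.";

    // Walk every configuration entry and keep those whose key starts with
    // the prefix, keyed by whatever follows the prefix.
    static Map<String, String> extract(Iterable<Map.Entry<String, String>> conf) {
        Map<String, String> meta = new TreeMap<>();
        for (Map.Entry<String, String> entry : conf) {
            String key = entry.getKey();
            // Skip unrelated keys and a bare prefix with no name after it.
            if (key.startsWith(PREFIX) && key.length() > PREFIX.length()) {
                meta.put(key.substring(PREFIX.length()), entry.getValue());
            }
        }
        return meta;
    }
}
```

In the output format, each extracted pair would then presumably be handed to
DataFileWriter.setMeta(key, value) before the file is created.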

Note that https://issues.apache.org/jira/browse/HADOOP-6420 added
support for storing maps in a configuration, but the extracted map cannot
be enumerated, so it could not be copied into the DataFileWriter's
metadata. Also, this feature is perhaps slated for removal as a part of
https://issues.apache.org/jira/browse/HADOOP-6698, but its code might
prove useful as a starting point.

Doug
