hadoop-hdfs-user mailing list archives

From Frank Grimes <frankgrime...@gmail.com>
Subject Combining AVRO files efficiently within HDFS
Date Fri, 06 Jan 2012 15:55:35 GMT
Hi All,

I was wondering if there is an easy way to combine multiple .avro files efficiently,
e.g. combining multiple hours of logs into a daily aggregate

Note that our Avro schema might evolve to have new (nullable) fields added but no fields will
be removed.
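
To make that constraint concrete, here is a minimal sketch in plain Python (not the Avro library) of what a combined file would need to look like: the merged schema is the union of all fields seen, and records written before a field existed get null for it. Plain dicts stand in for Avro records, and the field names ("ts", "msg", "host") are hypothetical.

```python
# Illustrative only: plain dicts and lists stand in for Avro records and schemas.
# The field names ("ts", "msg", "host") are hypothetical.

def merge_field_names(schemas):
    """Return the union of field names across schemas, in first-seen order."""
    merged = []
    for fields in schemas:
        for name in fields:
            if name not in merged:
                merged.append(name)
    return merged


def promote(record, merged_fields):
    """Rewrite a record against the merged schema, using None for fields it lacks."""
    return {name: record.get(name) for name in merged_fields}


def combine(files):
    """Combine (field_names, records) pairs, e.g. one pair per hourly file."""
    merged_fields = merge_field_names([fields for fields, _ in files])
    combined = [promote(r, merged_fields) for _, records in files for r in records]
    return merged_fields, combined


# An older hourly file without "host", and a newer one that has it.
hour_00 = (["ts", "msg"], [{"ts": 1, "msg": "a"}])
hour_01 = (["ts", "msg", "host"], [{"ts": 2, "msg": "b", "host": "n1"}])
daily_fields, daily_records = combine([hour_00, hour_01])
```

Because fields are only ever added and are nullable, the newest schema can always represent every older record, which is what makes a single daily file possible in principle.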

I'd like to avoid pulling the data down locally, combining it there, and then re-uploading it with "hadoop dfs -put".

Would https://issues.apache.org/jira/browse/HDFS-222 be able to handle that automatically?
FYI, the following seems to indicate that Avro files might be easily combinable: https://issues.apache.org/jira/browse/AVRO-127

Or is an M/R job the best way to go for this?


Frank Grimes