avro-user mailing list archives

From Ken Krugler <kkrugler_li...@transpac.com>
Subject Avro and Hive
Date Thu, 28 Oct 2010 18:43:11 GMT
Hi all,

I'd seen past emails from Scott and Doug about using Avro as the data  
format for Hive.

This was back in April/May, and I'm wondering about the current state of
the world.

Specifically, what's the recommended approach (& known issues) with  
using Avro files with Hive?

E.g. Scott mentioned that "Avro files should be better performing and  
more compact than sequence files." Has that been proven out?

He also discussed a minor issue with maps - "Their maps however can  
have any intrinsic type as a key (int, long, string, float, double)."
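
To make the map mismatch concrete: Avro map keys are always strings, so a Hive map keyed by, say, int has to be re-encoded before it fits an Avro schema. Below is a minimal sketch in plain Python of two possible workarounds. The helper names are hypothetical and are not part of Avro or Hive; this only illustrates the shape of the data, not how any Hive SerDe actually handles it.

```python
# Hypothetical helpers for illustration only; neither Avro nor Hive ships these.

def to_string_keyed(hive_map):
    """Option 1: stringify the keys so the value fits an Avro map.

    The original key type is lost and must be restored by the reader.
    """
    return {str(key): value for key, value in hive_map.items()}


def to_entry_array(hive_map):
    """Option 2: represent the map as an array of key/value records.

    This keeps the original key type at the cost of map-style lookup.
    """
    return [{"key": key, "value": value} for key, value in hive_map.items()]


# A Hive-style map with int keys (assumed example data).
hive_style = {1: "a", 2: "b"}

string_keyed = to_string_keyed(hive_style)
entries = to_entry_array(hive_style)
```

The first option keeps the data queryable as a map but pushes key parsing onto the reader; the second preserves key types and would be described in Avro as an array of records rather than a map.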

And a more serious issue with unions, though this wouldn't directly  
impact us as we wouldn't be using that feature.

In our situation, we're trying to get the best of both worlds by  
leveraging Hive for analytics, and Cascading for workflow, so having  
one store in HDFS for both would be a significant win.

Thanks for any input!

-- Ken

Ken Krugler
+1 530-210-6378
e l a s t i c   w e b   m i n i n g
