hadoop-pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gerrit van Vuuren" <gvanvuu...@specificmedia.com>
Subject Pig reading hive columnar rc tables
Date Fri, 27 Nov 2009 10:41:45 GMT


I've coded a LoadFunc implementation that can read from Hive Columnar RC
tables, this is needed for a project that I'm working on because all our
data is stored using the Hive thrift serialized Columnar RC format. I
have looked at the piggy bank but did not find any implementation that
could do this. We've been running it on our cluster for the last week
and have worked out most bugs.


There are still some improvements to be done but I would need  like
setting the amount of mappers based on date partitioning. Its been
optimized so as to read only specific columns and can churn through a
data set almost 8 times faster with this improvement because not all
column data is read.


I would like to contribute the class to the piggybank can you guide me
in what I need to do?

I've used hive specific classes to implement this, is it possible to add
this to the piggy bank build ivy for automatic download of the



 Gerrit Jansen van Vuuren

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message