hadoop-pig-dev mailing list archives

From Yongqiang He <heyongqiang...@gmail.com>
Subject Re: Pig reading hive columnar rc tables
Date Mon, 30 Nov 2009 20:02:20 GMT
Hi Gerrit Jansen van Vuuren,

You can first open a JIRA issue for Pig, and people can discuss it there. Create an
account if you do not have one, then create an issue at
https://issues.apache.org/jira/browse/pig.

Thanks,
Yongqiang
On 11/27/09 2:41 AM, "Gerrit van Vuuren" <gvanvuuren@specificmedia.com>
wrote:

> Hi,
> 
> I've coded a LoadFunc implementation that can read from Hive Columnar RC
> tables. This is needed for a project that I'm working on, because all our
> data is stored using the Hive thrift serialized Columnar RC format. I
> have looked at the piggybank but did not find any implementation that
> could do this. We've been running it on our cluster for the last week
> and have worked out most of the bugs.
> 
> There are still some improvements to be made, such as setting the number
> of mappers based on date partitioning. It has been optimized to read only
> specific columns, and can churn through a data set almost 8 times faster
> with this improvement because not all column data is read.
> 
> I would like to contribute the class to the piggybank. Can you guide me
> on what I need to do?
> 
> I've used Hive-specific classes to implement this. Is it possible to add
> them to the piggybank build's Ivy configuration so that the dependencies
> are downloaded automatically?
> 
> Thanks,
> 
>  Gerrit Jansen van Vuuren
> 


