hadoop-pig-dev mailing list archives

From Dmitriy Ryaboy <dvrya...@gmail.com>
Subject Re: Pig reading hive columnar rc tables
Date Mon, 30 Nov 2009 20:18:27 GMT
That's awesome. I've been itching to do that but never got around to it.
Gerrit, do you have any benchmarks on read speeds?

I don't know about putting this in piggybank, as it carries with it pretty
significant dependencies, increasing the size of the jar and making it
difficult for users who don't need it to build piggybank in the first place.
We might want to consider some other contrib for it -- maybe a "misc"
contrib that would have individual ant targets for these kinds of
compatibility submissions?


On Mon, Nov 30, 2009 at 3:09 PM, Olga Natkovich <olgan@yahoo-inc.com> wrote:

> Hi Gerrit,
> It would be great if you could contribute the code. The process is
> pretty simple:
> - Open a JIRA that describes what the loader does and that you would
> like to contribute it to the Piggybank.
> - Submit the patch that contains the loader. Make sure it has unit tests
> and javadoc.
> Once this is done, one of the committers will review and commit the patch.
> More details on how to contribute are in
> http://wiki.apache.org/pig/PiggyBank.
> Olga
> -----Original Message-----
> From: Gerrit van Vuuren [mailto:gvanvuuren@specificmedia.com]
> Sent: Friday, November 27, 2009 2:42 AM
> To: pig-dev@hadoop.apache.org
> Subject: Pig reading hive columnar rc tables
> Hi,
> I've coded a LoadFunc implementation that can read from Hive Columnar RC
> tables. This is needed for a project that I'm working on because all our
> data is stored using the Hive thrift serialized Columnar RC format. I
> have looked at the piggy bank but did not find any implementation that
> could do this. We've been running it on our cluster for the last week
> and have worked out most bugs.
> There are still some improvements I would like to make, such as setting
> the number of mappers based on date partitioning. It's been optimized to
> read only specific columns and can churn through a data set almost 8
> times faster with this improvement because not all column data is read.
> I would like to contribute the class to the piggybank. Can you guide me
> in what I need to do?
> I've used Hive-specific classes to implement this. Is it possible to add
> them to the piggy bank Ivy build for automatic download of the
> dependencies?
> Thanks,
>  Gerrit Jansen van Vuuren
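
For readers wondering why reading only specific columns helps so much, here
is a toy sketch of the idea. It is not the RCFile format, the Hive classes,
or the loader described above; the function names, the sample data, and the
in-memory "layout" are all hypothetical and exist only to show that a
columnar reader can skip bytes that a row-oriented reader has to scan.

```python
# Toy illustration only: this is NOT the RCFile format or any Hive/Pig API.
# All names and data below are hypothetical.
from typing import Dict, List, Sequence

Row = Dict[str, str]


def to_columnar(rows: Sequence[Row]) -> Dict[str, List[str]]:
    """Group values by column so each column is stored contiguously."""
    columns: Dict[str, List[str]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def bytes_read_row_layout(rows: Sequence[Row]) -> int:
    """A row-oriented reader scans every field of every row."""
    return sum(len(v.encode("utf-8")) for row in rows for v in row.values())


def bytes_read_columnar(columns: Dict[str, List[str]],
                        wanted: Sequence[str]) -> int:
    """A columnar reader touches only the columns the query projects."""
    return sum(len(v.encode("utf-8")) for name in wanted for v in columns[name])


rows = [
    {"user": "u1", "url": "http://example.com/a", "agent": "x" * 40},
    {"user": "u2", "url": "http://example.com/b", "agent": "y" * 40},
]
columns = to_columnar(rows)

full = bytes_read_row_layout(rows)                 # every field is read
projected = bytes_read_columnar(columns, ["user"])  # only "user" is read
print(f"row layout: {full} bytes, projected columnar read: {projected} bytes")
```

The size of the saving depends entirely on how wide the skipped columns are
relative to the projected ones, so the ratio in this sketch says nothing
about the speedup reported in the message above.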
