hadoop-pig-dev mailing list archives

From "Gerrit Jansen van Vuuren (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
Date Mon, 18 Jan 2010 21:20:54 GMT

    [ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801959#action_12801959 ]

Gerrit Jansen van Vuuren commented on PIG-1117:
-----------------------------------------------

Thanks, 

This issue will affect the build if the current uploads on the Apache dist site change.
I basically use:

<get verbose="true" src="${apache.dist.site}/${hive.groupId}/${hive.artifactId}/${hive.artifactId}-${hive.version}/${hive.artifactId}-${hive.version}-hadoop-${hadoop.version}-bin.tar.gz"
dest="lib-hivedeps/${hive.artifactId}-${hive.version}-hadoop-${hadoop.version}-bin.tar.gz"/>

to download the Hive dependencies. This is not very pretty, I agree. I have just sent an email
to the hive-dev list to ask for permission to make a Maven upload request for the Hive jars.
This will let the dependencies be resolved with Ivy in the standard build and look much better.
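For reference, once the Hive jars are available from a Maven repository, the <get> task above
could be replaced by a normal Ivy dependency in the standard build. A minimal sketch, assuming
the jars end up under the org.apache.hadoop.hive group id with a hive-exec artifact (both names
are assumptions until the upload request is accepted):

<!-- Sketch only: the org/name values and conf mapping are assumptions
     pending the Maven upload request sent to hive-dev. -->
<dependency org="org.apache.hadoop.hive" name="hive-exec"
            rev="${hive.version}" conf="compile->master"/>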


> Pig reading hive columnar rc tables
> -----------------------------------
>
>                 Key: PIG-1117
>                 URL: https://issues.apache.org/jira/browse/PIG-1117
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.7.0
>            Reporter: Gerrit Jansen van Vuuren
>            Assignee: Gerrit Jansen van Vuuren
>             Fix For: 0.7.0
>
>         Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, PIG-1117.patch,
> PIG-117-v.0.6.0.patch, PIG-117-v.0.7.0.patch
>
>
> I've coded a LoadFunc implementation that can read from Hive Columnar RC tables. This is
> needed for a project that I'm working on because all our data is stored using the Hive
> thrift-serialized Columnar RC format. I have looked at the Piggybank but did not find any
> implementation that could do this. We've been running it on our cluster for the last week
> and have worked out most bugs.
>  
> There are still some improvements to be done, like setting the number of mappers based on
> date partitioning. It has been optimized to read only specific columns, and with this
> improvement it can churn through a data set almost 8 times faster because not all of the
> column data is read.
> I would like to contribute the class to the Piggybank; can you guide me on what I need
> to do?
> I've used Hive-specific classes to implement this; is it possible to add these to the
> Piggybank Ivy build for automatic download of the dependencies?
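For context, a loader of this kind would typically be invoked from a Pig script roughly as
sketched below. The class name, package, and schema-string argument are illustrative
assumptions based on the description above, not taken from the attached patches.

-- Illustrative sketch only: the loader class name and its schema-string
-- constructor argument are assumptions, not the patch's actual API.
REGISTER piggybank.jar;
raw  = LOAD '/user/hive/warehouse/mytable'
       USING org.apache.pig.piggybank.storage.HiveColumnarLoader('f1 string, f2 int, f3 float');
cols = FOREACH raw GENERATE f1, f2;  -- only the referenced columns need to be read
DUMP cols;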

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

