Mailing-List: contact pig-dev-help@hadoop.apache.org; run by ezmlm
Reply-To: pig-dev@hadoop.apache.org
Message-ID: <195899962.1260520398107.JavaMail.jira@brutus>
Date: Fri, 11 Dec 2009 08:33:18 +0000 (UTC)
From: "Gerrit Jansen van Vuuren (JIRA)"
To: pig-dev@hadoop.apache.org
Subject: [jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
In-Reply-To: <445235944.1259684900642.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789170#action_12789170 ]

Gerrit Jansen van Vuuren
commented on PIG-1117:
-----------------------------------------------

Sorry about the @Author tag; it was generated by Eclipse automatically. I'll take that out and resubmit the patch.

I'll split the patch into two versions: one for the 0.6 release, and one for the new trunk, which contains the new method signatures for LoadFunc.

> Pig reading hive columnar rc tables
> -----------------------------------
>
>                 Key: PIG-1117
>                 URL: https://issues.apache.org/jira/browse/PIG-1117
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Gerrit Jansen van Vuuren
>         Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, PIG-1117.patch
>
>
> I've coded a LoadFunc implementation that can read from Hive Columnar RC tables. This is needed for a project that I'm working on, because all our data is stored using the Hive Thrift-serialized Columnar RC format. I have looked at the piggybank but did not find any implementation that could do this. We've been running it on our cluster for the last week and have worked out most bugs.
>
> There are still some improvements I would like to make, such as setting the number of mappers based on date partitioning. It's been optimized to read only specific columns, and can churn through a data set almost 8 times faster with this improvement because not all column data is read.
> I would like to contribute the class to the piggybank. Can you guide me on what I need to do?
> I've used Hive-specific classes to implement this. Is it possible to add this to the piggybank Ivy build so that the dependencies are downloaded automatically?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.