Subject: Re: Pig reading hive columnar rc tables
From: Dmitriy Ryaboy <dvryaboy@gmail.com>
To: pig-dev@hadoop.apache.org
Date: Mon, 30 Nov 2009 18:39:23 -0500

I retract the suggestion :). How would we do testing/building for it
in piggybank? Not include it in the compile and test targets, and set
up separate compile-rcstore and test-rcstore targets?

-D
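A minimal sketch of what such optional targets could look like in the
piggybank build.xml; the target names follow the question above, while
the source paths, property names, and the hive.classpath reference are
illustrative assumptions:

    <!-- Optional targets, deliberately not wired into the default
         compile/test targets, so the hive jars are only needed when
         rcstore support is explicitly built. Paths and properties
         here are assumptions, not the actual piggybank layout. -->
    <target name="compile-rcstore" depends="compile">
      <javac srcdir="${src.dir}" includes="**/storage/hive/**"
             destdir="${build.classes}" classpathref="hive.classpath"/>
    </target>

    <target name="test-rcstore" depends="compile-rcstore">
      <junit haltonfailure="true" fork="true">
        <classpath>
          <path refid="hive.classpath"/>
          <pathelement location="${build.classes}"/>
        </classpath>
        <formatter type="plain"/>
        <batchtest todir="${build.test.logs}">
          <fileset dir="${test.src.dir}" includes="**/hive/Test*.java"/>
        </batchtest>
      </junit>
    </target>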
On Mon, Nov 30, 2009 at 6:31 PM, Olga Natkovich wrote:
> +1 on what Alan is saying. I think it would be overkill to have
> another contrib for this.
>
> Olga
>
> -----Original Message-----
> From: Alan Gates [mailto:gates@yahoo-inc.com]
> Sent: Monday, November 30, 2009 2:42 PM
> To: pig-dev@hadoop.apache.org
> Subject: Re: Pig reading hive columnar rc tables
>
> On Nov 30, 2009, at 12:18 PM, Dmitriy Ryaboy wrote:
>
> > That's awesome, I've been itching to do that but never got around
> > to it. Gerrit, do you have any benchmarks on read speeds?
> >
> > I don't know about putting this in piggybank, as it carries pretty
> > significant dependencies with it, increasing the size of the jar
> > and making it difficult for users who don't need it to build
> > piggybank in the first place. We might want to consider some other
> > contrib for it -- maybe a "misc" contrib that would have individual
> > ant targets for these kinds of compatibility submissions?
>
> Does it have to increase the size of the piggybank jar? Instead of
> including hive in our piggybank jar, which I agree would be bad, can
> we just say that if you want to use this function you need to provide
> the appropriate hive jar yourself? That way we could use ivy to pull
> the jars and build piggybank. [An ivy.xml sketch follows at the end
> of this message.]
>
> I'm not really wild about creating a new section of contrib just for
> functions that have heavier-weight requirements.
>
> Alan.
>
> > -D
> >
> > On Mon, Nov 30, 2009 at 3:09 PM, Olga Natkovich wrote:
> >
> >> Hi Gerrit,
> >>
> >> It would be great if you could contribute the code. The process is
> >> pretty simple:
> >>
> >> - Open a JIRA that describes what the loader does and says that
> >>   you would like to contribute it to the piggybank.
> >> - Submit a patch that contains the loader. Make sure it has unit
> >>   tests and javadoc.
> >>
> >> Once this is done, one of the committers will review and commit
> >> the patch.
> >>
> >> More details on how to contribute are at
> >> http://wiki.apache.org/pig/PiggyBank.
> >>
> >> Olga
> >>
> >> -----Original Message-----
> >> From: Gerrit van Vuuren [mailto:gvanvuuren@specificmedia.com]
> >> Sent: Friday, November 27, 2009 2:42 AM
> >> To: pig-dev@hadoop.apache.org
> >> Subject: Pig reading hive columnar rc tables
> >>
> >> Hi,
> >>
> >> I've coded a LoadFunc implementation that can read from Hive
> >> columnar RC tables. This is needed for a project I'm working on,
> >> because all our data is stored using the Hive thrift-serialized
> >> columnar RC format. I looked at the piggybank but did not find any
> >> implementation that could do this. We've been running it on our
> >> cluster for the last week and have worked out most bugs. [A Pig
> >> Latin usage sketch follows at the end of this message.]
> >>
> >> There are still some improvements I would like to make, such as
> >> setting the number of mappers based on date partitioning. It has
> >> been optimized to read only specific columns, and because not all
> >> column data is read, it can churn through a data set almost 8
> >> times faster.
> >>
> >> I would like to contribute the class to the piggybank; can you
> >> guide me through what I need to do?
> >>
> >> I've used Hive-specific classes to implement this. Is it possible
> >> to add this to the piggybank ivy build for automatic download of
> >> the dependencies?
> >>
> >> Thanks,
> >>
> >> Gerrit Jansen van Vuuren
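On Alan's and Gerrit's point about letting ivy pull the hive jars, a
minimal sketch of what the piggybank ivy.xml could declare; the
"rcstore" configuration name, the artifact coordinates, and the
revision are assumptions, not settled values:

    <!-- ivy.xml fragments: hive is resolved only for the optional
         rcstore configuration, never for the default build -->
    <configurations>
      <conf name="rcstore"
            description="optional Hive columnar RC support"/>
    </configurations>
    <dependencies>
      <!-- org/name/rev are illustrative; use whatever coordinates
           the hive release is actually published under -->
      <dependency org="org.apache.hadoop.hive" name="hive-exec"
                  rev="0.4.1" conf="rcstore->default"/>
    </dependencies>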
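For context, loading a Hive RC table from Pig Latin with Gerrit's
class might look like the following. The class name HiveColumnarLoader
and its schema-string constructor argument are hypothetical, since the
thread names neither, and the jar paths are placeholders; the user
registers the hive jar themselves, per Alan's suggestion:

    register /path/to/hive-exec.jar;
    register /path/to/piggybank.jar;

    -- hypothetical loader name and constructor argument
    a = LOAD '/user/hive/warehouse/mytable'
        USING org.apache.pig.piggybank.storage.HiveColumnarLoader(
            'name string, age int');

    -- per Gerrit's column-pruning description, projecting a subset
    -- of columns is where the speedup would come from
    b = FOREACH a GENERATE name;
    dump b;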