Date: Mon, 11 Jun 2012 15:17:42 -0700
Subject: Re: Persisting Pig Scripts
From: Jonathan Coveney
To: dev@pig.apache.org, billgraham@gmail.com
Cc: Prashant Kommireddi

We could also just serialize the script to more than one value and paste it
together.

2012/6/11 Bill Graham

> That's expected. It's a cap on the size of how much of the script can be
> stored. I'm not sure what the exact size limit is though, but if it's
> causing issues I'm sure we could make it a configurable value.
>
>
> On Mon, Jun 11, 2012 at 2:33 PM, Prashant Kommireddi wrote:
>
> > Bill,
> >
> > Would you know if that is expected or a bug?
> >
> >
> > On Wed, Jun 6, 2012 at 5:56 PM, Bill Graham wrote:
> >
> >> One thing to be aware of when accessing the pig.script option is that
> >> AFAIK there's a limit to how large the script can be, after which the
> >> rest would be truncated.
> >>
> >>
> >> On Wed, Jun 6, 2012 at 5:44 PM, Prashant Kommireddi
> >> <prash1784@gmail.com> wrote:
> >>
> >> > I completely agree that's an option. But IMHO being able to do that
> >> > upfront would be a nice feature, adding cron is just an additional
> >> > process we could avoid if possible.
> >> >
> >> > On Wed, Jun 6, 2012 at 5:39 PM, Dmitriy Ryaboy wrote:
> >> >
> >> > > You can write a nightly cron that runs the JobHistoryLoader job and
> >> > > stores parsed scripts to hdfs...
> >> > >
> >> > > D
> >> > >
> >> > > On Wed, Jun 6, 2012 at 5:16 PM, Prashant Kommireddi
> >> > > <prash1784@gmail.com> wrote:
> >> > > > I think that would be more of a post-process vs having Pig write
> >> > > > the same to a HDFS location. That would avoid having to parse it
> >> > > > from job.xml.
> >> > > >
> >> > > > On Wed, Jun 6, 2012 at 4:19 PM, Daniel Dai wrote:
> >> > > >
> >> > > >> One existing solution is "pig.script" entry inside job.xml, it is
> >> > > >> the serialized Pig script. JobHistoryLoader can load job.xml files
> >> > > >> and grab those entries. Does that solve your problem?
> >> > > >>
> >> > > >> Daniel
> >> > > >>
> >> > > >> On Wed, Jun 6, 2012 at 3:52 PM, Prashant Kommireddi
> >> > > >> <prash1784@gmail.com> wrote:
> >> > > >>
> >> > > >> > Hi All,
> >> > > >> >
> >> > > >> > What do you guys think about adding a feature to be able to
> >> > > >> > persist the script (file or cache in case of grunt) on HDFS or
> >> > > >> > locally based on an admin setting (pig.properties). This will
> >> > > >> > help infrastructure/ops teams analyze nature of Pig scripts and
> >> > > >> > be able to make certain decisions based on it (optimizing data
> >> > > >> > storage based on access patterns etc). This is actually
> >> > > >> > something we want to do but the challenge is there is no
> >> > > >> > central place where we can track user scripts.
> >> > > >> >
> >> > > >> > It could be a config param "pig.persist.script=/pig/". The
> >> > > >> > script could be stored with a configurable name ->
> >> > > >> > "${mapred.job.name}+${user.name}+timestamp"
> >> > > >> > either on HDFS or local based on the configuration setting.
> >> > > >> >
> >> > > >> > Thanks,
> >> > > >> > Prashant
> >> > > >> >
> >> > > >>
> >> > > >
> >> > >
> >> >
> >>
> >>
> >> --
> >> *Note that I'm no longer using my Yahoo! email address. Please email me
> >> at billgraham@gmail.com going forward.*
> >>
> >
>
>
> --
> *Note that I'm no longer using my Yahoo! email address. Please email me at
> billgraham@gmail.com going forward.*
>
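
A rough sketch of the "more than one value" idea from the reply at the top of this message, in case it helps the discussion. This is not how Pig stores pig.script today. The key names (pig.script.0, pig.script.1, ..., pig.script.chunks), the helper names, and the per-value size cap are all made up for illustration, and a plain dict stands in for the job configuration.

```python
# Hypothetical key names, chosen only for this sketch; they are not Pig settings.
CHUNK_KEY_PREFIX = "pig.script."
CHUNK_COUNT_KEY = "pig.script.chunks"


def store_script(conf, script, max_value_len):
    """Split `script` into pieces of at most `max_value_len` characters
    and store each piece under its own numbered key in `conf`."""
    if max_value_len <= 0:
        raise ValueError("max_value_len must be positive")
    chunks = [script[i:i + max_value_len]
              for i in range(0, len(script), max_value_len)]
    for index, chunk in enumerate(chunks):
        conf[CHUNK_KEY_PREFIX + str(index)] = chunk
    # Record how many pieces were written so the reader knows when to stop.
    conf[CHUNK_COUNT_KEY] = str(len(chunks))


def load_script(conf):
    """Paste the numbered pieces back together in order."""
    count = int(conf[CHUNK_COUNT_KEY])
    return "".join(conf[CHUNK_KEY_PREFIX + str(i)] for i in range(count))


if __name__ == "__main__":
    conf = {}  # stand-in for the job configuration
    script = "A = LOAD 'input';\nB = FILTER A BY $0 > 1;\nSTORE B INTO 'out';\n"
    store_script(conf, script, max_value_len=16)
    print(load_script(conf) == script)
```

Writing an explicit chunk count, rather than probing for keys until one is missing, lets a reader tell a complete script apart from one whose later pieces were lost.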