pig-dev mailing list archives

From Jonathan Coveney <jcove...@gmail.com>
Subject Re: Persisting Pig Scripts
Date Mon, 11 Jun 2012 22:17:42 GMT
We could also just serialize the script to more than one value and paste it
together.
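
A minimal sketch of that idea, assuming the job configuration can be treated as a plain string-to-string map. The key names `pig.script.parts` and `pig.script.<n>` are hypothetical and chosen only for illustration; they are not existing Pig settings, and the chunk size would have to match whatever cap actually applies.

```python
def split_script(script, chunk_size, base_key="pig.script"):
    """Split a script into numbered entries plus a part-count entry.

    The "<base_key>.parts" and "<base_key>.<n>" key names are hypothetical.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = [script[i:i + chunk_size] for i in range(0, len(script), chunk_size)]
    conf = {f"{base_key}.parts": str(len(chunks))}
    for index, chunk in enumerate(chunks):
        conf[f"{base_key}.{index}"] = chunk
    return conf


def join_script(conf, base_key="pig.script"):
    """Paste the numbered entries back together in order."""
    count = int(conf[f"{base_key}.parts"])
    return "".join(conf[f"{base_key}.{index}"] for index in range(count))
```

Storing an explicit part count lets a reader detect a missing chunk (a `KeyError` here) instead of silently getting a truncated script.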

2012/6/11 Bill Graham <billgraham@gmail.com>

> That's expected. It's a cap on the size of how much of the script can be
> stored. I'm not sure what the exact size limit is though, but if it's
> causing issues I'm sure we could make it a configurable value.
>
>
> On Mon, Jun 11, 2012 at 2:33 PM, Prashant Kommireddi <prash1784@gmail.com> wrote:
>
> > Bill,
> >
> > Would you know if that is expected or a bug?
> >
> >
> >
> >
> > On Wed, Jun 6, 2012 at 5:56 PM, Bill Graham <billgraham@gmail.com> wrote:
> >
> >> One thing to be aware of when accessing the pig.script option is that
> >> AFAIK there's a limit to how large the script can be, after which the
> >> rest would be truncated.
> >>
> >>
> >> On Wed, Jun 6, 2012 at 5:44 PM, Prashant Kommireddi <prash1784@gmail.com> wrote:
> >>
> >> > I completely agree that's an option. But IMHO being able to do that
> >> > upfront would be a nice feature; adding cron is just an additional
> >> > process we could avoid if possible.
> >> >
> >> > On Wed, Jun 6, 2012 at 5:39 PM, Dmitriy Ryaboy <dvryaboy@gmail.com> wrote:
> >> >
> >> > > You can write a nightly cron that runs the JobHistoryLoader job and
> >> > > stores parsed scripts to hdfs...
> >> > >
> >> > > D
> >> > >
> >> > > On Wed, Jun 6, 2012 at 5:16 PM, Prashant Kommireddi <prash1784@gmail.com> wrote:
> >> > > > I think that would be more of a post-process vs having Pig write
> >> > > > the same to an HDFS location. That would avoid having to parse it
> >> > > > from job.xml.
> >> > > >
> >> > > > On Wed, Jun 6, 2012 at 4:19 PM, Daniel Dai <daijy@hortonworks.com> wrote:
> >> > > >
> >> > > >> One existing solution is the "pig.script" entry inside job.xml; it
> >> > > >> is the serialized Pig script. JobHistoryLoader can load job.xml
> >> > > >> files and grab those entries. Does that solve your problem?
> >> > > >>
> >> > > >> Daniel
> >> > > >>
> >> > > >> On Wed, Jun 6, 2012 at 3:52 PM, Prashant Kommireddi <prash1784@gmail.com> wrote:
> >> > > >>
> >> > > >> > Hi All,
> >> > > >> >
> >> > > >> > What do you guys think about adding a feature to persist the script
> >> > > >> > (file, or cache in the case of grunt) on HDFS or locally, based on an
> >> > > >> > admin setting (pig.properties)? This will help infrastructure/ops
> >> > > >> > teams analyze the nature of Pig scripts and make certain decisions
> >> > > >> > based on it (optimizing data storage based on access patterns, etc.).
> >> > > >> > This is actually something we want to do, but the challenge is that
> >> > > >> > there is no central place where we can track user scripts.
> >> > > >> >
> >> > > >> > It could be a config param "pig.persist.script=/pig/". The script
> >> > > >> > could be stored with a configurable name ->
> >> > > >> > "${mapred.job.name}+${user.name}+timestamp", either on HDFS or
> >> > > >> > locally, based on the configuration setting.
> >> > > >> >
> >> > > >> > Thanks,
> >> > > >> > Prashant
> >> > > >> >
> >> > > >>
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> *Note that I'm no longer using my Yahoo! email address. Please email me
> >> at billgraham@gmail.com going forward.*
> >>
> >
> >
>
>
> --
> *Note that I'm no longer using my Yahoo! email address. Please email me at
> billgraham@gmail.com going forward.*
>

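For reference, the naming scheme from the original proposal quoted above (`${mapred.job.name}+${user.name}+timestamp` under a base directory such as `/pig/`) could be sketched as follows. The function name, the millisecond timestamp, and the replacement of `/` inside name components are assumptions made for illustration; the thread does not specify any of them.

```python
import time


def persisted_script_path(base_dir, job_name, user_name, timestamp=None):
    """Build "<base_dir>/<job name>+<user name>+<timestamp>".

    Assumptions (not from the thread): the timestamp is in epoch
    milliseconds, and "/" in a component is replaced with "_" so the
    result stays a single file name under base_dir.
    """
    if timestamp is None:
        timestamp = int(time.time() * 1000)
    safe_job = job_name.replace("/", "_")
    safe_user = user_name.replace("/", "_")
    return f"{base_dir.rstrip('/')}/{safe_job}+{safe_user}+{timestamp}"
```

With `base_dir="/pig/"`, a job named `daily_rollup`, user `prashant`, and timestamp `1339021920000`, this yields `/pig/daily_rollup+prashant+1339021920000`.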