pig-user mailing list archives

From Dmitriy Ryaboy <dvrya...@gmail.com>
Subject Re: Optimization
Date Sun, 06 Feb 2011 21:02:33 GMT
Robert,
It is not clear from your code snippets what the relationships are
between the various "var" relations. Could you provide more detail?

It sounds like you are asking about Pig's multi-query
optimization. You can read about it on these pages:
http://pig.apache.org/docs/r0.7.0/piglatin_ref1.html#Multi-Query+Execution
http://wiki.apache.org/pig/PigMultiQueryPerformanceSpecification
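As a rough illustration, here is a minimal sketch of the pattern multi-query execution is meant for: one LOAD whose alias feeds two independent branches, each ending in its own STORE. The input paths, field names and schemas below are hypothetical and exist only for the example. When a script like this is run as a batch, the multi-query documentation describes Pig planning the STOREs together so that a shared input can be read once, rather than once per branch. Running EXPLAIN on an alias is one way to check what plan Pig actually produces for your own script.

```pig
-- Hypothetical input paths and schemas, for illustration only.
var1  = LOAD 'input/events' USING PigStorage('\t')
        AS (user:chararray, item:chararray, amount:double);
items = LOAD 'input/items' USING PigStorage('\t')
        AS (item:chararray, category:chararray);

-- Branch 1: derived from var1; var1 itself is never modified.
local_var1  = FOREACH var1 GENERATE user, amount;
by_user     = GROUP local_var1 BY user;
user_totals = FOREACH by_user GENERATE group AS user, SUM(local_var1.amount) AS total;
STORE user_totals INTO 'output/user_totals';

-- Branch 2: also derived from var1, with no second LOAD of the same file.
local_var2 = FOREACH var1 GENERATE item, amount;
joined     = JOIN local_var2 BY item, items BY item;
STORE joined INTO 'output/item_amounts';
```

Giving each step its own alias (instead of reassigning local_var1 and local_var2 repeatedly) is just a readability choice here; the point of the sketch is that both branches start from the single var1 relation.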



On Sun, Feb 6, 2011 at 12:11 PM, Robert Waddell
<rwaddell88@googlemail.com> wrote:
> Hey Guys,
>
> I am trying to optimize my Pig jobs as much as possible and wanted to know a
> little about how Pig handles loading data.
>
> When I have:
>
> var1 = LOAD ....
> local_var1 = FOREACH
> local_var1 = JOIN ... [etc]
> ~~
> ~~
> ~~
> STORE local_var1 ...
> local_var2 = FOREACH local_var2
> local_var2 = JOIN ... [etc]
> ~~
> STORE local_var2
>
> am I gaining any performance improvement by not loading a lengthy file
> every time, and instead storing it in different aliases (local_var1 and
> local_var2) and manipulating those, preserving the original (var1)? Or am
> I better off having multiple LOADs and manipulating the original alias
> directly?
>
> Robert.
>
