pig-user mailing list archives

From Ashutosh Chauhan <hashut...@apache.org>
Subject Re: Does the pig optimizer keep track of relations that are already sorted when doing a JOIN?
Date Sun, 21 Aug 2011 16:59:05 GMT
@Andrew,
You can take a look at the conditions for merge-join here:
http://pig.apache.org/docs/r0.8.1/piglatin_ref1.html#Merge+Joins
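For context on why sortedness matters: a merge join walks two inputs that are already sorted on the join key in a single side-by-side pass, with no re-sort and no hash table. Below is a minimal in-memory Python sketch of that idea. The `merge_join` function and the (key, payload) row layout are hypothetical and for illustration only. This is not Pig's implementation, which works on sorted data stored on disk and has its own conditions (see the link above). If either input is not sorted on the key, this sketch silently misses matches, which is why the precondition matters.

```python
from typing import Any, Iterator, List, Tuple

# Hypothetical row layout for this sketch: (join_key, payload)
Row = Tuple[Any, Any]


def merge_join(left: List[Row], right: List[Row]) -> Iterator[Tuple[Any, Any, Any]]:
    """Inner-join two lists that are BOTH already sorted by join key.

    Each input is scanned once; correctness depends entirely on the
    sortedness precondition (unsorted input silently drops matches).
    """
    i, j = 0, 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1  # left key is behind, advance left
        elif lkey > rkey:
            j += 1  # right key is behind, advance right
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lkey:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rkey:
                j_end += 1
            # Emit the cross product of the two equal-key runs.
            for _, lval in left[i:i_end]:
                for _, rval in right[j:j_end]:
                    yield (lkey, lval, rval)
            i, j = i_end, j_end


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (3, "y"), (4, "z"), (4, "w")]
print(list(merge_join(left, right)))
# → [(2, 'b', 'x'), (2, 'c', 'x'), (4, 'd', 'z'), (4, 'd', 'w')]
```

Duplicate keys are handled by collecting the run of equal keys on each side and emitting their cross product, which matches what an ordinary inner join would return.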

@Kevin,
If you want to improve merge-join, the way to go is
https://issues.apache.org/jira/browse/PIG-959

Ashutosh

On Sun, Aug 21, 2011 at 04:27, Andrew Clegg
<andrew.clegg+mahout@gmail.com> wrote:

> I'd never thought about this before, but some of my scripts could
> probably be made much quicker by taking advantage of this. After which
> operations are relations guaranteed to be sorted? Distinct, group by,
> order by, a previous merge join I guess? Any others?
>
> On 20 August 2011 07:12, Ashutosh Chauhan <hashutosh@apache.org> wrote:
> > Hey Kevin,
> >
> > No, Pig currently doesn't auto-detect that data was sorted in
> > previous steps of the script, so you need to tell it with USING 'merge'.
> >
> > Hope it helps,
> > Ashutosh
> >
> > On Fri, Aug 19, 2011 at 22:51, Kevin Burton <burton@spinn3r.com> wrote:
> >
> >> I was reading about USING 'merge' with JOIN when relations are already
> >> sorted.
> >>
> >> I actually was just looking through some code and realized that one of
> >> my JOINs was on two relations that were *already* sorted due to a
> >> DISTINCT and GROUP operation.
> >>
> >> I just added USING 'merge' and the initial results look the same.
> >>
> >> I haven't benchmarked it though.
> >>
> >> Does/would the existing optimizer be able to detect this and just use
> >> merge without manual intervention?
> >>
> >> --
> >>
> >> Founder/CEO Spinn3r.com
> >>
> >> Location: *San Francisco, CA*
> >> Skype: *burtonator*
> >>
> >> Skype-in: *(415) 871-0687*
> >>
> >
>
>
>
> --
>
> http://tinyurl.com/andrew-clegg-linkedin | http://twitter.com/andrew_clegg
>
