hadoop-pig-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-845) PERFORMANCE: Merge Join
Date Thu, 13 Aug 2009 22:29:14 GMT

     [ https://issues.apache.org/jira/browse/PIG-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-845:

    Attachment: merge-join.patch

if(rightMROpr == null || rightMROpr.equals(curMROp))
 throw new MRCompilerException("Successor of right input not ...

Do you also need to check rightMROpr == null here?
>> I removed the null check because a null there would mean that two preceding MROperators
exist but one of them is null. This is highly unlikely, and MRCompiler would probably have
thrown an exception while compiling those preceding physical operators. But I added the check
back in any case.
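
For readers following the exchange, here is a minimal, standalone sketch of the guard being discussed. The Operator and CompilerException classes are hypothetical stand-ins, not Pig's MROperator or MRCompilerException, and the exception is unchecked here only to keep the sketch short. The error text is also illustrative, since the real message is truncated in the quote above.

```java
// Hypothetical stand-ins for illustration only; these are not Pig classes.
final class RightInputCheckSketch {

    static final class Operator {
        final String name;
        Operator(String name) { this.name = name; }
    }

    // Unchecked purely for brevity in this sketch.
    static final class CompilerException extends RuntimeException {
        CompilerException(String message) { super(message); }
    }

    // Same shape as the quoted condition: reject a missing right-side
    // operator, and reject one that equals the current operator.
    // The null test runs first, so equals() is never called on null.
    static void checkRightInput(Operator right, Operator current) {
        if (right == null || right.equals(current)) {
            throw new CompilerException("unexpected right-input operator: "
                    + (right == null ? "null" : right.name));
        }
    }
}
```

Keeping the null test costs nothing at runtime and turns what would otherwise be a NullPointerException into a descriptive compiler error.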

If the index is empty, it could mean one of the following two things:
1) Data for right input only has null for join key(s)
2) right input is empty
Are there any other reasons why the index would be empty?
In both of these cases the join output would be empty, but currently the code throws an exception.
Should this change?
A unit test where right side input is empty would be a good one to add.
>> The exception thrown at that point is correct, because if you get a null object after
reading the index, it's a bug. But there was a problem dealing with an empty right file
nonetheless. I fixed that and added a test case for it as well.

Additionally, I fixed the findbugs warning.
The release audit warning is caused by the gold file added for testing. An Apache header can't
be added to it, so the warning can be ignored.

> -----------------------
>                 Key: PIG-845
>                 URL: https://issues.apache.org/jira/browse/PIG-845
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Ashutosh Chauhan
>         Attachments: merge-join.patch, merge-join.patch
> This join would work if the data for both tables is sorted on the join key.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
