hadoop-pig-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data
Date Mon, 14 Sep 2009 02:34:57 GMT

    [ https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754811#action_12754811

Ashutosh Chauhan commented on PIG-953:

And a couple more:
bq. Findbugs complains about passing internal members as-is in getters, since the caller can
then modify these internal members - hence the copy.

    public List<Boolean> getAscColumns() {
        return Utils.getCopy(ascColumns);
    }

If we instead use the following, we achieve the same thing: FindBugs will not complain,
and there is no need for our own copy method.
    public List<Boolean> getAscColumns() {
        return new ArrayList<Boolean>(ascColumns);
    }
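For illustration, a minimal self-contained sketch of the suggested getter. The class `SortInfoSketch` and its constructor are hypothetical, not Pig's actual class; only the `new ArrayList<Boolean>(ascColumns)` copy mirrors the suggestion. The copy constructor returns an independent list, so a caller that mutates the returned list leaves the internal field untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical holder class; only the getter pattern mirrors the suggestion above.
class SortInfoSketch {
    private final List<Boolean> ascColumns;

    SortInfoSketch(List<Boolean> ascColumns) {
        // Copy on the way in too, so the caller's list is not aliased.
        this.ascColumns = new ArrayList<Boolean>(ascColumns);
    }

    // Defensive copy: each caller gets its own list and cannot modify ours.
    public List<Boolean> getAscColumns() {
        return new ArrayList<Boolean>(ascColumns);
    }

    public static void main(String[] args) {
        SortInfoSketch info = new SortInfoSketch(Arrays.asList(true, false));
        List<Boolean> copy = info.getAscColumns();
        copy.set(0, false); // changes only the copy
        copy.add(true);
        System.out.println(info.getAscColumns()); // prints [true, false]
    }
}
```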

9. In POMergeJoin.java
        // we should never get here!
        return new Result(POStatus.STATUS_ERR, null);

could be changed to
        // we should never get here!
        throw new ExecException(errMsg,2176);
because if we ever do get there, returning a null result would otherwise cause an NPE later on.
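A minimal sketch of the fail-fast pattern, assuming a simplified operator whose every legitimate path returns inside a `switch`. `MergeJoinSketch` and `ExecExceptionSketch` are hypothetical stand-ins, not Pig's `POMergeJoin` and `ExecException`; the error code 2176 is simply reused from the snippet above. Reaching the "impossible" end of the method raises an exception right away instead of handing a null payload to a caller.

```java
class MergeJoinSketch {
    // Simplified stand-in for an operator's getNext(); states 0-2 are the only valid ones.
    static String getNext(int state) throws ExecExceptionSketch {
        switch (state) {
            case 0:
                return "left";
            case 1:
                return "right";
            case 2:
                return "end-of-processing";
            default:
                break;
        }
        // We should never get here! Throw instead of returning an error status
        // with a null payload that a caller might later dereference (NPE).
        throw new ExecExceptionSketch("Unexpected merge join state: " + state, 2176);
    }

    public static void main(String[] args) {
        try {
            System.out.println(getNext(0)); // prints "left"
            getNext(7);                     // an "impossible" state
        } catch (ExecExceptionSketch e) {
            System.out.println("error " + e.getErrorCode() + ": " + e.getMessage());
        }
    }
}

// Hypothetical checked exception carrying a numeric error code.
class ExecExceptionSketch extends Exception {
    private final int errorCode;

    ExecExceptionSketch(String msg, int errorCode) {
        super(msg);
        this.errorCode = errorCode;
    }

    int getErrorCode() {
        return errorCode;
    }
}
```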

> Enable merge join in pig to work with loaders and store functions which can internally index sorted data
> ---------------------------------------------------------------------------------------------------------
>                 Key: PIG-953
>                 URL: https://issues.apache.org/jira/browse/PIG-953
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.3.0
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>         Attachments: PIG-953.patch
> Currently, the merge join implementation in Pig constructs an index on the sorted data and
> uses that index to seek into the "right input" to perform the join efficiently. Some loaders
> (notably the zebra loader) internally implement an index on sorted data and can perform this
> seek efficiently using their own index. So the use of the index needs to be abstracted such
> that, when the loader supports indexing, Pig uses the loader's index (indirectly, through the
> loader) and does not construct its own.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
