hadoop-pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Pradeep Kamath (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data
Date Thu, 10 Sep 2009 21:32:57 GMT

    [ https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753827#action_12753827

Pradeep Kamath commented on PIG-953:

I see the first part as a coding style preference - I have done both styles in code myself
- don't think it is a major issue with readability with current implementation

Can you explain the NPE? If either object is null, the code would return with false unless
both are null. If checkEquality is false, the caller should know that only null equality has
been checked thus far and if true was returned then the two objects are null and hence equal.
My main intent was to have this helper function be used from other Class's equals() implementation
so that this mundane check for null need not be repeated in every equals implementation. Maybe
I am not understanding your use case better - an example might help.

> Enable merge join in pig to work with loaders and store functions which can internally
index sorted data 
> ---------------------------------------------------------------------------------------------------------
>                 Key: PIG-953
>                 URL: https://issues.apache.org/jira/browse/PIG-953
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.3.0
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>         Attachments: PIG-953.patch
> Currently merge join implementation in pig includes construction of an index on sorted
data and use of that index to seek into the "right input" to efficiently perform the join
operation. Some loaders (notably the zebra loader) internally implement an index on sorted
data and can perform this seek efficiently using their index. So the use of the index needs
to be abstracted in such a way that when the loader supports indexing, pig uses it (indirectly
through the loader) and does not construct an index. 

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message