hadoop-pig-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-981) Merge join should restrict join key expressions to simple projects
Date Mon, 28 Sep 2009 21:59:16 GMT

[ https://issues.apache.org/jira/browse/PIG-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760406#action_12760406 ]

Ashutosh Chauhan commented on PIG-981:

The default merge join implementation can handle order-preserving join expressions, that is,
when merge join builds the index itself and doesn't rely on the underlying storage for the index.
When merge join doesn't build the index itself, this can't be guaranteed, but that is no reason
to limit every possible use of merge join. Instead, we should check whether merge join is
building its own index: if it is, allow order-preserving expressions; only if it is *not*
should we restrict join key expressions to projections.
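A rough Java sketch of the check proposed above, assuming a boolean that says whether merge join builds its own index. The types `KeyExpr`, `Project`, `OrderPreservingFunc` and the method `keysAllowed` are hypothetical illustrations, not Pig's actual classes or API.

```java
import java.util.List;

/**
 * Hypothetical sketch of the proposed validation; these types are illustrative
 * only and are not Pig's real classes.
 */
class MergeJoinKeyCheck {

    /** Hypothetical model of a join key expression. */
    interface KeyExpr {}

    /** A plain projection of a raw input column, e.g. $0. */
    record Project(int column) implements KeyExpr {}

    /** An expression over another key expression, assumed (not verified) to preserve sort order. */
    record OrderPreservingFunc(String name, KeyExpr arg) implements KeyExpr {}

    /**
     * @param mergeJoinBuildsIndex true when merge join builds its own index (the default
     *        implementation), false when it relies on an index kept by the loader/storage
     * @return true if the join keys are acceptable for this merge join
     */
    static boolean keysAllowed(List<KeyExpr> keys, boolean mergeJoinBuildsIndex) {
        if (mergeJoinBuildsIndex) {
            // Own index: order-preserving expressions are allowed; a broken sort
            // order would still be caught by the run-time sort-order check.
            return true;
        }
        // Storage-provided index: it covers raw columns only, so every key must be a projection.
        return keys.stream().allMatch(k -> k instanceof Project);
    }

    public static void main(String[] args) {
        List<KeyExpr> projected = List.of(new Project(0));
        List<KeyExpr> computed = List.of(new OrderPreservingFunc("plusOne", new Project(0)));
        System.out.println(keysAllowed(computed, true));   // true
        System.out.println(keysAllowed(computed, false));  // false
        System.out.println(keysAllowed(projected, false)); // true
    }
}
```

The restriction applies only in the storage-index case, so the default path keeps accepting order-preserving expressions as before.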

> Merge join should restrict join key expressions to simple projects
> ------------------------------------------------------------------
>                 Key: PIG-981
>                 URL: https://issues.apache.org/jira/browse/PIG-981
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
> Currently merge join allows join key expressions to be arbitrary expressions, on the
> assumption that the expressions keep the sort order. Since only ascending sort order is
> currently supported, the code checks the sort order at run time and catches the case where
> it is broken because the join key expression is not order preserving. However, there is
> a reason we should restrict the join keys to projections of columns only:
> PIG-953 will enable Pig to perform merge join with loaders and store functions
> that can internally index sorted data. These store functions can only create an index (and
> hence perform lookups on the index) on raw data columns, not on expressions over the columns.
> Hopefully this does not downgrade the usability of merge join much, since the expressions
> can always be applied post-join on the join columns, and since the expressions are order
> preserving, they do not affect the outcome of the join.
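To make the issue description concrete, here is a simplified Java sketch of a sort-merge equi-join with an ascending sort-order check, plus the "apply the expression post-join" equivalence. It is an illustration under stated assumptions, not Pig's implementation: each row is modeled as a single int key column, the sort-order check runs over each input before merging (Pig checks while streaming), and `MergeJoinSketch`, `mergeJoin`, and `checkAscending` are hypothetical names. The example uses a strictly increasing `f`, so distinct keys stay distinct.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

class MergeJoinSketch {

    /** A joined pair of rows; each row is a single int key column (hypothetical simplification). */
    record Pair(int left, int right) {}

    /** Fails if the input is not sorted ascending by key(x), mimicking the run-time check. */
    static void checkAscending(int[] rows, IntUnaryOperator key, String side) {
        for (int k = 1; k < rows.length; k++) {
            if (key.applyAsInt(rows[k - 1]) > key.applyAsInt(rows[k])) {
                throw new IllegalStateException(side + " input is not sorted ascending on the join key");
            }
        }
    }

    /** Sort-merge equi-join of two inputs on key(x); both must be sorted ascending by that key. */
    static List<Pair> mergeJoin(int[] left, int[] right, IntUnaryOperator key) {
        checkAscending(left, key, "left");
        checkAscending(right, key, "right");
        List<Pair> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.length && j < right.length) {
            int lk = key.applyAsInt(left[i]);
            int rk = key.applyAsInt(right[j]);
            if (lk < rk) {
                i++;
            } else if (lk > rk) {
                j++;
            } else {
                // Find the run of equal keys on the right, then pair it with each left row of that key.
                int jEnd = j;
                while (jEnd < right.length && key.applyAsInt(right[jEnd]) == lk) jEnd++;
                while (i < left.length && key.applyAsInt(left[i]) == lk) {
                    for (int r = j; r < jEnd; r++) out.add(new Pair(left[i], right[r]));
                    i++;
                }
                j = jEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] left = {1, 2, 2, 5};
        int[] right = {2, 3, 5, 5};
        // Strictly increasing for the small values used here, hence order preserving.
        IntUnaryOperator f = x -> 2 * x + 1;

        // Join on the expression f($0) versus join on the raw column (apply f afterwards).
        List<Pair> onExpr = mergeJoin(left, right, f);
        List<Pair> onColumn = mergeJoin(left, right, x -> x);
        System.out.println(onExpr.equals(onColumn)); // true

        // A key expression that is not order preserving (negation) trips the check.
        try {
            mergeJoin(left, right, x -> -x);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // left input is not sorted ascending on the join key
        }
    }
}
```

Both joins match the same row pairs, so computing `f` on the join output gives the same result as joining on `f($0)` directly. That equivalence is why restricting keys to projections costs little.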

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
