pig-dev mailing list archives

From "Brian Johnson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-4458) Support UDFs in a FOREACH Before a Merge Join
Date Thu, 12 Mar 2015 18:37:38 GMT

    [ https://issues.apache.org/jira/browse/PIG-4458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359141#comment-14359141 ]

Brian Johnson commented on PIG-4458:
------------------------------------

I think it should be removed altogether. A FOREACH GENERATE that changes the position of the
join key breaks the map-side merge, yet the validation doesn't reject it. Why strictly enforce
one and not the other? isAcceptableSortOp also has an incomplete check, and that check is
permissive rather than restrictive like the UDF check. I think it makes sense to 

// TODO: really, we should check that the sort is on the join keys, in the same order!
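The TODO quoted above asks for a sort check that actually compares against the join keys. A minimal sketch of such a check follows, assuming sort and join columns are identified by plain integer positions. This is a hypothetical simplification: it does not use Pig's LOSort or its sort-column plans, and it ignores sort direction. It accepts a sort whose leading columns are the join keys in the same order.

```java
import java.util.List;

class SortKeyCheck {
    // Hypothetical helper (not Pig API): true if the sort columns begin with
    // the join key columns, in the same order. Columns are integer positions.
    static boolean sortCoversJoinKeys(List<Integer> sortCols, List<Integer> joinKeys) {
        if (sortCols.size() < joinKeys.size()) {
            return false; // cannot be sorted on every join key
        }
        // Data sorted on (k1, k2, x) is also sorted on (k1, k2),
        // so a prefix match is enough.
        return sortCols.subList(0, joinKeys.size()).equals(joinKeys);
    }

    public static void main(String[] args) {
        System.out.println(sortCoversJoinKeys(List.of(0, 1), List.of(0)));    // true
        System.out.println(sortCoversJoinKeys(List.of(1, 0), List.of(0, 1))); // false
    }
}
```

A prefix match is used rather than exact equality because an extra trailing sort column does not disturb the ordering on the join keys.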

I think the main check and the isAcceptableForEachOp check both make sense, but isAcceptableSortOp
and containsUDFs are either pointless or go too far.
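The position argument at the start of the comment can be made concrete with a small check of the kind that is currently missing. This is only a sketch. It assumes a hypothetical encoding of a FOREACH ... GENERATE as an int array where entry i is the input column copied to output column i, or -1 if output column i is computed. Neither the encoding nor the method name comes from Pig.

```java
class KeyPositionCheck {
    // Hypothetical encoding (not Pig API): projection[i] is the input column
    // copied to output column i, or -1 if output column i is computed.
    static boolean joinKeysStayInPlace(int[] projection, int[] joinKeyPositions) {
        for (int k : joinKeyPositions) {
            // The key must still be a plain copy of input column k at position k.
            if (k >= projection.length || projection[k] != k) {
                return false; // key was dropped, moved, or replaced by an expression
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] keys = {0};
        // GENERATE $0, $2: key column 0 stays at position 0
        System.out.println(joinKeysStayInPlace(new int[] {0, 2}, keys)); // true
        // GENERATE $1, $0: key column 0 moves to position 1
        System.out.println(joinKeysStayInPlace(new int[] {1, 0}, keys)); // false
    }
}
```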

> Support UDFs in a FOREACH Before a Merge Join
> ---------------------------------------------
>
>                 Key: PIG-4458
>                 URL: https://issues.apache.org/jira/browse/PIG-4458
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: William Watson
>
> Right now, the MapSideMergeValidator outright rejects any foreach that has a UDF in it:
> {code}
> private boolean isAcceptableForEachOp(Operator lo) throws LogicalToPhysicalTranslatorException {
>         if (lo instanceof LOForEach) {
>             OperatorPlan innerPlan = ((LOForEach) lo).getInnerPlan();
>             validateMapSideMerge(innerPlan.getSinks(), innerPlan);
>             return !containsUDFs((LOForEach) lo);
>         } else {
>             return false;
>         }
>     }
> {code}
> There is a TODO for this later on in that same class (inside containsUDFs):
> {code}
> // TODO (dvryaboy): in the future we could relax this rule by tracing what fields
> // are being passed into the UDF, and only refusing if the UDF is working on the
> // join key. Transforms of other fields should be ok.
> {code}
> We should implement the TODO and relax this requirement, or just remove it altogether.
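The relaxed rule described in the containsUDFs TODO might look roughly like the following. This is a sketch over a hypothetical model (GenExpr, acceptable), not Pig's logical expression plans. Each GENERATE expression records the input fields it reads and whether a UDF is applied to them. The check refuses only when a UDF reads a join key field.

```java
import java.util.List;
import java.util.Set;

class UdfKeyCheck {
    // Hypothetical model of one GENERATE expression (not Pig API): the input
    // fields it reads and whether a UDF is applied to them.
    static final class GenExpr {
        final Set<String> inputFields;
        final boolean hasUdf;

        GenExpr(Set<String> inputFields, boolean hasUdf) {
            this.inputFields = inputFields;
            this.hasUdf = hasUdf;
        }
    }

    // Relaxed rule from the TODO: refuse only when a UDF works on a join key.
    static boolean acceptable(List<GenExpr> generate, Set<String> joinKeys) {
        for (GenExpr e : generate) {
            if (!e.hasUdf) {
                continue; // plain projections are not restricted by this rule
            }
            for (String f : e.inputFields) {
                if (joinKeys.contains(f)) {
                    return false; // UDF reads the join key
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> keys = Set.of("id");
        List<GenExpr> ok = List.of(
            new GenExpr(Set.of("id"), false),   // id passed through unchanged
            new GenExpr(Set.of("name"), true)); // e.g. UPPER(name)
        List<GenExpr> bad = List.of(
            new GenExpr(Set.of("id"), true));   // some UDF applied to id
        System.out.println(acceptable(ok, keys));  // true
        System.out.println(acceptable(bad, keys)); // false
    }
}
```

Taken literally, this rule is still conservative. A UDF that reads the join key but writes a separate, non-key column would also be refused.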



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
