hadoop-pig-dev mailing list archives

From "Shravan Matthur Narayanamurthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-597) Pig does not handle correctly the case where "*" is passed to UDF
Date Tue, 13 Jan 2009 11:18:04 GMT

    [ https://issues.apache.org/jira/browse/PIG-597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663306#action_12663306 ]

Shravan Matthur Narayanamurthy commented on PIG-597:

The exception is thrown from ARITY, which tries to cast the first field of its input
tuple to a tuple. Because the argument is a star, the tuple is not wrapped inside another
tuple, so the first field is a DataByteArray and the cast fails.
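
To illustrate, here is a minimal sketch of the failure. It is not Pig's code: java.util.List
stands in for Tuple, String stands in for DataByteArray, and the class and method names are
hypothetical. It only assumes, as described above, that ARITY casts the first field of its
input to a tuple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StarArgSketch {
    // Stand-in for ARITY as described above: it assumes its input is a
    // wrapper tuple whose first field is the tuple to measure.
    @SuppressWarnings("unchecked")
    static int arityOfFirstField(List<Object> input) {
        // Throws ClassCastException if field 0 is not itself a tuple.
        List<Object> inner = (List<Object>) input.get(0);
        return inner.size();
    }

    // Argument built when the row is wrapped: ((f0, f1, f2))
    static List<Object> wrapped(List<Object> row) {
        List<Object> args = new ArrayList<>();
        args.add(row);
        return args;
    }

    // Argument built for a star with the implicit flatten: (f0, f1, f2)
    static List<Object> flattened(List<Object> row) {
        List<Object> args = new ArrayList<>();
        args.addAll(row);
        return args;
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.asList("a", "b", "c");
        System.out.println(arityOfFirstField(wrapped(row))); // prints 3
        try {
            arityOfFirstField(flattened(row));
        } catch (ClassCastException e) {
            System.out.println("cast failed: first field is not a tuple");
        }
    }
}
```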

This unwrapping was done in order to model the trunk behavior, in which there is an implicit
flatten in front of a *. If we want to retain this behavior, we need to change ARITY and the
other functions that were written with the assumption that POUserFunc wraps every argument
inside a tuple. Most of these functions would then be useless with a UDF that outputs a
tuple. For example, say we have a function that returns a tuple and we want to find
its arity. ARITY(TupleRetUDF(*)) would always return one, since POUserFunc would wrap the
output of TupleRetUDF in another tuple and ARITY would have been changed to return the size
of its input tuple rather than the size of the first field.
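
The drawback can be sketched as follows. Again this is a simplified model rather than Pig's
code: java.util.List stands in for Tuple, and arityOfInput, tupleRetUdf and wrap are
hypothetical names for the changed ARITY, a tuple-returning UDF, and the wrapping that
POUserFunc applies to a non-star argument.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ArityTradeoffSketch {
    // Hypothetical changed ARITY: returns the size of its input tuple.
    static int arityOfInput(List<Object> input) {
        return input.size();
    }

    // Hypothetical UDF that returns a tuple (here, a copy of its input).
    static List<Object> tupleRetUdf(List<Object> input) {
        return new ArrayList<>(input);
    }

    // Models wrapping a non-star argument into a one-field tuple.
    static List<Object> wrap(Object value) {
        List<Object> args = new ArrayList<>();
        args.add(value);
        return args;
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.asList("a", "b", "c");
        // A star argument arrives flattened, so the row itself is the input.
        List<Object> star = new ArrayList<>(row);

        // ARITY(*): works with the changed ARITY.
        System.out.println(arityOfInput(star)); // prints 3

        // ARITY(TupleRetUDF(*)): the UDF output is wrapped, so the result is always 1.
        System.out.println(arityOfInput(wrap(tupleRetUdf(star)))); // prints 1
    }
}
```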

However, if we comment out this code, we need to modify FindQuantiles to account for the
fact that everything will be wrapped inside a tuple and that the behavior no longer depends
on the use of a star. I think this is better, and Olga seems to agree as per her previous
comment. Any other thoughts? Retain the trunk behavior or change it?
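
For comparison, a sketch of the alternative in which every argument is wrapped regardless
of the star. As before, java.util.List stands in for Tuple and all names are hypothetical;
firstTwo is just an example of a tuple-returning UDF.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AlwaysWrapSketch {
    // Every argument, star or not, becomes one field of the UDF's input tuple.
    static List<Object> wrap(Object value) {
        List<Object> args = new ArrayList<>();
        args.add(value);
        return args;
    }

    // ARITY as originally written: size of the first field of the input.
    @SuppressWarnings("unchecked")
    static int arity(List<Object> input) {
        return ((List<Object>) input.get(0)).size();
    }

    // Hypothetical tuple-returning UDF: keeps the first two fields of the wrapped row.
    @SuppressWarnings("unchecked")
    static List<Object> firstTwo(List<Object> input) {
        List<Object> row = (List<Object>) input.get(0);
        return new ArrayList<>(row.subList(0, 2));
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.asList("a", "b", "c");
        System.out.println(arity(wrap(row)));                 // ARITY(*): prints 3
        System.out.println(arity(wrap(firstTwo(wrap(row))))); // ARITY(UDF(*)): prints 2
    }
}
```

With unconditional wrapping, the same ARITY handles both the star and the tuple-returning
UDF; the cost is that callers such as FindQuantiles must expect the extra level of wrapping.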

> Pig does not handle correctly the case where "*" is passed to UDF
> -----------------------------------------------------------------
>                 Key: PIG-597
>                 URL: https://issues.apache.org/jira/browse/PIG-597
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Olga Natkovich
>            Assignee: Shravan Matthur Narayanamurthy
> Script:
> ======
> A = LOAD 'foo' USING PigStorage('\t');
> B = FILTER A BY ARITY(*) < 5;
> Error:
> =====
> 2009-01-05 21:46:56,355 [main] ERROR
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc
> - Caught error from UDF
> org.apache.pig.builtin.ARITY[org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.Tuple
> [org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.Tuple]
> Problem:
> =======
> Santhosh tracked this to the following code in POUserFunc.java:
> if (op instanceof POProject && op.getResultType() == DataType.TUPLE) {
>     POProject projOp = (POProject) op;
>     if (projOp.isStar()) {
>         Tuple trslt = (Tuple) temp.result;
>         Tuple rslt = (Tuple) res.result;
>         for (int i = 0; i < trslt.size(); i++)
>             rslt.append(trslt.get(i));
>         continue;
>     }
> }
> It seems to be unwrapping the tuple before passing it to the function. There are no comments,
> so we are not sure why it is there; we will need to run tests to see whether removing it
> would solve this issue without creating others.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
