pig-dev mailing list archives

From "Aniket Mokashi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1434) Allow casting relations to scalars
Date Mon, 16 Aug 2010 19:20:18 GMT

    [ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899042#action_12899042
] 

Aniket Mokashi commented on PIG-1434:
-------------------------------------

Comments on the finalized syntax:

With the above changes, Pig now supports:
{code}
Y = foreach X generate $1/(long) C.count, $2-(long) C.max;
{code}
1. Casts are *optional*. The datatype of the scalar is taken from the schema of C, and the
casts are added implicitly from that schema (so, typically, count is a long and max is a
double). If C has an undeclared (null) schema, the scalar defaults to *chararray*.

2. Projections are mandatory. For example:
{code}
Y = foreach X generate C; -- *error*: C is not projected
{code}
Instead, use:
{code}
Y = foreach X generate C.$0; 
{code}

3. Whether C is actually a scalar is not checked until runtime. A relation with more than
one row is therefore accepted by the parser and fails only when the UDF executes, with
ExecException("Scalar has more than one row in the output").
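
The three points above can be combined into one script. This is a minimal sketch rather than
anything taken from the patch: the field names and types in the load schema are assumptions
made for illustration, and A stands in for X.
{code}
-- Assumed schema; the field names and types are hypothetical.
A = load 'data' as (x:long, y:double, z:chararray);
B = group A all;
-- C has exactly one row, so it can be used as a scalar.
C = foreach B generate COUNT(A) as count, MAX(A.y) as max;

-- Point 1: casts are optional; the types come from the schema of C.
Y1 = foreach A generate x / C.count, y - C.max;
-- The same expressions with explicit casts.
Y2 = foreach A generate x / (long) C.count, y - (double) C.max;

-- Point 2: a field of C must be projected, by name or by position.
Y3 = foreach A generate x / C.$0;

-- Point 3: D has one row per row of A. The statement below is accepted,
-- but execution fails if 'data' holds more than one row.
D = foreach A generate x;
Y4 = foreach A generate y / D.x;
{code}
Y1 and Y2 should produce the same result under the declared schema. Y4 is there only to show
where the single-row check takes effect.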

> Allow casting relations to scalars
> ----------------------------------
>
>                 Key: PIG-1434
>                 URL: https://issues.apache.org/jira/browse/PIG-1434
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Aniket Mokashi
>             Fix For: 0.8.0
>
>         Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, ScalarImplFinale.patch, ScalarImplFinale1.patch, ScalarImplFinaleRebase.patch
>
>
> This jira is to implement a simplified version of the functionality described in https://issues.apache.org/jira/browse/PIG-801.
> The proposal is to allow casting relations to scalar types in foreach.
> Example:
> A = load 'data' as (x, y, z);
> B = group A all;
> C = foreach B generate COUNT(A);
> .....
> X = ....
> Y = foreach X generate $1/(long) C;
> Couple of additional comments:
> (1) You can only cast relations including a single value or an error will be reported
> (2) Name resolution is needed since relation X might have field named C in which case that field takes precedence.
> (3) Y will look for C closest to it.
> Implementation thoughts:
> The idea is to store C into a file and then convert it into scalar via a UDF. I believe we already have a UDF that Ben Reed contributed for this purpose. Most of the work would be to update the logical plan to
> (1) Store C
> (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

