pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-161) Rework physical plan
Date Tue, 01 Apr 2008 23:13:24 GMT

    [ https://issues.apache.org/jira/browse/PIG-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584351#action_12584351

Alan Gates commented on PIG-161:

Comments on Shubham's pogenerate patch:


Before we can commit this, you'll need to add some unit tests for each of these operators
and fill out the auto-generated methods like name, supportsMultipleInputs, visit, etc.

Where possible, we should use native types rather than objects.  For example, on line 113
you declare restartIts to be a Boolean instead of boolean.  This requires an
unnecessary heap allocation and conversions between objects and native types on the part of
the JVM.

I don't understand the logic in getNext(Tuple).  I find in a function like this with a complicated
algorithm that it often helps to put an overview of the function and
what it does in a header comment, and then give detailed comments at each step of the algorithm.
 In particular I don't follow the code between lines
206-250, or why we need the RewindableBagIterator.

A couple of comments on what I think I understand in getNext(Tuple):

    1 I don't see logic to handle flattening of tuples.  So if if I have a script like:  b
= group a by ($0, $1); c = foreach b generate flatten(group), COUNT($1) then
    the resulting tuples coming out of the foreach should have 3 fields, since group was a
tuple of 2 fields.

    2 Lines 186-195, if I understand this correctly, you are sticking all non-tuple/non-bag
values in a bag with one tuple which has one field.  Why?  The old code did
    stuff like this but this is exactly what we want to get away from.  You should be able
to stick these results directly into the target tuple that you will eventually
    return, if there is no flattening to be done.  If there is flattening to be done the tuple
you stick the results into will serve as the prototype for other tuples
    you will generate.  But there's still no need to create that extra bag and tuple to put
the result in.
So I'm thinking the logic will look something like this:


if (previous cross product to return results from) return next cross product result;

Tuple target = new Tuple(size(inputs));
foreach (i: input) {
    target[i] = input[i].getNext();

if (no flattening) return target;
else {
    if (tuple flattening) remap target to expand flattened tuples;

    if (bag flattening) {
        remember cross product we're doing;
        return first tuple from cross product;


I think you're doing most of this, with the expection of the two points I made above.

What is createFile for?

Its very confusing that getRootOperators calls getLeaves on the contained plans.  I think
it's doing the right thing, but at first glance the names through me.

Arithmetic operators:

The code should not be swallowing error returns from its predecessors and passing nulls. 
It should instead pass errors along.  

What is the value of the ArithmeticOperator class?  How do arithmetic operators differ from
any other binary operator such that there should be a class to represent them?

We need to add a MOD operator for int and long too.

Doesn't this code make you wish you had a functional language so you didn't have to write
the same classes each time for each operator? :)

> Rework physical plan
> --------------------
>                 Key: PIG-161
>                 URL: https://issues.apache.org/jira/browse/PIG-161
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: Phy_AbsClass.patch, pogenerate.patch
> This bug tracks work to rework all of the physical operators as described in http://wiki.apache.org/pig/PigTypesFunctionalSpec

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message