hadoop-pig-dev mailing list archives

From "Santhosh Srinivasan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-158) Rework logical plan
Date Tue, 15 Apr 2008 14:13:05 GMT

    [ https://issues.apache.org/jira/browse/PIG-158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589094#action_12589094 ]

Santhosh Srinivasan commented on PIG-158:
-----------------------------------------

Pi,

Thanks for the comments. Please see my responses inline, marked with [Santhosh].

1) In COGroup, why is mInputs an ArrayList<String>? Shouldn't it be ArrayList<LogicalOperator>?
How do you plan to get inputs out of strings?

[Santhosh] Yes, it should be ArrayList<LogicalOperator>. I realized this when I was
changing the parser code. I have made these changes but have not posted a patch yet, as the
parser code changes are still being tested.
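
A minimal sketch of what typed inputs could look like, assuming nothing about the actual patch. SketchOperator, SketchLoad and SketchCogroup are hypothetical stand-ins, not Pig's LogicalOperator or LOCogroup; the point is only that holding operator references lets a consumer reach its inputs directly, with no lookup from a string alias.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a logical operator; not the real Pig class.
abstract class SketchOperator {
    private final String alias;

    SketchOperator(String alias) { this.alias = alias; }

    String getAlias() { return alias; }
}

// Hypothetical leaf operator used only to have something to cogroup.
class SketchLoad extends SketchOperator {
    SketchLoad(String alias) { super(alias); }
}

// Cogroup that stores its inputs as operators rather than as alias strings.
class SketchCogroup extends SketchOperator {
    private final ArrayList<SketchOperator> mInputs;

    SketchCogroup(String alias, List<SketchOperator> inputs) {
        super(alias);
        // Copy so later changes to the caller's list do not affect the plan.
        this.mInputs = new ArrayList<>(inputs);
    }

    List<SketchOperator> getInputs() { return mInputs; }
}
```

With this shape, walking from a cogroup to its inputs is a plain getInputs() call, and the alias is still available from each input through getAlias().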

2) Why does LOSort have getInput() while LOFilter and LOSplit do not? All of them have one bag
input plus expression input(s).

[Santhosh] I have added getInput() to LOFilter as part of the parser changes (see previous
response). It looks like I missed LOSplit. I will verify that and add it.

3) I think the PigTypeDesign documentation on the wiki is out of date. Is LOProject a replacement
for FieldExpression?

[Santhosh] LOProject is for operations like A.($0,$1), A.name, etc. I am not sure about the
name FieldExpression. It could be that.

4) What is the right way to get a column name or a column index from LOProject (if a column
name is known or a column index is known)? At the moment LOProject maintains "List<String>
projection", which seems to contain column names. If I refer to columns by $0, $1, $2, ...,
what will be stored in this string list?

[Santhosh] I have changed LOProject to take a list of integers instead of a list of strings.
Columns should be referred to by position.
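
As an illustration of positional projection, here is a small sketch under stated assumptions. PositionalProject is a hypothetical class, not Pig's LOProject, and a tuple is modeled as a plain List<Object>. It shows how something like A.($0,$2) could be carried as the integer list [0, 2].

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; this is not Pig's LOProject.
class PositionalProject {
    // Zero-based column positions, e.g. [0, 2] for ($0, $2).
    private final List<Integer> projection;

    PositionalProject(List<Integer> projection) {
        this.projection = new ArrayList<>(projection);
    }

    List<Integer> getProjection() { return projection; }

    // Picks the projected columns out of one tuple, modeled as a list of fields.
    List<Object> apply(List<Object> tuple) {
        List<Object> out = new ArrayList<>();
        for (int col : projection) {
            out.add(tuple.get(col));
        }
        return out;
    }
}
```

A reference by name such as A.name would have to be resolved to a position against the input schema before an operator like this is built; that resolution step is not shown here.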

5) How should algebraic functions (which take a bag and output a dataatom) be handled in the
new type design? I haven't seen such operators yet.

[Santhosh] I haven't looked into that. Let me get back to you.

6) Should all the relational operators share the same RelationalOperator parent class? All
of them share the same characteristic: they take a bag of tuples as input and output
a bag of tuples.

[Santhosh] That's a good question. Currently, all the relational operators are logical operators.
With your proposal, there would also be an equivalent parent for expression operators. I would
like to hear what other folks think about this.
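
For discussion, the proposed hierarchy could be sketched roughly as follows. Every name here is hypothetical (SketchType, SketchLogicalOp, SketchRelationalOp and the two subclasses are not Pig classes), and the sketch assumes a single bag input per operator, which would not fit COGroup as written.

```java
// Hypothetical names throughout; none of these are Pig classes.
enum SketchType { BAG, TUPLE, INTEGER }

abstract class SketchLogicalOp {
    abstract SketchType getType();
}

// Proposed common parent: a bag of tuples in, a bag of tuples out.
abstract class SketchRelationalOp extends SketchLogicalOp {
    // null for leaf operators such as a load
    private final SketchRelationalOp input;

    SketchRelationalOp(SketchRelationalOp input) { this.input = input; }

    // Defined once here, so no subclass can be left without it.
    SketchRelationalOp getInput() { return input; }

    // Fixed in the parent: every relational operator produces a bag.
    @Override
    final SketchType getType() { return SketchType.BAG; }
}

class SketchLoadOp extends SketchRelationalOp {
    SketchLoadOp() { super(null); }
}

class SketchFilterOp extends SketchRelationalOp {
    SketchFilterOp(SketchRelationalOp input) { super(input); }
}
```

One possible benefit of this shape is that getInput() and the bag result type live in one place instead of being repeated, or forgotten, in each operator.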

7) Should all the relational operators always have getType() = DataType.BAG?

[Santhosh] That's true for most (all?) relational operators. I hope I have not missed any.
Let me double-check that statement.

8) What do setSchema() and getSchema() mean in relational operators? Do they refer to the schema
of the tuples in the output bag?

[Santhosh] Yes

9) How about setSchema(), getSchema() in expression operators?

[Santhosh] Most of the expression operators should return null. There are exceptions: user
defined functions can return tuples that have a schema, arithmetic operators on tuples will
result in schemas, etc.
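
The null-by-default behaviour described in this response might look like the sketch below. The class names are hypothetical and a schema is modeled as a plain list of field names, which is an assumption made for brevity and not Pig's Schema class.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; a schema is modeled as a plain list of field names.
abstract class SketchExpr {
    // Default: scalar-valued expressions carry no schema.
    List<String> getSchema() { return null; }
}

// A constant yields a single value, so it keeps the null default.
class SketchConstant extends SketchExpr { }

// A user-defined function that returns a tuple overrides the default.
class SketchTupleUdf extends SketchExpr {
    private final List<String> schema;

    SketchTupleUdf(String... fields) { this.schema = Arrays.asList(fields); }

    @Override
    List<String> getSchema() { return schema; }
}
```

Callers would need a null check before using the result of getSchema() on an arbitrary expression operator.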

10) (I believe you know this) Do we plan to have bags containing datatypes other than
tuples?

[Santhosh] I don't think so.

> Rework logical plan
> -------------------
>
>                 Key: PIG-158
>                 URL: https://issues.apache.org/jira/browse/PIG-158
>             Project: Pig
>          Issue Type: Sub-task
>          Components: impl
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: logical_operators.patch, logical_operators_rev_1.patch, logical_operators_rev_2.patch,
logical_operators_rev_3.patch, visitorWalker.patch
>
>
> Rework the logical plan in line with http://wiki.apache.org/pig/PigExecutionModel

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

