hadoop-pig-dev mailing list archives

From "Pi Song (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-158) Rework logical plan
Date Wed, 30 Apr 2008 14:43:57 GMT

    [ https://issues.apache.org/jira/browse/PIG-158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593345#action_12593345 ]

Pi Song commented on PIG-158:
-----------------------------

This patch is a monster!!!
Questions and comments on the first 30-40% of the parser:

1) In parseCogroup,
{code}
+ int arity = gis.get(0).specList.size();
+
  for (int i = 0; i < n ; i++){
 
  CogroupInput gi = gis.get(i);
  los.add(gi.op);
- specs.add(gi.spec);
+ ArrayList<ExpressionOperator> groupByOps = gi.specList;
+ int numGrpByOps = groupByOps.size();
+ log.info("Number of group by operators = " + numGrpByOps);
+
+ if(arity != numGrpByOps) {
+ throw new ParseException("The arity of the group by columns do not match.");
+ }
{code}
So it seems all the cogroup inputs must have the same number of grouped columns, right?
I think this somewhat reduces the flexibility of the language. I'm not sure what Alan really wants here.
Another question: if we group by more than one column, will the "group" column of the output tuples be a tuple?
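A minimal sketch of the constraint in question, under simplified assumptions. Group-by expressions are modeled as strings instead of ExpressionOperator. IllegalArgumentException stands in for the patch's ParseException. The names CogroupArity, checkArity and groupKey are hypothetical. groupKey shows one possible answer to the tuple question (a bare value for one column, a tuple for several). This is an assumption, not what the patch does.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the arity check in parseCogroup:
// every cogroup input must supply the same number of group-by expressions.
class CogroupArity {

    // Returns the common arity, or throws if any input disagrees with the first.
    static int checkArity(List<List<String>> groupByPerInput) {
        int arity = groupByPerInput.get(0).size();
        for (int i = 0; i < groupByPerInput.size(); i++) {
            int n = groupByPerInput.get(i).size();
            if (n != arity) {
                throw new IllegalArgumentException(
                    "Input " + i + " has " + n + " group-by columns; expected " + arity);
            }
        }
        return arity;
    }

    // Assumed shape of the "group" field: the bare value for a single
    // group-by column, a tuple (modeled as a List) for several.
    static Object groupKey(List<Object> keyValues) {
        return keyValues.size() == 1 ? keyValues.get(0) : new ArrayList<>(keyValues);
    }

    public static void main(String[] args) {
        List<List<String>> inputs = Arrays.asList(
            Arrays.asList("a.x", "a.y"), Arrays.asList("b.x", "b.y"));
        System.out.println(checkArity(inputs));                        // 2
        System.out.println(groupKey(Arrays.<Object>asList(1)));        // 1
        System.out.println(groupKey(Arrays.<Object>asList(1, "k")));   // [1, k]
    }
}
{code}

With this check, an input grouped by (x, y) and another grouped by (x) alone is rejected outright. That is the loss of flexibility raised above.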

2) In parseCogroup, I don't quite understand this bit. specs is initially declared as:
{code}
ArrayList<ArrayList<ExpressionOperator>> specs = new ArrayList<ArrayList<ExpressionOperator>>();
{code}
but here it is iterated as though its elements were LogicalOperators:
{code}
for (LogicalOperator op: specs) {
+ lp.connect(op, cogroup);
{code}
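A minimal sketch of what the loop may have been meant to do. It assumes the intent is to connect each cogroup input to the cogroup. The patch collects those inputs in los (`los.add(gi.op)` in point 1). LogicalOperator and Plan are simplified, hypothetical stand-ins, not Pig's real classes.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the Pig classes; only the shapes matter here.
class CogroupWiring {
    static class LogicalOperator {
        final String name;
        LogicalOperator(String name) { this.name = name; }
    }

    // Minimal plan that records each edge as "from->to".
    static class Plan {
        final List<String> edges = new ArrayList<>();
        void connect(LogicalOperator from, LogicalOperator to) {
            edges.add(from.name + "->" + to.name);
        }
    }

    // The elements of specs are ArrayList<ExpressionOperator>, so
    // "for (LogicalOperator op : specs)" cannot type-check. Iterating over the
    // input operators gathered in los is one plausible reading of the intent.
    static Plan wire(List<LogicalOperator> los, LogicalOperator cogroup) {
        Plan lp = new Plan();
        for (LogicalOperator op : los) {
            lp.connect(op, cogroup);
        }
        return lp;
    }

    public static void main(String[] args) {
        List<LogicalOperator> los = new ArrayList<>();
        los.add(new LogicalOperator("A"));
        los.add(new LogicalOperator("B"));
        Plan lp = wire(los, new LogicalOperator("cogroup"));
        System.out.println(lp.edges);  // [A->cogroup, B->cogroup]
    }
}
{code}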

3) In rewriteJoin, the usage of "column" is a bit weird, isn't it?
{code}
  for (int i = 0; i < n; i++) {
- EvalSpec column = new ProjectSpec(i+1);
+ ExpressionOperator column = new LOProject(lp, new OperatorKey(scope, getNextId()), gis.get(i).op, -1);
+ ((LOProject)column).setStar(true);
{code}
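The oddity may be the declare-then-cast pattern. If so, one alternative is to keep the variable typed as LOProject while configuring it, and only widen it to ExpressionOperator afterwards. A minimal sketch: the classes below are hypothetical stand-ins with simplified constructors. The only facts taken from the patch are that LOProject is assignable to ExpressionOperator and has setStar(boolean).

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; constructors are simplified compared to the patch.
class ProjectColumn {
    static class ExpressionOperator { }

    static class LOProject extends ExpressionOperator {
        private boolean star;
        void setStar(boolean star) { this.star = star; }
        boolean isStar() { return star; }
    }

    // Configure each projection through its concrete type, then store it
    // as the supertype, so no downcast is needed.
    static List<ExpressionOperator> starColumns(int n) {
        List<ExpressionOperator> columns = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            LOProject column = new LOProject();
            column.setStar(true);
            columns.add(column);
        }
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(starColumns(2).size());  // 2
    }
}
{code}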

4) In rewriteJoin, after cogroup is created in the line below, I don't see it being connected to anything:
{code}
LogicalOperator cogroup = parseCogroup(gis, lp);
{code}
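A sanity check like the following could catch the problem in point 4: an operator added to the plan but never wired to anything. This is a sketch over a hypothetical minimal plan (DanglingCheck, Plan, add and disconnected are illustrative names, not Pig's LogicalPlan API). A single-operator plan would also be flagged, so a real check would need to allow for that case.

{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical minimal plan; not Pig's LogicalPlan API.
class DanglingCheck {
    static class Plan {
        final Set<String> nodes = new LinkedHashSet<>();
        final List<String[]> edges = new ArrayList<>();

        void add(String op) { nodes.add(op); }

        void connect(String from, String to) {
            nodes.add(from);
            nodes.add(to);
            edges.add(new String[] {from, to});
        }

        // Operators that appear in no edge at all, i.e. created but never wired in.
        List<String> disconnected() {
            Set<String> touched = new HashSet<>();
            for (String[] e : edges) {
                touched.add(e[0]);
                touched.add(e[1]);
            }
            List<String> result = new ArrayList<>();
            for (String n : nodes) {
                if (!touched.contains(n)) result.add(n);
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Plan lp = new Plan();
        lp.add("A");
        lp.add("B");
        lp.add("cogroup");
        System.out.println(lp.disconnected());  // [A, B, cogroup]
        lp.connect("A", "cogroup");
        lp.connect("B", "cogroup");
        System.out.println(lp.disconnected());  // []
    }
}
{code}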

5) Here, why is the return type NULL, and why do we use LOUserFunc?
{code}
<ANY> {gs = new LOGenerate(lp, new OperatorKey(scope, getNextId()), new LOUserFunc(lp, new OperatorKey(scope, getNextId()), GFAny.class.getName(), null, DataType.NULL));}
{code}

> Rework logical plan
> -------------------
>
>                 Key: PIG-158
>                 URL: https://issues.apache.org/jira/browse/PIG-158
>             Project: Pig
>          Issue Type: Sub-task
>          Components: impl
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: logical_operators.patch, logical_operators_rev_1.patch, logical_operators_rev_2.patch, logical_operators_rev_3.patch, parser_changes.patch, parser_changes_v1.patch, parser_changes_v2.patch, ParserErrors.txt, visitorWalker.patch
>
>
> Rework the logical plan in line with http://wiki.apache.org/pig/PigExecutionModel

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

