hadoop-pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-113) Make Grunt's explain output more understandable
Date Fri, 22 Feb 2008 18:21:19 GMT

    [ https://issues.apache.org/jira/browse/PIG-113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571507#action_12571507 ]

Alan Gates commented on PIG-113:
--------------------------------

In general the patch looks good.  Making the explain output more readable is something we
need.

There's one question I'd like input from others on.  In the patch you've made the arguments
to EXPLAIN tokens in the language (XML, TREE).  That's the standard SQL approach.  The pro is
that it is easy for users to type, and SQL users probably already think about things that
way.  The con is that it bloats the number of tokens in the language (compare the number of
tokens in the SQL standard with the number in a language like Java), and it means many
changes will also require changes to the parser.

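The other option is to make EXPLAIN take a string argument, so it would be EXPLAIN 'tree'
instead of EXPLAIN TREE.  This has the reverse pros and cons: it is slightly more to type and
less familiar to SQL users, but it adds no tokens, and new formats need no parser changes.
Another pro is that programmers coming from Java and similar languages may find this a more
natural model.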
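
To make the string-argument option concrete, here is a minimal sketch of how the argument
could be mapped to a format after parsing, so that adding a format only means adding an enum
constant.  The names ExplainFormatDemo, ExplainFormat and parseFormat are made up for
illustration and are not taken from the patch or from the Pig code base.

```java
import java.util.Locale;

public class ExplainFormatDemo {
    // Hypothetical set of formats; DEFAULT stands for the existing explain output.
    enum ExplainFormat { DEFAULT, TREE, XML }

    // Maps the string argument of EXPLAIN to a format. The grammar only needs to
    // accept a quoted string, so a new format requires no new token.
    static ExplainFormat parseFormat(String arg) {
        if (arg == null || arg.trim().isEmpty()) {
            return ExplainFormat.DEFAULT; // no argument given: keep current behaviour
        }
        try {
            return ExplainFormat.valueOf(arg.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            // Unknown names are rejected at run time rather than by the parser.
            throw new IllegalArgumentException("Unknown explain format: '" + arg + "'");
        }
    }

    public static void main(String[] args) {
        System.out.println(parseFormat("tree"));
        System.out.println(parseFormat("XML"));
        System.out.println(parseFormat(null));
    }
}
```

The trade-off shows up in the catch block: with keyword tokens a misspelled format is a
syntax error, while with a string it is only caught when the command runs.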

Thoughts?

> Make Grunt's explain output more understandable
> -----------------------------------------------
>
>                 Key: PIG-113
>                 URL: https://issues.apache.org/jira/browse/PIG-113
>             Project: Pig
>          Issue Type: Improvement
>          Components: grunt
>    Affects Versions: 0.1.0
>            Reporter: Pi Song
>            Priority: Minor
>         Attachments: pig_printtree_1.patch
>
>
> I think it would be better if we could display the execution plan in a more understandable
> way. One intuitive way to do this is to show the output as a tree, as SQL Server does.
> Possibly we can have 'AS <format>' as an optional argument for the explain command.
> For example:
> {noformat}
> Grunt> explain bag1 AS tree ;
> Grunt> explain bag1 AS xml ;
> {noformat}
> and 
> {noformat}
> Grunt> explain bag1   
> {noformat}
> will display the default format.
> I have included a patch that generates tree output.
> Here is a sample of the existing output format:
> {noformat}
> Logical Plan:
> Group root-Sun Feb 17 19:37:07 GMT+10:00 2008-5
> Object id: 9814147
> Inputs: 26335425 
> Schema: (group, (sum, (), (), ()))
> EvalSpecs:
>         Generate: has 2 children
>                 Project: (0)
>                 Star
> Split root-Sun Feb 17 19:37:07 GMT+10:00 2008-2
> Object id: 25199001
> Inputs: 29132923 
> Schema: (sum, (), (), ())
> EvalSpecs:
> Eval root-Sun Feb 17 19:37:07 GMT+10:00 2008-1
> Object id: 29132923
> Inputs: 10774273 
> Schema: (sum, (), (), ())
> EvalSpecs:
>         Generate: has 4 children
>                 FuncEval: name: org.apache.pig.impl.builtin.ADD args:
>                         Generate: has 2 children
>                                 Project: (0)
>                                 Project: (1)
>                 Project: (0)
>                 Project: (1)
>                 Project: (2)
> Load root-Sun Feb 17 19:37:07 GMT+10:00 2008-0
> Object id: 10774273
> Inputs: 
> Schema: ()
> EvalSpecs:
> -----------------------------------------------
> Physical Plan:
> MAPREDUCE
> Object id: 17671659
> Inputs: 682933706
> Map: 
>         Star
> Grouping Funcs: 
>         Generate: has 2 children
>                 Project: (0)
>                 Star
> Input Files: /tmp/temp678140026/tmp1867058340
> MAPREDUCE
> Object id: 17308974
> Inputs: 
> Map: 
>         Composite: has 2 children
>                 Star
>                 Generate: has 4 children
>                         FuncEval: name: org.apache.pig.impl.builtin.ADD args:
>                                 Generate: has 2 children
>                                         Project: (0)
>                                         Project: (1)
>                         Project: (0)
>                         Project: (1)
>                         Project: (2)
> Input Files: /tmp/data1.txt
> Output File: /tmp/temp678140026/tmp1613817084
> {noformat}
> Here is a sample of my tree output, which is more compact and more understandable:
> {noformat}
> grunt> explain c1 as tree ;
> Logical Plan:
> |---LOCogroup ( GENERATE {[PROJECT $0],[*]} ) 
>       |---LOSplitOutput (  ) 
>             |---LOSplit ( ([PROJECT $0] < ['5']),([PROJECT $0] >= ['5']) ) 
>                   |---LOEval ( GENERATE {[org.apache.pig.impl.builtin.ADD(GENERATE {[PROJECT $0],[PROJECT $1]})],[PROJECT $0],[PROJECT $1],[PROJECT $2]} ) 
>                         |---LOLoad ( file = /tmp/data1.txt )
> -----------------------------------------------
> Physical Plan:
> |---POMapreduce
>     Map : *
>     Grouping : Generate(Project(0),*)
>     Input File(s) : /tmp/temp678140026/tmp1867058340
>       |---POMapreduce
>           Map : Composite(*,Generate(FuncEval(org.apache.pig.impl.builtin.ADD(Generate(Project(0),Project(1)))),Project(0),Project(1),Project(2)))
>           Input File(s) : /tmp/data1.txt
> {noformat}
> I'm also thinking about producing XML output, as it might benefit people who are working
> on displaying execution plans in a GUI.
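
The indented tree layout shown in the quoted sample could be produced by a simple recursive
walk over the plan.  The following is only a sketch under assumptions: PlanTreeDemo, Node and
render are hypothetical names, the six-space indent is read off the logical plan sample, and
this is not the code in pig_printtree_1.patch.

```java
import java.util.ArrayList;
import java.util.List;

public class PlanTreeDemo {
    // Hypothetical plan node: a label plus the operators that feed into it.
    static class Node {
        final String label;
        final List<Node> inputs = new ArrayList<>();

        Node(String label) { this.label = label; }

        Node addInput(Node input) { inputs.add(input); return this; }
    }

    // Renders the plan rooted at the given node, one operator per line.
    static String render(Node root) {
        StringBuilder sb = new StringBuilder();
        render(root, 0, sb);
        return sb.toString();
    }

    private static void render(Node node, int depth, StringBuilder sb) {
        // Six spaces per level, matching the logical plan sample (an assumption).
        for (int i = 0; i < depth; i++) {
            sb.append("      ");
        }
        sb.append("|---").append(node.label).append('\n');
        for (Node input : node.inputs) {
            render(input, depth + 1, sb);
        }
    }

    public static void main(String[] args) {
        Node load = new Node("LOLoad ( file = /tmp/data1.txt )");
        Node eval = new Node("LOEval").addInput(load);
        Node split = new Node("LOSplit").addInput(eval);
        System.out.print(render(split));
    }
}
```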

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

