tajo-dev mailing list archives

From "Hyunsik Choi (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (TAJO-184) Refactor GlobalPlanner and global plan data structure
Date Mon, 16 Sep 2013 11:35:51 GMT

     [ https://issues.apache.org/jira/browse/TAJO-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyunsik Choi resolved TAJO-184.
-------------------------------

    Resolution: Fixed

I've committed this patch. Thank you for the review!
                
> Refactor GlobalPlanner and global plan data structure
> -----------------------------------------------------
>
>                 Key: TAJO-184
>                 URL: https://issues.apache.org/jira/browse/TAJO-184
>             Project: Tajo
>          Issue Type: Improvement
>          Components: master, physical operator, planner/optimizer
>            Reporter: Hyunsik Choi
>            Assignee: Hyunsik Choi
>             Fix For: 0.2-incubating
>
>         Attachments: TAJO-184_2.patch, TAJO-184.patch
>
>
> First of all, I apologize for submitting such a large patch. It broadly modifies and refactors the global planning, logical planning, and physical planning parts, and it was hard to split this issue into smaller ones.
> In particular, this patch rewrites GlobalPlanner and MasterPlan (the global plan data structure) as follows:
>  * Removed GlobalPlanOptimizer
>  * Added a DirectedGraph interface, a SimpleDirectedGraph concrete class, and a visitor class that traverses a graph in post-order.
>  * Improved MasterPlan by using the new graph API
>   ** Query block graphs and the execution block graph are represented by SimpleDirectedGraph.
>   ** These graphs can now be traversed easily through the graph API.
>   ** Added DataChannel class to represent a data flow between execution blocks.
>  * MasterPlan.toString() prints a text graph that represents the relationships among execution blocks and the distributed plan.
>  * Added a more sophisticated explain feature for distributed plans and logical plans. It is very useful for plan debugging.
>  * The limit operator is now pushed down to the child execution block.
>   ** As a result, the intermediate data volume of a sort query with a limit is reduced significantly.
>  * TableSubQuery (inline view) is now supported. It follows the SQL standard, so you can run a query such as the following:
> {code}
> SELECT *
> FROM
> (
>     SELECT
>         l_orderkey,
>         l_partkey,
>         url
>     FROM
>         (
>           SELECT
>             l_orderkey,
>             l_partkey,
>             CASE
>               WHEN l_partkey IS NOT NULL THEN ''
>               WHEN l_orderkey = 1 THEN '1'
>               ELSE '2'
>             END AS url
>           FROM
>             lineitem
>         ) res1
>         JOIN
>         (
>           SELECT
>             *
>           FROM
>             part
>         ) res2
>         ON l_partkey = p_partkey
> ) result
> {code}
> In addition, I've refactored as follows:
>  * Column now has a qualifier name.
>  * Improved Schema to deal with qualified column names.
>  * When a TableDesc instance is retrieved, it is forced to have qualified columns.
>  * Fixed the TAJO-162 bug.
>  * Many trivial improvements and refactorings.
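
To illustrate the kind of structure the issue describes (a directed graph of execution blocks and a post-order visit), here is a minimal Java sketch. The class name SimpleGraphSketch, its methods, and the block names are hypothetical and chosen only for illustration; they are not the DirectedGraph or SimpleDirectedGraph API from the patch. The sketch assumes edges point from a parent block to the child blocks that feed it, so a post-order visit reaches every child before its parent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only; not the actual Tajo graph classes.
final class SimpleGraphSketch<V> {
    // parent -> children; insertion-ordered so traversal order is deterministic
    private final Map<V, List<V>> children = new LinkedHashMap<>();

    // Adds a directed edge from parent to child, registering both vertices.
    void addEdge(V parent, V child) {
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
        children.computeIfAbsent(child, k -> new ArrayList<>());
    }

    // Returns the vertices reachable from root, children before parents.
    List<V> postOrder(V root) {
        List<V> out = new ArrayList<>();
        visit(root, new HashSet<>(), out);
        return out;
    }

    private void visit(V node, Set<V> seen, List<V> out) {
        if (!seen.add(node)) {
            return; // already visited (guards against shared children and cycles)
        }
        for (V child : children.getOrDefault(node, Collections.<V>emptyList())) {
            visit(child, seen, out);
        }
        out.add(node); // emit the node only after all of its children
    }
}
```

With edges root -> join, join -> scan_lineitem, and join -> scan_part, postOrder("root") lists both scans before the join and the join before the root, which is the order in which dependent blocks would need to be handled.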

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
