tajo-dev mailing list archives

From "Jihoon Son (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TAJO-266) Extend ExecutionBlock and Task to support multiple outputs
Date Sun, 27 Oct 2013 11:33:30 GMT

    [ https://issues.apache.org/jira/browse/TAJO-266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13806313#comment-13806313 ]

Jihoon Son commented on TAJO-266:

For this issue, I designed a new class called ExecutionPlan.
An ExecutionPlan is a DAG which consists of LogicalNodes and their connections. Each connection
represents a data flow between LogicalNodes.
Each ExecutionBlock contains an ExecutionPlan instead of a LogicalPlan.
When the master executes an ExecutionBlock, it sends that ExecutionBlock's ExecutionPlan
to the tasks.
Each task then generates a PhysicalPlan from the ExecutionPlan it receives.
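
As a rough illustration of the DAG described above (a sketch only, not the actual Tajo code), an ExecutionPlan could be modelled as an adjacency list of data flows. The class name ExecutionPlanSketch and its methods are hypothetical, and plain strings stand in for LogicalNodes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a DAG whose nodes stand in for LogicalNodes and whose
// edges represent data flows from a producer node to a consumer node.
class ExecutionPlanSketch {
    // producer -> list of consumers that read its output
    private final Map<String, List<String>> consumers = new LinkedHashMap<>();

    // Registers a node without any connections.
    void addNode(String node) {
        consumers.computeIfAbsent(node, k -> new ArrayList<>());
    }

    // Records a data flow from producer to consumer, adding both nodes if needed.
    void addDataFlow(String producer, String consumer) {
        addNode(producer);
        addNode(consumer);
        consumers.get(producer).add(consumer);
    }

    // Returns the nodes that consume the output of the given node.
    List<String> consumersOf(String node) {
        return consumers.getOrDefault(node, new ArrayList<>());
    }

    // A node with more than one consumer is a node with multiple outputs.
    int outputCount(String node) {
        return consumersOf(node).size();
    }
}
```

In this sketch, a scan node that feeds two group-by nodes would report an output count of 2, which is the situation a plain tree-shaped plan cannot express.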
Here, I added two PhysicalNodes, called PhysicalRootExec and MultiOutExec, to support multiple
outputs while preserving the pipelined query execution structure.
PhysicalRootExec is used only to represent the root of the physical plan.
MultiOutExec receives an integer n as a constructor argument.
When next() is called, MultiOutExec returns the same tuple n times.
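
One possible reading of the MultiOutExec behaviour, sketched in Java under stated assumptions: successive next() calls return each child tuple n times before advancing, and null signals the end of input. The class MultiOutExecSketch, the use of java.util.Iterator as the child operator, and the null convention are all hypothetical and are not taken from the Tajo source.

```java
import java.util.Iterator;

// Hypothetical sketch of an operator that repeats every child tuple n times,
// so that n consumers pulling in turn each see the same tuple.
class MultiOutExecSketch<T> {
    private final Iterator<T> child; // stands in for the child physical operator
    private final int n;             // number of outputs (repetitions per tuple)
    private T current;               // tuple currently being repeated
    private int remaining = 0;       // how many more times to return 'current'

    MultiOutExecSketch(Iterator<T> child, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        this.child = child;
        this.n = n;
    }

    // Returns the current tuple until it has been handed out n times, then
    // pulls the next tuple from the child. Returns null when the child is
    // exhausted (an assumed end-of-input convention).
    T next() {
        if (remaining == 0) {
            if (!child.hasNext()) {
                return null;
            }
            current = child.next();
            remaining = n;
        }
        remaining--;
        return current;
    }
}
```

Because the tuple is only repeated rather than buffered, the operator keeps the pull-based, pipelined style of execution mentioned above.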

I attached figures to help you better understand. These figures show a comparison between
the current master plan and a master plan optimized by the YSmart algorithm (see TAJO-161).

While this structure looks a little complicated, it can support various master plan optimizations
such as TAJO-161.
So, based on this structure, I think that we can develop a new master plan optimizer and optimization
rules that can significantly improve query processing performance.

Any advice would be appreciated.

> Extend ExecutionBlock and Task to support multiple outputs
> ----------------------------------------------------------
>                 Key: TAJO-266
>                 URL: https://issues.apache.org/jira/browse/TAJO-266
>             Project: Tajo
>          Issue Type: Task
>          Components: distributed query plan, worker
>            Reporter: Jihoon Son
>            Assignee: Jihoon Son
> In the current version of Tajo, every task has only one output.
> However, supporting multiple outputs per task would be very useful for distributed plan optimization.
