tajo-dev mailing list archives

From "Hyunsik Choi (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TAJO-501) Rewrite the projection part of logical planning
Date Tue, 14 Jan 2014 09:56:59 GMT
Hyunsik Choi created TAJO-501:

             Summary: Rewrite the projection part of logical planning
                 Key: TAJO-501
                 URL: https://issues.apache.org/jira/browse/TAJO-501
             Project: Tajo
          Issue Type: Improvement
          Components: planner/optimizer
            Reporter: Hyunsik Choi
            Assignee: Hyunsik Choi
            Priority: Critical
             Fix For: 0.8-incubating

The projection part of LogicalPlanner was designed a long time ago and has since evolved to
support many SQL expressions. However, because of its rough design, it is hard to extend to
further SQL expressions, and it causes many bugs.

The current logical planner has the following problems:
 * Expressions other than columns cannot be used in the group-by clause.
   * TAJO-422
 * Expressions other than columns cannot be used in the order-by clause.
 * An expression that includes an aggregation function must be evaluated in the group-by executor.
   * As a result, an aggregation operator such as HashAggregateExec has to keep all intermediate
results of a complex expression in a hash table.
   * This also causes frequent GC and large memory consumption.
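
To make the last problem concrete, here is a minimal sketch of one common alternative: keep only
primitive aggregation state per group in the hash table, and evaluate the scalar part of the
expression in a separate projection step. This is not Tajo code and does not describe the
upcoming patch. The class SplitAggregationSketch, its methods, and the example expression
sum(col2) * 2 + count(*) are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Tajo code: evaluates "sum(col2) * 2 + count(*)"
// grouped by col1 in two phases, so the hash table holds only raw accumulators.
class SplitAggregationSketch {

    /** Per-group state: only the primitive aggregation accumulators. */
    static final class AggState {
        long sum;    // running sum(col2)
        long count;  // running count(*)
    }

    /** Aggregation phase: one small mutable state object per group key. */
    static Map<String, AggState> aggregate(List<Object[]> rows) {
        Map<String, AggState> table = new LinkedHashMap<>();
        for (Object[] row : rows) {
            String key = (String) row[0];   // col1, the grouping key
            long col2 = (Long) row[1];      // col2, the aggregated column
            AggState state = table.computeIfAbsent(key, k -> new AggState());
            state.sum += col2;
            state.count += 1;
        }
        return table;
    }

    /** Projection phase: evaluates the scalar part once per finished group. */
    static Map<String, Long> project(Map<String, AggState> table) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, AggState> entry : table.entrySet()) {
            AggState s = entry.getValue();
            result.put(entry.getKey(), s.sum * 2 + s.count);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {"a", 10L});
        rows.add(new Object[] {"b", 5L});
        rows.add(new Object[] {"a", 7L});
        System.out.println(project(aggregate(rows))); // prints {a=36, b=11}
    }
}
```

In this sketch the hash table stores two long values per group regardless of how complex the
surrounding scalar expression is, because that expression is applied only after aggregation ends.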

The excessive code complexity also causes many bugs, such as:
 * TAJO-434 - java.lang.NullPointerException for invalid column name
 * TAJO-428 - CASE WHEN IS NULL condition is a problem using LEFT OUTER JOIN
 * TAJO-463 - ProjectionPushDownRule incorrectly rewrite the output schema of StoreTableNode
 * TAJO-443 - Order by query gives NullPointerException at org.apache.tajo.catalog.Schema.getColumnId(Schema.java:142)

The major causes of these problems are as follows:
 * TargetListManager keeps only the final target list.
   * SELECT col1, sum(col2) as col2, ... <- the final target list
 * TargetListManager treats each expression in a target list, or in other clauses such as the
group-by clause, as a singleton expression.
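
As a rough illustration of the contrast, the sketch below registers every expression under a
name, whether it comes from the target list or from another clause, so that later plan nodes can
refer to it by that name. It is only an assumption about what a more flexible bookkeeping
structure could look like. NamedExprRegistry and its methods are hypothetical and are not part of
Tajo, and matching expressions by lower-cased text is a deliberate simplification.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Tajo code: tracks every expression seen during
// planning by name, instead of keeping only the final target list.
class NamedExprRegistry {
    private final Map<String, String> nameByExpr = new LinkedHashMap<>();
    private int nextId = 0;

    /**
     * Registers an expression and returns the name that refers to it.
     * An already registered expression keeps its first name; otherwise the
     * alias is used, or a generated name when alias is null.
     */
    String register(String expr, String alias) {
        String key = normalize(expr);
        String existing = nameByExpr.get(key);
        if (existing != null) {
            return existing; // same expression reused in another clause
        }
        String name = (alias != null) ? alias : "expr_" + nextId++;
        nameByExpr.put(key, name);
        return name;
    }

    /** Number of distinct expressions registered so far. */
    int size() {
        return nameByExpr.size();
    }

    // Simplification: compares expressions by trimmed, lower-cased text.
    private static String normalize(String expr) {
        return expr.trim().toLowerCase();
    }

    public static void main(String[] args) {
        NamedExprRegistry registry = new NamedExprRegistry();
        // SELECT col1, sum(col2) as col2 ... GROUP BY col1 + 1 ORDER BY sum(col2)
        System.out.println(registry.register("sum(col2)", "col2")); // target list
        System.out.println(registry.register("col1 + 1", null));    // group-by
        System.out.println(registry.register("SUM(col2)", null));   // order-by
    }
}
```

In this sketch the order-by reference to sum(col2) resolves to the name already assigned in the
target list, and the group-by expression receives a generated name of its own.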

The main objective of this issue is to rewrite the projection part of logical planning in
order to solve those problems.

I have been rewriting this part for the past two weeks, and I'll submit the patch soon.

This message was sent by Atlassian JIRA
