tajo-dev mailing list archives

From "Jihoon Son (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (TAJO-256) Support data cube (Umbrella)
Date Wed, 16 Oct 2013 12:41:41 GMT

    [ https://issues.apache.org/jira/browse/TAJO-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796731#comment-13796731
] 

Jihoon Son edited comment on TAJO-256 at 10/16/13 12:40 PM:
------------------------------------------------------------

Group-by extension queries incur significant overhead.
Thus, query optimization, especially of the distributed plan, is very important.
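
To illustrate where the overhead comes from, here is a minimal sketch (not Tajo code; the function and column names are hypothetical) of how CUBE and ROLLUP expand into grouping sets under standard SQL semantics. CUBE over n columns yields 2^n grouping sets and ROLLUP yields n + 1, and a naive plan evaluates each one as a separate aggregation.

```python
from itertools import combinations


def cube_grouping_sets(columns):
    """All subsets of `columns`, which is what CUBE(columns) expands to."""
    cols = list(columns)
    return [
        subset
        for size in range(len(cols), -1, -1)
        for subset in combinations(cols, size)
    ]


def rollup_grouping_sets(columns):
    """All prefixes of `columns`, which is what ROLLUP(columns) expands to."""
    cols = tuple(columns)
    return [cols[:size] for size in range(len(cols), -1, -1)]


if __name__ == "__main__":
    dims = ["year", "region", "product"]  # hypothetical column names
    print(len(cube_grouping_sets(dims)))    # number of grouping sets for CUBE
    print(len(rollup_grouping_sets(dims)))  # number of grouping sets for ROLLUP
```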

Statistics such as histograms are very useful for query optimization.
Unfortunately, Tajo currently doesn't store any statistics for raw tables.

In this case, sample-based cost estimation is a good solution.
In sample-based cost estimation, the aggregation query is first executed on a sampled table
before it is executed on the original table.
Statistics of the sampled data are collected during this first execution.
The collected statistics then allow a better-optimized query plan to be built for the
original table.
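
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Tajo's design: the comment does not say which statistics are collected or how the planner uses them, so the sketch assumes a Bernoulli row sample, a COUNT(*) aggregation, and the number of groups seen in the sample as a relative cost proxy. All function names are hypothetical.

```python
import random
from collections import Counter


def sample_rows(rows, fraction, seed=0):
    """Bernoulli sample: keep each row independently with probability `fraction`."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


def group_stats(rows, key_columns):
    """Run a COUNT(*) aggregation over `rows` and return statistics about it."""
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return {
        "input_rows": len(rows),
        "groups": len(counts),
        "largest_group": max(counts.values(), default=0),
    }


def rank_by_sampled_groups(rows, candidate_keys, fraction=0.1, seed=0):
    """Order grouping keys by the number of groups observed in the sample.

    Assumption: fewer groups in the sample suggests a cheaper aggregation
    on the original table. Each entry of `candidate_keys` is a tuple of
    column names.
    """
    sample = sample_rows(rows, fraction, seed)
    stats = {keys: group_stats(sample, keys) for keys in candidate_keys}
    ranked = sorted(candidate_keys, key=lambda keys: stats[keys]["groups"])
    return ranked, stats
```

A planner could use such a ranking to decide, for example, which grouping sets to aggregate first. Note that the group count in a sample underestimates the count in the full table, so it is only meaningful here as a relative measure.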

So, I added sample-based cost estimation to this issue.


was (Author: jihoonson):
Group by extension queries require significantly high overhead.
Thus, the query optimization, especially the distributed plan is very important.

Statistics such as histogram are very useful for the query optimization. 
Unfortunately, the current Tajo doesn't store any statistics for raw tables.

In this case, the sample-based cost estimation is a good solution.
In the sample-base cost estimation, the aggregation query is executed for the sampled table
before executing the query for the original table.
Here, statistics of the sampled data are collected during the query execution.
After that, more optimized query planning for the original table is possible using the collected
statistics.

So, I added the sample-based cost estimation to this issue.

> Support data cube (Umbrella)
> ----------------------------
>
>                 Key: TAJO-256
>                 URL: https://issues.apache.org/jira/browse/TAJO-256
>             Project: Tajo
>          Issue Type: New Feature
>          Components: catalog, distributed query plan, parser
>            Reporter: Jihoon Son
>            Assignee: Jihoon Son
>             Fix For: 0.3-incubating
>
>
> This issue includes the following sub-issues:
> * SQL support of group by extensions (GROUPING SETS, CUBE, ROLLUP)
> * Query execution of group by extensions
> * GROUPING() function
> * Data cube materialization process
> * Cube schema maintenance
> * Sample-based cost estimation



--
This message was sent by Atlassian JIRA
(v6.1#6144)
