tajo-dev mailing list archives

From "Hyunsik Choi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TAJO-283) Add Table Partitioning
Date Wed, 18 Dec 2013 11:18:07 GMT

    [ https://issues.apache.org/jira/browse/TAJO-283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851603#comment-13851603 ]

Hyunsik Choi commented on TAJO-283:

The result in the staging directory is finally moved to the specified output directory. Usually,
the output is moved to the warehouse directory (e.g., /tajo/warehouse/xxxx).

In TAJO-329, Jaehwa implemented a table partition executor for column-partitioned tables. Interestingly,
TAJO-329 works correctly without any shuffle. However, this approach creates too many output
files, a number comparable to the number of HDFS blocks. That is a poor fit for HDFS, which
handles a few large files better than many small ones.
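
To make the file-count concern concrete, here is a rough back-of-the-envelope sketch. It assumes
one scan task per HDFS block and that, without a shuffle, each task writes its own file for every
partition value it encounters. It also assumes that a hash shuffle on the partition key sends
all rows of a given value to a single writer. The class and method names are hypothetical and
are not taken from Tajo.

```java
// Hypothetical estimate, not Tajo code. Assumes one scan task per HDFS block.
class OutputFileEstimate {

    // Without a shuffle, every task writes its own file for each partition
    // value it sees, so the count grows with the number of tasks (blocks).
    static long withoutShuffle(long tasks, long partitionValuesPerTask) {
        return Math.multiplyExact(tasks, partitionValuesPerTask);
    }

    // With a hash shuffle on the partition key (assumed here), all rows of one
    // partition value reach a single writer, giving one file per value.
    static long withHashShuffle(long distinctPartitionValues) {
        return distinctPartitionValues;
    }

    public static void main(String[] args) {
        long blocks = 1000;   // illustrative input size, in HDFS blocks
        long values = 12;     // illustrative number of distinct partition values
        System.out.println("no shuffle, 1 value per task:   " + withoutShuffle(blocks, 1));
        System.out.println("no shuffle, all values per task: " + withoutShuffle(blocks, values));
        System.out.println("hash shuffle:                    " + withHashShuffle(values));
    }
}
```

Even in the best case for the no-shuffle plan (each task sees only one partition value), the
number of files tracks the number of blocks rather than the number of partitions.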

So, I'm going to modify the distributed planner so that a partitioned table store operator
can use a proper shuffle method. For example, hash shuffle suits the column, list, and hash
partition types, and range shuffle suits the range partition type. In some special cases, writing
table partitions without a shuffle may still be useful after TAJO-385, which merges a number of
fragments into fewer fragments.
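
The mapping from partition type to shuffle method described above could be sketched roughly as
follows. This only illustrates the rule stated in this comment. The enum and method names
(PartitionType, ShuffleType, shuffleFor) are hypothetical and do not reflect the actual planner
classes.

```java
// Hypothetical sketch of the rule in this comment; not Tajo's planner API.
class ShuffleSketch {

    enum PartitionType { COLUMN, LIST, HASH, RANGE }

    enum ShuffleType { HASH_SHUFFLE, RANGE_SHUFFLE }

    // Picks a shuffle method for the store operator of a partitioned table.
    static ShuffleType shuffleFor(PartitionType type) {
        switch (type) {
            case RANGE:
                // Range partitions need rows ordered into key ranges.
                return ShuffleType.RANGE_SHUFFLE;
            case COLUMN:
            case LIST:
            case HASH:
                // Rows with equal partition keys only need to be co-located.
                return ShuffleType.HASH_SHUFFLE;
            default:
                throw new IllegalArgumentException("unknown partition type: " + type);
        }
    }

    public static void main(String[] args) {
        for (PartitionType t : PartitionType.values()) {
            System.out.println(t + " -> " + shuffleFor(t));
        }
    }
}
```

The no-shuffle case mentioned for TAJO-385 is deliberately left out of this sketch, since the
comment does not say when the planner would choose it.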


> Add Table Partitioning
> ----------------------
>                 Key: TAJO-283
>                 URL: https://issues.apache.org/jira/browse/TAJO-283
>             Project: Tajo
>          Issue Type: New Feature
>          Components: catalog, physical operator, planner/optimizer
>            Reporter: Hyunsik Choi
>            Assignee: Hyunsik Choi
>             Fix For: 0.8-incubating
> Table partitioning provides many facilities for maintaining large tables. First of all, it enables
the data management system to prune input data that is not actually needed. In addition,
it gives the system more optimization opportunities that exploit the physical layout.
> Basically, Tajo should follow the RDBMS-style partitioning system, including range, list,
hash, and so on. To keep Hive compatibility, we need to add the Hive partition type, which
does not exist in existing DBMS systems.
