tajo-dev mailing list archives

From "Jihoon Son (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TAJO-283) Add Table Partitioning
Date Tue, 24 Dec 2013 06:43:52 GMT

    [ https://issues.apache.org/jira/browse/TAJO-283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856189#comment-13856189 ]

Jihoon Son commented on TAJO-283:
---------------------------------

[~coderplay]
The biggest advantage of supporting column partitioning is compatibility with Hive.
Since many Hive users already use column partitions, more of them may consider Tajo as a
replacement for Hive if column partitioning is supported.
As you said, column partitioning can cause problems when there is a large number of partitions.
We should devise a solution to handle that.
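
For illustration, here is a minimal sketch of what Hive-style column partitioning looks like on disk, assuming Hive's `key=value` directory naming. The table root `/warehouse/sales`, the column names, and the helper functions `partition_path` and `prune` are hypothetical and are not part of Tajo's or Hive's API.

```python
from itertools import product


def partition_path(table_root, partition_values):
    """Build a Hive-style partition directory: root/key1=val1/key2=val2."""
    parts = [f"{key}={value}" for key, value in partition_values]
    return "/".join([table_root.rstrip("/")] + parts)


def prune(paths, predicate):
    """Keep only the paths whose partition values match every key in predicate."""
    kept = []
    for path in paths:
        # Parse the key=value segments back out of the directory path.
        values = dict(seg.split("=", 1) for seg in path.split("/") if "=" in seg)
        if all(values.get(key) == value for key, value in predicate.items()):
            kept.append(path)
    return kept


# Hypothetical partition columns: one directory per distinct value combination.
years = ["2012", "2013"]
countries = ["KR", "US", "CN"]
all_paths = [
    partition_path("/warehouse/sales", [("year", y), ("country", c)])
    for y, c in product(years, countries)
]

# A filter on year only needs to read the matching directories.
paths_2013 = prune(all_paths, {"year": "2013"})
```

The number of directories is the product of the distinct values of each partition column, which is why a table partitioned on high-cardinality columns can end up with a very large number of partitions.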

In my opinion, supporting a Lucene index looks great!
Since unstructured data is commonly processed in the Hadoop world, I think Tajo also
needs to support processing unstructured data as well as relational data.
Since Lucene is optimized for processing documents, it would be useful in Tajo, too.
However, it should be transparent to Tajo users, and its queries should be expressed in a SQL-like
form.

Thanks for your detailed response and great suggestion.

[~coderplay] [~hyunsik]
Happy Christmas!

Jihoon

> Add Table Partitioning
> ----------------------
>
>                 Key: TAJO-283
>                 URL: https://issues.apache.org/jira/browse/TAJO-283
>             Project: Tajo
>          Issue Type: New Feature
>          Components: catalog, physical operator, planner/optimizer
>            Reporter: Hyunsik Choi
>            Assignee: Hyunsik Choi
>             Fix For: 0.8-incubating
>
>
> Table partitioning provides many facilities for maintaining large tables. First of all, it enables
> the data management system to prune input data that is not actually needed. In addition,
> it gives the system more optimization opportunities that exploit the physical layout.
> Basically, Tajo should follow the RDBMS-style partitioning system, including range, list,
> hash, and so on. In order to keep Hive compatibility, we need to add the Hive partition type, which
> does not exist in existing DBMS systems.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
