carbondata-issues mailing list archives

From "Ravindra Pesala (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (CARBONDATA-466) Implement bucketing table in carbondata
Date Mon, 16 Jan 2017 17:05:26 GMT

     [ https://issues.apache.org/jira/browse/CARBONDATA-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravindra Pesala reassigned CARBONDATA-466:
------------------------------------------

    Assignee: Ravindra Pesala

> Implement bucketing table in carbondata
> ---------------------------------------
>
>                 Key: CARBONDATA-466
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-466
>             Project: CarbonData
>          Issue Type: New Feature
>            Reporter: Ravindra Pesala
>            Assignee: Ravindra Pesala
>
> Bucketing is a useful feature when users want to join big tables. It is also useful for driver-level partition pruning, which improves query performance.
> Users can add buckets on any dimension column (except complex-type columns) as follows:
> {code}
> CREATE TABLE test(user_id BIGINT, firstname STRING, lastname STRING)
> CLUSTERED BY(user_id) INTO 32 BUCKETS
> STORED BY 'carbondata';
> {code}
> In the above example, column user_id is hash-partitioned, creating 32 bucket files in CarbonData. When joining with another table that is bucketed the same way on that column, matching buckets can be selected and joined without shuffling.
> Carbon format changes:
> 1. Bucketing information needs to be stored in the schema thrift file.
> 2. The bucket id can be stored in every carbondata index file.
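A minimal Python sketch of the idea described above. It assumes a stand-in hash (CRC32 of the key's decimal string) rather than whatever hash CarbonData actually uses, and the names `bucket_id`, `bucketize`, `bucketed_join`, `prune_index_files` and the `part-N.carbonindex` file names are hypothetical, not CarbonData APIs. It shows rows being hash-partitioned into 32 buckets, why two tables bucketed the same way on the join key can be joined bucket by bucket without a shuffle, and how a bucket id recorded per index file would let the driver skip files for an equality filter.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_BUCKETS = 32  # mirrors "INTO 32 BUCKETS" in the DDL above

Row = Tuple[int, str]  # (user_id, payload)


def bucket_id(key: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a bucketing-column value to a bucket id in [0, num_buckets).

    Assumption: CRC32 stands in for CarbonData's real hash; any
    deterministic hash shared by both tables works for this sketch.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def bucketize(rows: List[Row], num_buckets: int = NUM_BUCKETS) -> Dict[int, List[Row]]:
    """Load phase: route each row to its bucket (one 'bucket file' per id)."""
    buckets: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        buckets[bucket_id(row[0], num_buckets)].append(row)
    return dict(buckets)


def bucketed_join(left: Dict[int, List[Row]],
                  right: Dict[int, List[Row]]) -> List[Tuple[Row, Row]]:
    """Join two tables bucketed on the join key with the same bucket count.

    Equal keys always land in the same bucket id, so each bucket pair is
    joined locally and no rows move between buckets (no shuffle).
    """
    result: List[Tuple[Row, Row]] = []
    for b, left_rows in left.items():
        # Hash table over the matching right-hand bucket only.
        lookup: Dict[int, List[Row]] = defaultdict(list)
        for rrow in right.get(b, []):
            lookup[rrow[0]].append(rrow)
        for lrow in left_rows:
            for rrow in lookup.get(lrow[0], []):
                result.append((lrow, rrow))
    return result


def prune_index_files(index_files: Dict[str, int], key: int,
                      num_buckets: int = NUM_BUCKETS) -> List[str]:
    """Driver-side pruning for a filter like `user_id = key`.

    `index_files` maps a hypothetical index file name to the bucket id
    stored inside it; only files for the key's bucket need scanning.
    """
    target = bucket_id(key, num_buckets)
    return [name for name, b in index_files.items() if b == target]


users = [(1, "alice"), (2, "bob"), (33, "carol")]
orders = [(1, "order-a"), (33, "order-b"), (99, "order-c")]
joined = bucketed_join(bucketize(users), bucketize(orders))
print(sorted(joined))

files = {f"part-{b}.carbonindex": b for b in range(NUM_BUCKETS)}
print(prune_index_files(files, 33))
```

The join is only correct because both sides use the same hash and the same bucket count; if either differs, matching keys can land in different bucket ids and a shuffle becomes necessary again.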



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
