carbondata-issues mailing list archives

From "Ravindra Pesala (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CARBONDATA-466) Implement bucketing table in carbondata
Date Tue, 29 Nov 2016 15:32:58 GMT
Ravindra Pesala created CARBONDATA-466:
------------------------------------------

             Summary: Implement bucketing table in carbondata
                 Key: CARBONDATA-466
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-466
             Project: CarbonData
          Issue Type: New Feature
            Reporter: Ravindra Pesala


Bucketing is a useful feature when a user wants to join big tables. It is also useful
for driver-level partition pruning, which improves query performance.
Users can add buckets on any dimension column (except complex types) as follows:
{code}
CREATE TABLE test(user_id BIGINT, firstname STRING, lastname STRING)
CLUSTERED BY(user_id) INTO 32 BUCKETS
STORED BY 'carbondata';
{code}
In the above example, the column user_id is hash partitioned, which creates 32 bucket files in carbondata.
When the table is joined with another table on the bucketed column, the same buckets can be
selected on both sides and the join can be done without shuffling.
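
The idea above can be sketched in a few lines of Python. This is only an illustration, not CarbonData code: the function names (bucket_id, write_buckets, bucketed_join, buckets_to_scan) are hypothetical, and a plain modulo stands in for the hash function, which this issue does not specify. It assumes both tables are bucketed on the join key with the same number of buckets.

```python
from collections import defaultdict

NUM_BUCKETS = 32  # mirrors "INTO 32 BUCKETS" in the DDL above


def bucket_id(key, num_buckets=NUM_BUCKETS):
    # Assumption: plain modulo stands in for the real hash function.
    return key % num_buckets


def write_buckets(rows, key_index=0, num_buckets=NUM_BUCKETS):
    # Group rows by bucket id; each list models one bucket file.
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_id(row[key_index], num_buckets)].append(row)
    return buckets


def bucketed_join(left, right, key_index=0):
    # Equal keys always land in the same bucket id on both sides, so each
    # bucket pair is joined locally and no rows move between buckets.
    joined = []
    for b in left.keys() & right.keys():
        lookup = defaultdict(list)
        for row in right[b]:
            lookup[row[key_index]].append(row)
        for row in left[b]:
            for match in lookup.get(row[key_index], []):
                joined.append((row, match))
    return joined


def buckets_to_scan(key, num_buckets=NUM_BUCKETS):
    # Pruning for an equality filter on the bucketed column:
    # only one bucket can contain the key.
    return [bucket_id(key, num_buckets)]


users = write_buckets([(1, "Ann", "Lee"), (33, "Bob", "Ray"), (2, "Cy", "Wu")])
orders = write_buckets([(33, "book"), (2, "pen"), (5, "cup")])
print(sorted(bucketed_join(users, orders)))
print(buckets_to_scan(33))
```

With 32 buckets, user_id values 1 and 33 fall into the same bucket, so a filter such as user_id = 33 needs to read only that one bucket out of 32.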

Carbon format changes:
1. Bucketing information needs to be stored inside the schema thrift file.
2. The bucket id can be stored inside every carbondata index file.



