Date: Tue, 29 Nov 2016 15:32:58 +0000 (UTC)
From: "Ravindra Pesala (JIRA)"
To: issues@carbondata.incubator.apache.org
Subject: [jira] [Created] (CARBONDATA-466) Implement bucketing table in carbondata

Ravindra Pesala created CARBONDATA-466:
------------------------------------------

             Summary: Implement bucketing table in carbondata
                 Key: CARBONDATA-466
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-466
             Project: CarbonData
          Issue Type: New Feature
            Reporter: Ravindra Pesala

Bucketing is a useful feature when a user wants to join big tables. It also enables driver-level partition pruning, which improves query performance.

Users can add buckets on any dimension column (except complex types) as follows:

{code}
CREATE TABLE test(user_id BIGINT, firstname STRING, lastname STRING)
CLUSTERED BY(user_id) INTO 32 BUCKETS
STORED BY 'carbondata';
{code}

In the above example, the column user_id is hash-partitioned, and CarbonData creates 32 bucket files. When this table is joined with another table on the bucketed column, the join can pair up the matching buckets and proceed without shuffling.

Carbon format changes
1. Bucketing information needs to be stored inside the schema thrift file.
2. The bucket id can be stored inside every carbondata index file.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
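The following is a minimal Python sketch of why bucketing removes the shuffle. It makes two assumptions: both tables are bucketed on the join key into the same number of buckets, and rows are assigned to buckets by a simple hash-modulo. This issue does not specify CarbonData's actual hash function or file layout. The names `bucket_id`, `bucketize` and `bucketed_join` are hypothetical and exist only for illustration. Each Python list stands in for one bucket file.

```python
from collections import defaultdict

NUM_BUCKETS = 32  # mirrors "INTO 32 BUCKETS" in the DDL example


def bucket_id(key, num_buckets=NUM_BUCKETS):
    # Hypothetical bucket assignment: hash of the key modulo the bucket count.
    # Python's % with a positive divisor always yields 0..num_buckets-1.
    return hash(key) % num_buckets


def bucketize(rows, key_index, num_buckets=NUM_BUCKETS):
    """Split rows into num_buckets lists by hashing the bucketed column."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_id(row[key_index], num_buckets)].append(row)
    return buckets


def bucketed_join(left_buckets, right_buckets, left_key, right_key):
    """Inner-join two tables that are bucketed identically on the join key.

    Equal keys always hash to the same bucket id, so bucket i of the left
    table only has to be joined with bucket i of the right table. No row
    ever moves to a different bucket, which is what avoids the shuffle.
    """
    if len(left_buckets) != len(right_buckets):
        raise ValueError("both tables must use the same number of buckets")
    result = []
    for left_bucket, right_bucket in zip(left_buckets, right_buckets):
        # Build a small hash index over this bucket of the right table only.
        index = defaultdict(list)
        for r in right_bucket:
            index[r[right_key]].append(r)
        for l in left_bucket:
            for r in index.get(l[left_key], []):
                result.append(l + r)
    return result


# Usage: user_id 1 and 33 share bucket 1 (33 % 32 == 1), so one bucket
# can hold several distinct keys.
users = [(1, "Ada", "Lovelace"), (2, "Alan", "Turing"), (33, "Grace", "Hopper")]
orders = [(1, "book"), (33, "pen"), (33, "ink"), (5, "lamp")]
user_buckets = bucketize(users, 0)
order_buckets = bucketize(orders, 0)
joined = bucketed_join(user_buckets, order_buckets, 0, 0)  # 3 joined rows
```

The per-bucket loop is the point of the design. Each pair of buckets could be handled by a separate task that reads only its own two bucket files. A real engine would also use the stored bucket ids to skip buckets that a filter on `user_id` cannot match, which is the pruning use case mentioned in the issue.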