Date: Mon, 16 Jan 2017 17:05:26 +0000 (UTC)
From: "Ravindra Pesala (JIRA)"
To: issues@carbondata.incubator.apache.org
Subject: [jira] [Assigned] (CARBONDATA-466) Implement bucketing table in carbondata

     [ https://issues.apache.org/jira/browse/CARBONDATA-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravindra Pesala reassigned CARBONDATA-466:
------------------------------------------

    Assignee: Ravindra Pesala

> Implement bucketing table in carbondata
> ---------------------------------------
>
>                 Key: CARBONDATA-466
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-466
>             Project: CarbonData
>          Issue Type: New Feature
>            Reporter: Ravindra Pesala
>            Assignee: Ravindra Pesala
>
> Bucketing is a useful feature when a user wants to join big tables. It is also useful for driver-level partition pruning to improve query performance.
> A user can add buckets on any dimension column (except complex types) as follows:
> {code}
> CREATE TABLE test(user_id BIGINT, firstname STRING, lastname STRING)
> CLUSTERED BY(user_id) INTO 32 BUCKETS
> STORED BY 'carbondata';
> {code}
> In the above example, column user_id is hash partitioned, which creates 32 bucket files in carbondata.
> So when joining with another table on the bucketed column, it can select the same buckets and do the join without shuffling.
> Carbon format changes
> 1. Bucketing information needs to be stored inside the schema thrift file.
> 2. Bucket id can be stored inside every carbondata index file.
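
The hash partitioning and shuffle-free join described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions, not CarbonData's implementation: the issue does not say which hash function is used, so CRC32 stands in for it, and the helper names (bucket_id, write_buckets, bucketed_join) are hypothetical. The point is that rows with equal keys always land in the same bucket number, so bucket i of one table only has to be matched against bucket i of the other.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 32  # mirrors "INTO 32 BUCKETS" in the DDL quoted above


def bucket_id(key, num_buckets=NUM_BUCKETS):
    """Map a bucketing-column value to a bucket number in [0, num_buckets).

    Assumption: CRC32 is a stand-in for whatever hash the engine really uses.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def write_buckets(rows, key_index, num_buckets=NUM_BUCKETS):
    """Group rows into buckets by hashing the column at key_index."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_id(row[key_index], num_buckets)].append(row)
    return buckets


def bucketed_join(left, right):
    """Inner-join two bucketed tables on column 0, one bucket pair at a time.

    Both tables must have been bucketed with the same hash and bucket count.
    """
    joined = []
    for b, left_rows in left.items():
        # Index only the matching bucket of the right table by join key.
        lookup = defaultdict(list)
        for r in right.get(b, []):
            lookup[r[0]].append(r)
        for l in left_rows:
            for r in lookup.get(l[0], []):
                joined.append(l + r[1:])
    return joined


# Hypothetical sample data: (user_id, firstname, lastname) and (user_id, item).
users = [(1, "Ann", "Lee"), (2, "Bob", "Ray"), (33, "Cy", "Fox")]
orders = [(1, "book"), (33, "pen"), (33, "ink"), (7, "cup")]
result = bucketed_join(write_buckets(users, 0), write_buckets(orders, 0))
```

Because each bucket pair is joined independently, no row has to be moved between buckets at query time, which is the "without shuffling" benefit the issue refers to. It only holds when both tables use the same bucket count and hash on the join column.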