Date: Fri, 19 May 2017 18:02:04 +0000 (UTC)
From: "cen yuhai (JIRA)"
To: issues@carbondata.apache.org
Subject: [jira] [Commented] (CARBONDATA-910) Implement Partition feature

[ https://issues.apache.org/jira/browse/CARBONDATA-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16017763#comment-16017763 ]

cen yuhai commented on CARBONDATA-910:
--------------------------------------

Can you add a new partition type like Hive's?

> Implement Partition feature
> ---------------------------
>
> Key: CARBONDATA-910
> URL: https://issues.apache.org/jira/browse/CARBONDATA-910
> Project: CarbonData
> Issue Type: New Feature
> Components: core, data-load, data-query
> Reporter: Cao, Lionel
> Assignee: Cao, Lionel
>
> Why we need partition tables
> A partition table provides an option to divide a table into smaller pieces.
> With a partition table:
> 1. Data can be better managed, organized, and stored.
> 2. We can avoid a full table scan in some scenarios and improve query performance (partition column in the filter, multiple partition tables joined on the same partition column, etc.).
> Partitioning design
> Range Partitioning
> Range partitioning maps data to partitions according to ranges of partition column values; the operator '<' defines the non-inclusive upper bound of the current partition.
> List Partitioning
> List partitioning lets you map data to partitions using an explicit list of values.
> Hash Partitioning
> Hash partitioning maps data to a given number of partitions using a hash algorithm.
> Composite Partitioning (2 levels at most for now)
> Range-Range, Range-List, Range-Hash, List-Range, List-List, List-Hash, Hash-Range, Hash-List, Hash-Hash
> DDL-Create
> Create table sales(
>   itemid long,
>   logdate datetime,
>   customerid int
>   ...
>   ...)
> [partition by range logdate(...)]
> [subpartition by list area(...)]
> Stored By 'carbondata'
> [tblproperties(...)];
> Range partition:
> partition by range logdate(< '2016-01-01', < '2017-01-01', < '2017-02-01', < '2017-03-01', < '2099-01-01')
> List partition:
> partition by list area('Asia', 'Europe', 'North America', 'Africa', 'Oceania')
> Hash partition:
> partition by hash(itemid, 9)
> Composite partition:
> partition by range logdate(< '2016-01-01', < '2017-01-01', < '2017-02-01', < '2017-03-01', < '2099-01-01')
> subpartition by list area('Asia', 'Europe', 'North America', 'Africa', 'Oceania')
> DDL-Rebuild, Add
> Alter table sales rebuild partition by (range|list|hash)(...);
> Alter table sales add partition (< '2018-01-01'); # only supported for range partitioning and list partitioning
> Alter table sales add partition ('South America');
> # Note: there is no delete operation for partitions; please use rebuild.
> If you need to delete data, use a delete statement; the partition definition will not be deleted.
> Partition Table Data Store
> [Option One]
> Use the current design, keeping partition folders outside of segments:
> Fact
> |___Part0
> |    |___Segment_0
> |    |    |___ *******-[bucketId]-.carbondata
> |    |    |___ *******-[bucketId]-.carbondata
> |    |___Segment_1
> |    ...
> |___Part1
> |    |___Segment_0
> |    |___Segment_1
> |...
> [Option Two]
> Remove the partition folders, add the partition id into the file name, and build the B-tree on the driver side:
> Fact
> |___Segment_0
> |    |___ *******-[bucketId]-[partitionId].carbondata
> |    |___ *******-[bucketId]-[partitionId].carbondata
> |___Segment_1
> |___Segment_2
> ...
> Pros & Cons:
> Option one would be faster at locating target files.
> Option two needs to store more folder metadata.
> Partition Table Metadata Store
> Partition info should be stored in the file footer/index file and loaded into memory before user queries.
> Relationship with Buckets
> Buckets should sit at a lower level than partitions.
> Partition Table Query
> Example:
> Select * from sales
> where logdate <= date '2016-12-01';
> Users should remember to add a partition filter when writing SQL against a partition table.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
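The three row-to-partition mappings and the pruning behavior described in the quoted design can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the bounds, and the pruning helper are assumptions made for this example, not CarbonData's actual implementation.

```python
import bisect

def range_partition(value, upper_bounds):
    """Index of the first partition whose non-inclusive upper bound
    exceeds value (the '<' semantics in the DDL sketch above)."""
    idx = bisect.bisect_right(upper_bounds, value)
    if idx == len(upper_bounds):
        raise ValueError(f"{value!r} exceeds all partition bounds")
    return idx

def list_partition(value, value_lists):
    """Index of the partition whose explicit value list contains value."""
    for idx, values in enumerate(value_lists):
        if value in values:
            return idx
    raise ValueError(f"{value!r} not found in any partition value list")

def hash_partition(value, num_partitions):
    """Map value to one of num_partitions buckets by hashing."""
    return hash(value) % num_partitions

def prune_range_partitions(upper_bounds, max_value):
    """Partitions that can hold rows with value <= max_value, as in the
    'where logdate <= date ...' query example."""
    return list(range(range_partition(max_value, upper_bounds) + 1))

# Mirrors "partition by range logdate(< '2016-01-01', ...)"
bounds = ["2016-01-01", "2017-01-01", "2017-02-01", "2017-03-01", "2099-01-01"]
print(range_partition("2016-12-01", bounds))          # falls in partition 1

# Mirrors "partition by list area('Asia', 'Europe', ...)"
areas = [["Asia"], ["Europe"], ["North America"], ["Africa"], ["Oceania"]]
print(list_partition("Europe", areas))

# Mirrors "partition by hash(itemid, 9)"
print(hash_partition(42, 9))

# A filter logdate <= '2016-12-01' only needs partitions 0 and 1
print(prune_range_partitions(bounds, "2016-12-01"))
```

ISO-8601 date strings sort lexicographically in the same order as the dates themselves, which is why plain string comparison suffices for the range lookup here.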