carbondata-commits mailing list archives

From jack...@apache.org
Subject [24/54] [abbrv] carbondata git commit: [CARBONDATA-1442] Refactored Partition-Guide.md
Date Thu, 14 Sep 2017 09:20:17 GMT
[CARBONDATA-1442] Refactored Partition-Guide.md

This closes #1310


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/cd2332e5
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/cd2332e5
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/cd2332e5

Branch: refs/heads/streaming_ingest
Commit: cd2332e5493dfc78683af9c9fb0cfccbe34703ae
Parents: dc7d505
Author: PallaviSingh1992 <pallavisingh_1992@yahoo.co.in>
Authored: Thu Sep 7 10:32:10 2017 +0530
Committer: Jacky Li <jacky.likun@qq.com>
Committed: Fri Sep 8 22:24:32 2017 +0800

----------------------------------------------------------------------
 docs/partition-guide.md | 115 ++++++++++++++++++++++++++-----------------
 1 file changed, 71 insertions(+), 44 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/cd2332e5/docs/partition-guide.md
----------------------------------------------------------------------
diff --git a/docs/partition-guide.md b/docs/partition-guide.md
index 2a0df76..b0b7862 100644
--- a/docs/partition-guide.md
+++ b/docs/partition-guide.md
@@ -17,32 +17,34 @@
     under the License.
 -->
 
-### CarbonData Partition Table Guidance
-This guidance illustrates how to create & use partition table in CarbonData.
+# CarbonData Partition Table Guide
+This tutorial is designed to provide a quick introduction to create and use partition table in Apache CarbonData.
 
 * [Create Partition Table](#create-partition-table)
   - [Create Hash Partition Table](#create-hash-partition-table)
   - [Create Range Partition Table](#create-range-partition-table)
   - [Create List Partition Table](#create-list-partition-table)
 * [Show Partitions](#show-partitions)
-* [Maintain the Partitions](#maintain-the-partitions)
+* [Maintaining the Partitions](#maintaining-the-partitions)
 * [Partition Id](#partition-id)
-* [Tips](#tips)
+* [Useful Tips](#useful-tips)
 
-### Create Partition Table
+## Create Partition Table
+
+### Create Hash Partition Table
 
-##### Create Hash Partition Table
 ```
    CREATE TABLE [IF NOT EXISTS] [db_name.]table_name
                     [(col_name data_type , ...)]
    PARTITIONED BY (partition_col_name data_type)
    STORED BY 'carbondata'
-   [TBLPROPERTIES ('PARTITION_TYPE'='HASH', 
-                   'PARTITION_NUM'='N' ...)]  
+   [TBLPROPERTIES ('PARTITION_TYPE'='HASH',
+                   'PARTITION_NUM'='N' ...)]
    //N is the number of hash partitions
 ```
 
 Example:
+
 ```
    create table if not exists hash_partition_table(
       col_A String,
@@ -55,20 +57,25 @@ Example:
    tblproperties('partition_type'='Hash','partition_num'='9')
 ```
 
-##### Create Range Partition Table
+### Create Range Partition Table
+
 ```
    CREATE TABLE [IF NOT EXISTS] [db_name.]table_name
                     [(col_name data_type , ...)]
    PARTITIONED BY (partition_col_name data_type)
    STORED BY 'carbondata'
-   [TBLPROPERTIES ('PARTITION_TYPE'='RANGE', 
+   [TBLPROPERTIES ('PARTITION_TYPE'='RANGE',
                    'RANGE_INFO'='2014-01-01, 2015-01-01, 2016-01-01' ...)]
 ```
-Notes: 
-1. The 'RANGE_INFO' defined in table properties must be in ascending order.
-2. If the partition column is Date/Timestamp type, the format could be defined in CarbonProperties. By default it's yyyy-MM-dd.
+
+**Note:**
+
+- The 'RANGE_INFO' must be defined in ascending order in the table properties.
+
+- The default format for partition column of Date/Timestamp type is yyyy-MM-dd. Alternate formats for Date/Timestamp could be defined in CarbonProperties.
 
 Example:
+
 ```
    create table if not exists hash_partition_table(
       col_A String,
@@ -82,19 +89,21 @@ Example:
    'range_info'='2015-01-01, 2016-01-01, 2017-01-01, 2017-02-01')
 ```
 
-##### Create List Partition Table
+### Create List Partition Table
+
 ```
    CREATE TABLE [IF NOT EXISTS] [db_name.]table_name
                     [(col_name data_type , ...)]
    PARTITIONED BY (partition_col_name data_type)
    STORED BY 'carbondata'
-   [TBLPROPERTIES ('PARTITION_TYPE'='LIST', 
+   [TBLPROPERTIES ('PARTITION_TYPE'='LIST',
                    'LIST_INFO'='A, B, C' ...)]
 ```
-Notes:
-1. List partition support list info in one level group. 
+**Note :**
+- List partition supports list info in one level group.
+
+Example:
 
-Example:
 ```
    create table if not exists hash_partition_table(
       col_B Int,
@@ -109,41 +118,53 @@ Example:
 ```
 
 
-### Show Partitions
-Execute following command to get the partition information
+## Show Partitions
+The following command is executed to get the partition information of the table
+
 ```
    SHOW PARTITIONS [db_name.]table_name
-
 ```
 
-### Maintain the Partitions
-##### Add a new partition
+## Maintaining the Partitions
+### Add a new partition
+
 ```
    ALTER TABLE [db_name].table_name ADD PARTITION('new_partition')
 ```
-##### Split a partition
+### Split a partition
+
 ```
-   ALTER TABLE [db_name].table_name SPLIT PARTITION(partition_id) INTO('new_partition1', 'new_partition2'...)
+   ALTER TABLE [db_name].table_name SPLIT PARTITION(partition_id)
+   INTO('new_partition1', 'new_partition2'...)
 ```
-##### Drop a partition
+
+### Drop a partition
+
 ```
    //Drop partition definition only and keep data
    ALTER TABLE [db_name].table_name DROP PARTITION(partition_id)
-   
+
    //Drop both partition definition and data
    ALTER TABLE [db_name].table_name DROP PARTITION(partition_id) WITH DATA
 ```
-Notes:
-1. For the 1st case(keep data), 
-   * if the table is a range partition table, data will be merged into the next partition, and if the dropped partition is the last one, then data will be merged into default partition.
+
+**Note**:
+
+- In the first case where the data in the table is preserved there can be multiple scenarios as described below :
+
+   * if the table is a range partition table, data will be merged into the next partition, and if the dropped partition is the last partition, then data will be merged into the default partition.
+
    * if the table is a list partition table, data will be merged into default partition.
-2. Drop default partition is not allowed, but you can use DELETE statement to delete data in default partition.
-3. partition_id could be got from SHOW PARTITIONS command.
-4. Hash partition table is not supported for the ADD, SPLIT, DROP command.
 
-### Partition Id
-In Carbondata, we don't use folders to divide partitions(just like hive did), instead we use partition id to replace the task id. 
-It could make use of the characteristic and meanwhile reduce some metadata. 
+- Dropping the default partition is not allowed, but DELETE statement can be used to delete data in default partition.
+
+- The partition_id could be fetched using the [SHOW PARTITIONS](#show-partitions) command.
+
+- Hash partition table is not supported for ADD, SPLIT and DROP commands.
+
+## Partition Id
+In CarbonData like the hive, folders are not used to divide partitions instead partition id is used to replace the task id. It could make use of the characteristic and meanwhile reduce some metadata.
+
 ```
 SegmentDir/0_batchno0-0-1502703086921.carbonindex
            ^
@@ -151,11 +172,17 @@ SegmentDir/part-0-0_batchno0-0-1502703086921.carbondata
                   ^
 ```
 
-### Tips
-Here are some tips to improve query performance of carbon partition table:
-##### 1. Do some analysis before choose the proper partition column
-The distribution of data on some column could be very skew, building a skewed partition table is meaningless, so do some basic statistic analysis to avoid creating partition table on an extremely skewed column.
-##### 2. Exclude partition column from sort columns
-If you have many dimensions need to be sorted, then exclude partition column from sort columns, that will put other dimensions in a better position of sorting.
-##### 3. Remember to add filter on partition column when writing SQLs
-When writing SQLs on partition table, try to use filters on partition column.
+## Useful Tips
+Here are some useful tips to improve query performance of carbonData partition table:
+
+**Prior analysis of proper partition column**
+
+The distribution of data based on some random column could be skewed, building a skewed partition table is meaningless. Some basic statistical analysis before the creation of partition table can avoid an extremely skewed column.
+
+**Exclude partition column from sort columns**
+
+If you have many dimensions, that need to be sorted then one must exclude column present in the partition from sort columns, this will allow another dimension to do the efficient sorting.
+
+**Remember to add filter on partition column when writing SQL**
+
+When writing SQL on a partition table, try to use filters on the partition column.
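
The "prior analysis" tip in the guide above could be approached with a quick offline check on a sample of the candidate partition column. The following is a minimal sketch, not part of the commit and not a CarbonData API: the function name `skew_report` is hypothetical, and Python's built-in `hash` is only a stand-in for whatever hash function CarbonData actually applies to hash partitions.

```python
from collections import Counter


def skew_report(values, num_partitions):
    """Estimate how evenly `values` would spread over hash partitions.

    Assumption: Python's built-in hash() modulo num_partitions stands in
    for the real partitioner, which may distribute values differently.
    """
    if not values:
        raise ValueError("values must not be empty")
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")

    # Count how many sampled rows land in each simulated partition.
    counts = Counter(hash(v) % num_partitions for v in values)
    sizes = [counts.get(p, 0) for p in range(num_partitions)]

    # A perfectly even spread puts len(values) / num_partitions rows in each.
    mean_size = len(values) / num_partitions

    # The single most frequent value bounds how even any spread can be.
    top_value, top_count = Counter(values).most_common(1)[0]

    return {
        "largest_partition_ratio": max(sizes) / mean_size,
        "top_value": top_value,
        "top_value_share": top_count / len(values),
    }


if __name__ == "__main__":
    # Hypothetical sample: one value dominates the column.
    sample = [1] * 90 + list(range(2, 12))
    print(skew_report(sample, num_partitions=5))
```

A `largest_partition_ratio` close to 1 suggests a reasonably even spread, while a large `top_value_share` indicates that one value dominates and the column is probably a poor partition column.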

