carbondata-commits mailing list archives

From ravipes...@apache.org
Subject [37/49] carbondata git commit: [DOCS] Removed unused parameters, added SORT_SCOPE, and updated dictionary details
Date Mon, 13 Nov 2017 22:12:03 GMT
[DOCS] Removed unused parameters, added SORT_SCOPE, and updated dictionary details

This closes #1426


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/520e50f3
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/520e50f3
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/520e50f3

Branch: refs/heads/fgdatamap
Commit: 520e50f32f3716b1335df37efb26222d37bc2b20
Parents: 9f6c8e6
Author: sgururajshetty <sgururajshetty@gmail.com>
Authored: Sun Oct 22 15:38:01 2017 +0530
Committer: chenliang613 <chenliang613@huawei.com>
Committed: Sat Nov 11 16:12:09 2017 +0800

----------------------------------------------------------------------
 docs/configuration-parameters.md    |  6 +-----
 docs/ddl-operation-on-carbondata.md | 31 ++++++++++++++++++++++++++++---
 2 files changed, 29 insertions(+), 8 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/520e50f3/docs/configuration-parameters.md
----------------------------------------------------------------------
diff --git a/docs/configuration-parameters.md b/docs/configuration-parameters.md
index e085317..141a60c 100644
--- a/docs/configuration-parameters.md
+++ b/docs/configuration-parameters.md
@@ -48,12 +48,8 @@ This section provides the details of all the configurations required for CarbonD
 
 | Parameter | Default Value | Description | Range |
 |--------------------------------------|---------------|----------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| carbon.sort.file.buffer.size | 20 | File read buffer size used during sorting. This value is expressed in MB. | Min=1 and Max=100 |
-| carbon.graph.rowset.size | 100000 | Rowset size exchanged between data load graph steps. | Min=500 and Max=1000000 |
 | carbon.number.of.cores.while.loading | 6 | Number of cores to be used while loading data. |  |
 | carbon.sort.size | 500000 | Record count to sort and write intermediate files to temp. |  |
-| carbon.enableXXHash | true | Algorithm for hashmap for hashkey calculation. |  |
-| carbon.number.of.cores.block.sort | 7 | Number of cores to use for block sort while loading data. |  |
 | carbon.max.driver.lru.cache.size | -1 | Max LRU cache size up to which data will be loaded at the driver side. This value is expressed in MB. Default value of -1 means there is no memory limit for caching. Only integer values greater than 0 are accepted. |  |
 | carbon.max.executor.lru.cache.size | -1 | Max LRU cache size up to which data will be loaded at the executor side. This value is expressed in MB. Default value of -1 means there is no memory limit for caching. Only integer values greater than 0 are accepted. If this parameter is not configured, then the carbon.max.driver.lru.cache.size value will be considered. |  |
 | carbon.merge.sort.prefetch | true | Enable prefetch of data during merge sort while reading data from sort temp files in data loading. |  |
@@ -135,7 +131,7 @@ This section provides the details of all the configurations required for CarbonD
 |---------------------------------------|---------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | high.cardinality.identify.enable | true | If the parameter is true, the high cardinality columns of the dictionary code are automatically recognized and these columns will not be used as global dictionary encoding. If the parameter is false, all dictionary encoding columns are used as dictionary encoding. The high cardinality column must meet the following requirement: value of cardinality > configured value of high.cardinality. <b> Note: </b> If SINGLE_PASS is used during data load, then this property will be disabled.|
 | high.cardinality.threshold | 1000000  | It is a threshold to identify high cardinality of the columns. If the value of a column's cardinality > the configured value, then the column is excluded from dictionary encoding. |
-| carbon.cutOffTimestamp | 1970-01-01 05:30:00 | Sets the start date for calculating the timestamp. Java counts the number of milliseconds from start of "1970-01-01 00:00:00". This property is used to customize the start of position. For example "2000-01-01 00:00:00". The date must be in the form "carbon.timestamp.format". NOTE: The CarbonData supports data store up to 68 years from the cut-off time defined. For example, if the cut-off time is 1970-01-01 05:30:00, then the data can be stored up to 2038-01-01 05:30:00. |
+| carbon.cutOffTimestamp | 1970-01-01 05:30:00 | Sets the start date for calculating the timestamp. Java counts the number of milliseconds from start of "1970-01-01 00:00:00". This property is used to customize the start position. For example "2000-01-01 00:00:00". The date must be in the form "carbon.timestamp.format". |
 | carbon.timegranularity | SECOND | The property used to set the data granularity level: DAY, HOUR, MINUTE, or SECOND. |
   
 ##  Spark Configuration

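The load-tuning parameters retained in the table above are set in the carbon.properties file. A minimal illustrative fragment is sketched below; the values shown are examples only, not tuning recommendations:

```
carbon.number.of.cores.while.loading=6
carbon.sort.size=500000
carbon.max.driver.lru.cache.size=-1
carbon.merge.sort.prefetch=true
```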
http://git-wip-us.apache.org/repos/asf/carbondata/blob/520e50f3/docs/ddl-operation-on-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/ddl-operation-on-carbondata.md b/docs/ddl-operation-on-carbondata.md
index 55d7063..d1fee46 100644
--- a/docs/ddl-operation-on-carbondata.md
+++ b/docs/ddl-operation-on-carbondata.md
@@ -62,14 +62,14 @@ The following DDL operations are supported in CarbonData :
 
    - **Dictionary Encoding Configuration**
 
-       Dictionary encoding is enabled by default for all String columns, and disabled for non-String columns. You can include and exclude columns for dictionary encoding.
+       Dictionary encoding is turned off for all columns by default. You can include and exclude columns for dictionary encoding.
 
 ```
        TBLPROPERTIES ('DICTIONARY_EXCLUDE'='column1, column2')
        TBLPROPERTIES ('DICTIONARY_INCLUDE'='column1, column2')
 ```
 
-   Here, DICTIONARY_EXCLUDE will exclude dictionary creation. This is applicable for high-cardinality columns and is an optional parameter. DICTIONARY_INCLUDE will generate dictionary for the columns specified in the list.
+   Here, DICTIONARY_INCLUDE will generate a dictionary for the specified columns. It improves performance for low-cardinality dimensions, considerably so for String columns.
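For illustration, both dictionary properties can appear in a single CREATE TABLE statement; the table and column names below are hypothetical:

```
   CREATE TABLE IF NOT EXISTS sales (
     country STRING,
     phone_number STRING,
     quantity INT)
   STORED BY 'carbondata'
   TBLPROPERTIES ('DICTIONARY_INCLUDE'='country',
                  'DICTIONARY_EXCLUDE'='phone_number')
```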
 
 
 
@@ -129,7 +129,7 @@ The following DDL operations are supported in CarbonData :
 
    - **SORT_COLUMNS**
 
-    This table property specifies the order of the sort column.
+      This table property specifies the order of the sort column.
 
 ```
     TBLPROPERTIES('SORT_COLUMNS'='column1, column3')
@@ -140,6 +140,31 @@ The following DDL operations are supported in CarbonData :
   - If this property is not specified, then by default SORT_COLUMNS consists of all dimensions (excluding Complex columns).
 
   - If this property is specified but with an empty argument, then the table will be loaded without sort. For example, ('SORT_COLUMNS'='')
+   
+   - **SORT_SCOPE**
+      This option specifies the scope of the sort during data load. Following are the types of sort scope.
+     * BATCH_SORT: It increases the load performance but decreases the query performance if the number of identified blocks > parallelism.
+```
+    OPTIONS ('SORT_SCOPE'='BATCH_SORT')
+```
+      You can also specify the sort size option for sort scope.
+```
+    OPTIONS ('SORT_SCOPE'='BATCH_SORT', 'batch_sort_size_inmb'='7')
+```
+     * GLOBAL_SORT: It increases the query performance, especially for point queries.
+```
+    OPTIONS ('SORT_SCOPE'='GLOBAL_SORT')
+```
+     You can also specify the number of partitions to use when shuffling data for sort. If it is not configured, or is configured to less than 1, then the number of map tasks is used as the number of reduce tasks. It is recommended that each reduce task handle 512 MB to 1 GB of data.
+```
+    OPTIONS( 'SORT_SCOPE'='GLOBAL_SORT', 'GLOBAL_SORT_PARTITIONS'='2')
+```
+   NOTE:
+   - Increasing the number of partitions might require increasing spark.driver.maxResultSize, as the sampling data collected at the driver grows with the number of partitions.
+   - Increasing the number of partitions might increase the number of B-trees.
+     * LOCAL_SORT: It is the default sort scope.
+     * NO_SORT: It loads the data in an unsorted manner.
+   
 
 ## SHOW TABLE
 


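Taken together, the SORT_SCOPE options added by this change would be used at load time roughly as follows; the HDFS path and table name here are hypothetical:

```
   LOAD DATA INPATH 'hdfs://hacluster/data/sample.csv'
   INTO TABLE sales
   OPTIONS ('SORT_SCOPE'='GLOBAL_SORT',
            'GLOBAL_SORT_PARTITIONS'='2')
```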