From: chenliang613@apache.org
To: commits@carbondata.incubator.apache.org
Reply-To: dev@carbondata.incubator.apache.org
Date: Wed, 04 Jan 2017 14:51:02 -0000
Message-Id: <19708a290c914b56a6071a5fd7b46c02@git.apache.org>
In-Reply-To: <07ee074769a242fcaf5e8fe86b3b051d@git.apache.org>
References: <07ee074769a242fcaf5e8fe86b3b051d@git.apache.org>
Subject: [18/69] incubator-carbondata-site git commit: Document Changes and Optimization
archived-at: Wed, 04 Jan 2017 14:51:32 -0000

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/65061b9a/src/main/webapp/docs/latest/configuring.html
----------------------------------------------------------------------
diff --git a/src/main/webapp/docs/latest/configuring.html b/src/main/webapp/docs/latest/configuring.html
index 338d016..51c0e6c 100644
--- a/src/main/webapp/docs/latest/configuring.html
+++ b/src/main/webapp/docs/latest/configuring.html
@@ -1,30 +1,5 @@
-Untitled Document.md

Configuring CarbonData

-This tutorial will guide you through the advance configurations of CarbonData :
+This tutorial guides you through the advanced configurations of CarbonData :

System Configuration
Miscellaneous Configuration
Performance Configuration
Miscellaneous Configuration
Spark Configuration

System Configuration

-This section provides the details of all the configurations required for Carbon System.
+This section provides the details of all the configurations required for the CarbonData System.

System Configuration in carbon.properties
@@ -71,12 +46,12 @@ under the License.
@@ -86,24 +61,23 @@ under the License.

-Parameter
+Property
 Default Value	Description
 carbon.storelocation	/user/hive/warehouse/carbon.store
-Location where Carbon will create the store, and write the data in its own format.NOTE: Store location should be in HDFS.
+Location where CarbonData will create the store, and write the data in its own format. NOTE: Store location should be in HDFS.
 carbon.ddl.base.hdfs.url	hdfs://hacluster/opt/data
-This property is used to configure the HDFS relative path from the HDFS base path, configured in fs.defaultFS. The path configured in carbon.ddl.base.hdfs.url will be appended to the HDFS path configured in fs.defaultFS. If this path is configured, then user need not pass the complete path while dataload.For example: If absolute path of the csv file is hdfs://10.18.101.155:54310/data/cnbc/2016/xyz.csv,the path “hdfs://10.18.101.155:54310” will come from property fs.defaultFS and user can configure the /data/cnbc/ as carbon.ddl.base.hdfs.url.Now while dataload user can specify the csv path as/2016/xyz.csv.
+This property is used to configure the HDFS relative path from the HDFS base path, configured in fs.defaultFS. The path configured in carbon.ddl.base.hdfs.url will be appended to the HDFS path configured in fs.defaultFS. If this path is configured, then user need not pass the complete path while dataload. For example: If absolute path of the csv file is hdfs://10.18.101.155:54310/data/cnbc/2016/xyz.csv, the path “hdfs://10.18.101.155:54310” will come from property fs.defaultFS and user can configure the /data/cnbc/ as carbon.ddl.base.hdfs.url. Now while dataload user can specify the csv path as /2016/xyz.csv.
 carbon.badRecords.location
 carbon.kettle.home	$SPARK_HOME/carbonlib/carbonplugins
-Path used by Carbon internally to create graph for loading the data.
+Path used by CarbonData internally to create graph for loading the data.
 carbon.data.file.version	2
-If this parameter value is set to1, then the Carbon supports the data load which is in old format. If the value is set to 2, then the Carbon supports the data load of new format only.NOTE: The file format created before DataSight Spark V100R002C30 is considered as old format.
+If this parameter value is set to 1, then CarbonData will support the data load which is in old format. If the value is set to 2, then CarbonData will support the data load of new format only. NOTE: The file format created before DataSight Spark V100R002C30 is considered as old format.
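
The carbon.ddl.base.hdfs.url example in the table above amounts to joining three path pieces. The Python sketch below only illustrates that documented behaviour; it is not CarbonData's implementation, and the helper name `resolve_load_path` is hypothetical.

```python
def resolve_load_path(default_fs: str, base_url: str, csv_path: str) -> str:
    """Join fs.defaultFS, carbon.ddl.base.hdfs.url and the path given at load time.

    Hypothetical helper that mirrors the example in the table above;
    it does not reproduce CarbonData's own path handling.
    """
    parts = [default_fs.rstrip("/")]
    for piece in (base_url, csv_path):
        piece = piece.strip("/")
        if piece:  # skip an unset base URL instead of emitting "//"
            parts.append(piece)
    return "/".join(parts)


full_path = resolve_load_path(
    "hdfs://10.18.101.155:54310",  # fs.defaultFS
    "/data/cnbc/",                 # carbon.ddl.base.hdfs.url
    "/2016/xyz.csv",               # path passed at data load time
)
print(full_path)  # hdfs://10.18.101.155:54310/data/cnbc/2016/xyz.csv
```

With an empty base URL the same helper simply joins fs.defaultFS and the load path, which corresponds to passing the complete path during data load.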

Performance Configuration

-This section provides the details of all the configurations required for Carbon Performance Optimization.
+This section provides the details of all the configurations required for CarbonData Performance Optimization.

Performance Configuration in carbon.properties

Data Loading Configuration

@@ -117,8 +91,8 @@ under the License.
@@ -129,13 +103,13 @@ under the License.
@@ -147,20 +121,20 @@ under the License.
@@ -176,9 +150,10 @@ under the License.

 carbon.sort.file.buffer.size	20
-File read buffer size used during sorting.	The value is in MB.Min=1 and Max=100
+File read buffer size used during sorting. This value is expressed in MB.	Min=1 and Max=100
 carbon.graph.rowset.size
 carbon.number.of.cores.while.loading	6
-Number of cores to be used while data loading.
+Number of cores to be used while loading data.
 carbon.sort.size	500000
-Record count to sort and write to temp intermediate files.
+Record count to sort and write intermediate files to temp.
 carbon.number.of.cores.block.sort	7
-Number of cores to be used for block sort while dataloading.
+Number of cores to use for block sort while loading data.
 carbon.max.driver.lru.cache.size	-1
-Max LRU cache size upto which data will be loaded at the driver side.	The value is in MB. The default value is -1, means there is no memory limit for caching. Only integer values greater than 0 are accepted.
+Max LRU cache size upto which data will be loaded at the driver side. This value is expressed in MB. Default value of -1 means there is no memory limit for caching. Only integer values greater than 0 are accepted.
 carbon.max.executor.lru.cache.size	-1
-Max LRU cache size upto which data will be loaded at the executor side.	The value is in MB. The default value is -1, means there is no memory limit for caching. Only integer values greater than 0 are accepted. If this parameter is not configured, then thecarbon.max.driver.lru.cache.size value will be considered.
+Max LRU cache size upto which data will be loaded at the executor side. This value is expressed in MB. Default value of -1 means there is no memory limit for caching. Only integer values greater than 0 are accepted. If this parameter is not configured, then the carbon.max.driver.lru.cache.size value will be considered.
 carbon.merge.sort.prefetch
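
The two LRU cache rows above describe a small lookup rule: the executor setting falls back to the driver setting, and -1 means no limit. A rough Python sketch of that rule follows. The function name `effective_cache_size_mb` is hypothetical, and treating a non-integer or non-positive value as "no limit" is an assumption made here; the page only says that integers greater than 0 are accepted.

```python
from typing import Mapping, Optional

UNLIMITED = -1  # documented default: no memory limit for caching


def effective_cache_size_mb(props: Mapping[str, str], key: str,
                            fallback_key: Optional[str] = None) -> int:
    """Return the cache size in MB for `key`, or UNLIMITED.

    Hypothetical helper, not CarbonData code. Assumption: values that are
    not integers greater than 0 are treated like the default (-1).
    """
    raw = props.get(key)
    if raw is None and fallback_key is not None:
        # Executor size not configured: consider the driver size instead.
        raw = props.get(fallback_key)
    if raw is None:
        return UNLIMITED
    try:
        value = int(raw)
    except ValueError:
        return UNLIMITED
    return value if value > 0 else UNLIMITED


props = {"carbon.max.driver.lru.cache.size": "512"}
executor_mb = effective_cache_size_mb(
    props,
    "carbon.max.executor.lru.cache.size",
    fallback_key="carbon.max.driver.lru.cache.size",
)
print(executor_mb)  # 512, taken from the driver setting
```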

Compaction Configuration

@@ -192,44 +167,43 @@ under the License.

 carbon.number.of.cores.while.compacting	2
-Number of cores which is used to write data during compaction.
+Number of cores which are used to write data during compaction.
 carbon.compaction.level.threshold
-4,3	This property is for minor compaction which decides how many segments to be merged.Example: if it is set as 2,3 then minor compaction will be triggered for every 2 segments. 3 is the number of level 1 compacted segment which is further compacted to new segment.
+4, 3	This property is for minor compaction which decides how many segments to be merged. Example: If it is set as 2, 3 then minor compaction will be triggered for every 2 segments. 3 is the number of level 1 compacted segment which is further compacted to new segment.
 Valid values are from 0-100.
 carbon.major.compaction.size	1024
-Major compaction size can be configured using this parameter. Sum of the segments which is below this threshold will be merged. The value is in MB.
+Major compaction size can be configured using this parameter. Sum of the segments which is below this threshold will be merged. This value is expressed in MB.
 carbon.horizontal.compaction.enable	true
-This property is used to turn ON/OFF horizontal compaction. After every DELETE and UPDATE statement, horizontal compaction may occur in case the delta (DELETE/ UPDATE) files becomes more than specified threshold. By default the horizontal compaction is Turned ON but can turn OFF the horizontal compaction by setting the value to false.
+This property is used to turn ON/OFF horizontal compaction. After every DELETE and UPDATE statement, horizontal compaction may occur in case the delta (DELETE/ UPDATE) files becomes more than specified threshold.
 carbon.horizontal.UPDATE.compaction.threshold	1	This property specifies the threshold limit on number of UPDATE delta files within a segment. In case the number of delta files goes beyond the threshold, the UPDATE delta files within the segment becomes eligible for horizontal compaction and compacted into single UPDATE delta file.
-By default the value is set to 1 and can be altered to values between 1 to 10000.
+Values between 1 to 10000.
 carbon.horizontal.DELETE.compaction.threshold	1	This property specifies the threshold limit on number of DELETE delta files within a block of a segment. In case the number of delta files goes beyond the threshold, the DELETE delta files for the particular block of the segment becomes eligible for horizontal compaction and compacted into single DELETE delta file.
-By default the value is set to 1 and can be altered to values between 1 to 10000.
+Values between 1 to 10000.
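
The carbon.compaction.level.threshold row above describes two levels of minor compaction. The Python sketch below parses a value such as "4, 3" and counts how many merges of each level a given number of loaded segments would produce. It is a simplified reading of the table text: the function names are hypothetical, and the counting model (integer division per level) is an assumption rather than CarbonData's actual scheduling logic.

```python
from typing import Tuple


def parse_level_threshold(value: str) -> Tuple[int, int]:
    """Parse a value such as "4,3" or "4, 3" into (level 1, level 2)."""
    levels = [int(part.strip()) for part in value.split(",")]
    if len(levels) != 2:
        raise ValueError("expected two comma-separated integers")
    for level in levels:
        if not 0 <= level <= 100:  # documented valid range is 0-100
            raise ValueError("each level must be between 0 and 100")
    return levels[0], levels[1]


def count_minor_compactions(loaded_segments: int, threshold: str) -> Tuple[int, int]:
    """Count level 1 and level 2 merges for a number of loaded segments.

    Assumed model: every `level1` loaded segments are merged once, and every
    `level2` level 1 results are merged again into a new segment.
    """
    level1, level2 = parse_level_threshold(threshold)
    level1_merges = loaded_segments // level1 if level1 else 0
    level2_merges = level1_merges // level2 if level2 else 0
    return level1_merges, level2_merges


print(count_minor_compactions(12, "4, 3"))  # (3, 1)
```

Under this model the default of 4, 3 means twelve loads yield three level 1 merges, which are then merged once more.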

Query Configuration

@@ -266,6 +240,8 @@ under the License.

Miscellaneous Configuration

@@ -273,8 +249,8 @@ under the License.

Time format for CarbonData

@@ -291,9 +267,9 @@ under the License.

Dataload Configuration

@@ -311,17 +287,17 @@ under the License.
@@ -360,9 +336,8 @@ under the License.

 carbon.lock.type	LOCALLOCK
-This configuration specifies the type of lock to be acquired during concurrent operations on table.There are following types of lock implementation: - LOCALLOCK: Lock is created on local file system as file. This lock is useful when only one spark driver (thrift server) runs on a machine and no other Carbon spark application is launched concurrently. - HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when multiple carbon spark applications are launched and no ZooKeeper is running on cluster and HDFS supports file based locking.
+This configuration specifies the type of lock to be acquired during concurrent operations on table. There are following types of lock implementation: - LOCALLOCK: Lock is created on local file system as file. This lock is useful when only one spark driver (thrift server) runs on a machine and no other CarbonData spark application is launched concurrently. - HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when multiple CarbonData spark applications are launched and no ZooKeeper is running on cluster and HDFS supports file based locking.
 carbon.sort.intermediate.files.limit	20
-Minimum no of intermediate files after which sort merged to be started.
+Minimum number of intermediate files after which merged sort can started.
 carbon.block.meta.size.reserved.percentage	10
-space reserved in percentage for writing block meta data in carbon data file.
+Space reserved in percentage for writing block meta data in CarbonData file.
 carbon.csv.read.buffersize.byte

Compaction Configuration

@@ -375,12 +350,12 @@ under the License.
@@ -389,9 +364,9 @@ under the License.

 carbon.numberof.preserve.segments	0
-If the user wants to preserve some number of segments from being compacted then he can set this property.Example: carbon.numberof.preserve.segments=2 then 2 latest segments will always be excluded from the compaction. No segments will be preserved by default.
+If the user wants to preserve some number of segments from being compacted then he can set this property. Example: carbon.numberof.preserve.segments=2 then 2 latest segments will always be excluded from the compaction. No segments will be preserved by default.
 carbon.allowed.compaction.days	0
-Compaction will merge the segments which are loaded with in the specific number of days configured.Example: if the configuration is 2, then the segments which are loaded in the time frame of 2 days only will get merged. Segments which are loaded 2 days apart will not be merged.This is disabled by default.
+Compaction will merge the segments which are loaded with in the specific number of days configured. Example: If the configuration is 2, then the segments which are loaded in the time frame of 2 days only will get merged. Segments which are loaded 2 days apart will not be merged. This is disabled by default.
 carbon.enable.auto.load.merge
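
The carbon.numberof.preserve.segments and carbon.allowed.compaction.days rows above both narrow the set of segments that compaction may merge. The Python sketch below shows one way to read them together. The function name `compaction_candidates` and the (segment id, load time) tuples are hypothetical, and measuring the "time frame" back from the current time is an assumption; the page does not say which reference point CarbonData uses.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Segment = Tuple[str, datetime]  # (segment id, load time), illustrative only


def compaction_candidates(segments: List[Segment], preserve: int = 0,
                          allowed_days: int = 0,
                          now: Optional[datetime] = None) -> List[str]:
    """Return ids of segments still eligible for compaction, oldest first."""
    if now is None:
        now = datetime.now()
    ordered = sorted(segments, key=lambda seg: seg[1])
    if preserve > 0:
        # The latest `preserve` segments are always excluded.
        ordered = ordered[:-preserve]
    if allowed_days > 0:
        # Assumption: the time frame is counted back from `now`.
        cutoff = now - timedelta(days=allowed_days)
        ordered = [seg for seg in ordered if seg[1] >= cutoff]
    return [seg_id for seg_id, _ in ordered]


now = datetime(2017, 1, 4, 12, 0)
segments = [
    ("0", now - timedelta(days=5)),
    ("1", now - timedelta(days=1)),
    ("2", now - timedelta(hours=12)),
    ("3", now - timedelta(hours=1)),
]
print(compaction_candidates(segments, preserve=2, allowed_days=2, now=now))  # ['1']
```

With both properties left at their default of 0, every segment in the list stays eligible.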

Query Configuration

@@ -413,9 +388,8 @@ under the License.

Global Dictionary Configurations

@@ -428,22 +402,22 @@ under the License.

high.cardinality.identify