From: jackylk@apache.org
To: commits@carbondata.apache.org
Date: Tue, 07 Aug 2018 13:09:48 -0000
Message-Id: <3f87e594b5244eefb9beb44fc449ff4d@git.apache.org>
In-Reply-To: <b1ad24352de744038cdfc5e292d3b32b@git.apache.org>
References: <b1ad24352de744038cdfc5e292d3b32b@git.apache.org>
X-Mailer: ASF-Git Admin Mailer
Subject: [24/50] [abbrv] carbondata git commit: [CARBONDATA-2800][Doc] Add
 useful tips about bloomfilter datamap

[CARBONDATA-2800][Doc] Add useful tips about bloomfilter datamap

add useful tips about bloomfilter datamap

This closes #2581


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/a302cd1c
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/a302cd1c
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/a302cd1c

Branch: refs/heads/external-format
Commit: a302cd1cef6c48c667eacea01df5cb6c75e7685f
Parents: f9b02a5
Author: xuchuanyin <xuchuanyin@hust.edu.cn>
Authored: Mon Jul 30 20:32:05 2018 +0800
Committer: Jacky Li <jacky.likun@qq.com>
Committed: Wed Aug 1 22:10:59 2018 +0800

----------------------------------------------------------------------
 docs/datamap/bloomfilter-datamap-guide.md | 27 +++++++++++++++++++++++++-
 docs/useful-tips-on-carbondata.md         |  4 ++++
 2 files changed, 30 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/a302cd1c/docs/datamap/bloomfilter-datamap-guide.md
----------------------------------------------------------------------
diff --git a/docs/datamap/bloomfilter-datamap-guide.md b/docs/datamap/bloomfilter-datamap-guide.md
index 2dba3dc..8955cde 100644
--- a/docs/datamap/bloomfilter-datamap-guide.md
+++ b/docs/datamap/bloomfilter-datamap-guide.md
@@ -5,6 +5,7 @@
 * [Loading Data](#loading-data)
 * [Querying Data](#querying-data)
 * [Data Management](#data-management-with-bloomfilter-datamap)
+* [Useful Tips](#useful-tips)
 
 #### DataMap Management
 Creating BloomFilter DataMap
@@ -102,4 +103,28 @@ which will show the transformed logical plan, and thus user can check whether th
 If the datamap does not prune blocklets well, you can try to increase the value of property `BLOOM_SIZE` and decrease the value of property `BLOOM_FPP`.
 
 ## Data Management With BloomFilter DataMap
-Data management with BloomFilter datamap has no difference with that on Lucene datamap. You can refer to the corresponding section in `CarbonData BloomFilter DataMap`.
+Data management with the BloomFilter datamap is no different from that with the Lucene datamap.
+You can refer to the corresponding section in `CarbonData Lucene DataMap`.
+
+## Useful Tips
++ It is recommended to create the BloomFilter DataMap on high-cardinality columns.
+ Query conditions on these columns are typically simple `equal` or `in` predicates,
+ such as 'col1=XX' or 'col1 in (XX, YY)'.
++ You can create multiple BloomFilter datamaps on one table,
+ but we recommend creating a single BloomFilter datamap that contains multiple index columns,
+ because data loading and query performance will be better.
++ `BLOOM_FPP` is only the false-positive probability the user expects; the actual FPP may be worse.
+ If the BloomFilter datamap does not work well,
+ you can try to increase `BLOOM_SIZE` and decrease `BLOOM_FPP` at the same time.
+ Note that a bigger `BLOOM_SIZE` will increase the size of the index file
+ and a smaller `BLOOM_FPP` will increase the runtime computation when querying.
++ '0' skipped blocklets for the BloomFilter datamap in the explain output indicates that
+ the BloomFilter datamap does not prune better than the Main datamap.
+ (For example, since the data is not ordered, a specific value may be contained in many blocklets. In this case, bloom may not work better than the Main DataMap.)
+ If this occurs very often, the current BloomFilter datamap is not useful, and you can disable or drop it.
+ Sometimes the explain output shows no pruning result for the BloomFilter datamap;
+ this indicates that the previous datamap has pruned all the blocklets and there is no need to continue pruning.
++ In some scenarios, the BloomFilter datamap may not enhance query performance significantly,
+ but if it can reduce the number of Spark tasks,
+ there is still a chance that it can enhance the performance of concurrent queries.
++ Note that the BloomFilter datamap will decrease data loading performance and may cause slight storage expansion (for the datamap index file).
\ No newline at end of file
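
The trade-off described in the `BLOOM_SIZE` / `BLOOM_FPP` tip can be illustrated with the textbook Bloom filter formulas. The sketch below is only an illustration under stated assumptions: the function names `bloom_parameters` and `actual_fpp` are hypothetical, the numbers are made up, and the formulas are the classic ones, not necessarily what CarbonData computes internally. It shows that a smaller target FPP needs more bits (a larger index) and more hash functions (more work per lookup), and that the real FPP degrades once more distinct values are inserted than the filter was sized for.

```python
import math


def bloom_parameters(expected_items: int, target_fpp: float) -> tuple[int, int]:
    """Return (bits, hash_count) for a classic Bloom filter.

    Textbook sizing, assumed here for illustration only:
      bits   = -n * ln(p) / (ln 2)^2
      hashes = (bits / n) * ln 2
    """
    if expected_items <= 0 or not 0.0 < target_fpp < 1.0:
        raise ValueError("expected_items must be > 0 and target_fpp in (0, 1)")
    bits = math.ceil(-expected_items * math.log(target_fpp) / (math.log(2) ** 2))
    hashes = max(1, round(bits / expected_items * math.log(2)))
    return bits, hashes


def actual_fpp(bits: int, hashes: int, inserted_items: int) -> float:
    """Approximate false-positive probability after inserting `inserted_items`."""
    return (1.0 - math.exp(-hashes * inserted_items / bits)) ** hashes


if __name__ == "__main__":
    # Hypothetical sizing: 1000 expected distinct values per filter.
    for fpp in (0.01, 0.0001):
        bits, hashes = bloom_parameters(1000, fpp)
        print(f"target fpp={fpp}: bits={bits}, hashes={hashes}")

    # Inserting twice as many values as planned makes the real FPP worse.
    bits, hashes = bloom_parameters(1000, 0.01)
    print("fpp at planned load:", actual_fpp(bits, hashes, 1000))
    print("fpp at double load: ", actual_fpp(bits, hashes, 2000))
```

Read this way, an undersized filter is one plausible reason why the observed FPP can be worse than the configured one, which is when increasing the size and lowering the target FPP together may help.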

http://git-wip-us.apache.org/repos/asf/carbondata/blob/a302cd1c/docs/useful-tips-on-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/useful-tips-on-carbondata.md b/docs/useful-tips-on-carbondata.md
index d00f785..b4e3bd3 100644
--- a/docs/useful-tips-on-carbondata.md
+++ b/docs/useful-tips-on-carbondata.md
@@ -125,6 +125,10 @@
     TBLPROPERTIES ('SORT_COLUMNS'='Dime_1, HOST, MSISDN')
   ```
 
+  **NOTE:**
+  + A BloomFilter datamap can be created to enhance performance for queries with precise equal/in conditions. You can find more information in the BloomFilter datamap [document](https://github.com/apache/carbondata/blob/master/docs/datamap/bloomfilter-datamap-guide.md).
+
+
 ## Configuration for Optimizing Data Loading performance for Massive Data