carbondata-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From chenliang...@apache.org
Subject [02/20] carbondata-site git commit: modified for 1.5.0 version
Date Wed, 17 Oct 2018 10:14:50 GMT
http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/datamap-developer-guide.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/datamap-developer-guide.md b/src/site/markdown/datamap-developer-guide.md
index 6bac9b5..60f93df 100644
--- a/src/site/markdown/datamap-developer-guide.md
+++ b/src/site/markdown/datamap-developer-guide.md
@@ -6,7 +6,7 @@ Currently, there are two 2 types of DataMap supported:
 1. IndexDataMap: DataMap that leverages index to accelerate filter query
 2. MVDataMap: DataMap that leverages Materialized View to accelerate olap style query, like SPJG query (select, predicate, join, groupby)
 
-### DataMap provider
+### DataMap Provider
 When user issues `CREATE DATAMAP dm ON TABLE main USING 'provider'`, the corresponding DataMapProvider implementation will be created and initialized. 
 Currently, the provider string can be:
 1. preaggregate: A type of MVDataMap that do pre-aggregate of single table
@@ -15,5 +15,5 @@ Currently, the provider string can be:
 
 When user issues `DROP DATAMAP dm ON TABLE main`, the corresponding DataMapProvider interface will be called.
 
-Details about [DataMap Management](./datamap/datamap-management.md#datamap-management) and supported [DSL](./datamap/datamap-management.md#overview) are documented [here](./datamap/datamap-management.md).
+Click for more details about [DataMap Management](./datamap/datamap-management.md#datamap-management) and supported [DSL](./datamap/datamap-management.md#overview).
 

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/datamap-management.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/datamap-management.md b/src/site/markdown/datamap-management.md
index eee03a7..ad8718a 100644
--- a/src/site/markdown/datamap-management.md
+++ b/src/site/markdown/datamap-management.md
@@ -66,9 +66,9 @@ If user create MV datamap without specifying `WITH DEFERRED REBUILD`, carbondata
 ### Automatic Refresh
 
 When user creates a datamap on the main table without using `WITH DEFERRED REBUILD` syntax, the datamap will be managed by system automatically.
-For every data load to the main table, system will immediately triger a load to the datamap automatically. These two data loading (to main table and datamap) is executed in a transactional manner, meaning that it will be either both success or neither success. 
+For every data load to the main table, system will immediately trigger a load to the datamap automatically. These two data loading (to main table and datamap) is executed in a transactional manner, meaning that it will be either both success or neither success. 
 
-The data loading to datamap is incremental based on Segment concept, avoiding a expesive total rebuild.
+The data loading to datamap is incremental based on Segment concept, avoiding a expensive total rebuild.
 
 If user perform following command on the main table, system will return failure. (reject the operation)
 
@@ -87,7 +87,7 @@ We do recommend you to use this management for index datamap.
 
 ### Manual Refresh
 
-When user creates a datamap specifying maunal refresh semantic, the datamap is created with status *disabled* and query will NOT use this datamap until user can issue REBUILD DATAMAP command to build the datamap. For every REBUILD DATAMAP command, system will trigger a full rebuild of the datamap. After rebuild is done, system will change datamap status to *enabled*, so that it can be used in query rewrite.
+When user creates a datamap specifying manual refresh semantic, the datamap is created with status *disabled* and query will NOT use this datamap until user can issue REBUILD DATAMAP command to build the datamap. For every REBUILD DATAMAP command, system will trigger a full rebuild of the datamap. After rebuild is done, system will change datamap status to *enabled*, so that it can be used in query rewrite.
 
 For every new data loading, data update, delete, the related datamap will be made *disabled*,
 which means that the following queries will not benefit from the datamap before it becomes *enabled* again.
@@ -122,7 +122,7 @@ There is a DataMapCatalog interface to retrieve schema of all datamap, it can be
 
 How can user know whether datamap is used in the query?
 
-User can use EXPLAIN command to know, it will print out something like
+User can set enable.query.statistics = true and use EXPLAIN command to know, it will print out something like
 
 ```text
 == CarbonData Profiler ==

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/ddl-of-carbondata.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/ddl-of-carbondata.md b/src/site/markdown/ddl-of-carbondata.md
index acaac43..933a448 100644
--- a/src/site/markdown/ddl-of-carbondata.md
+++ b/src/site/markdown/ddl-of-carbondata.md
@@ -32,6 +32,8 @@ CarbonData DDL statements are documented here,which includes:
   * [Caching Level](#caching-at-block-or-blocklet-level)
   * [Hive/Parquet folder Structure](#support-flat-folder-same-as-hiveparquet)
   * [Extra Long String columns](#string-longer-than-32000-characters)
+  * [Compression for Table](#compression-for-table)
+  * [Bad Records Path](#bad-records-path)
 * [CREATE TABLE AS SELECT](#create-table-as-select)
 * [CREATE EXTERNAL TABLE](#create-external-table)
   * [External Table on Transactional table location](#create-external-table-on-managed-table-data-location)
@@ -72,6 +74,7 @@ CarbonData DDL statements are documented here,which includes:
   [TBLPROPERTIES (property_name=property_value, ...)]
   [LOCATION 'path']
   ```
+
   **NOTE:** CarbonData also supports "STORED AS carbondata" and "USING carbondata". Find example code at [CarbonSessionExample](https://github.com/apache/carbondata/blob/master/examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonSessionExample.scala) in the CarbonData repo.
 ### Usage Guidelines
 
@@ -84,19 +87,20 @@ CarbonData DDL statements are documented here,which includes:
 | [SORT_COLUMNS](#sort-columns-configuration)                  | Columns to include in sort and its order of sort             |
 | [SORT_SCOPE](#sort-scope-configuration)                      | Sort scope of the load.Options include no sort, local sort ,batch sort and global sort |
 | [TABLE_BLOCKSIZE](#table-block-size-configuration)           | Size of blocks to write onto hdfs                            |
+| [TABLE_BLOCKLET_SIZE](#table-blocklet-size-configuration)    | Size of blocklet to write in the file                        |
 | [MAJOR_COMPACTION_SIZE](#table-compaction-configuration)     | Size upto which the segments can be combined into one        |
 | [AUTO_LOAD_MERGE](#table-compaction-configuration)           | Whether to auto compact the segments                         |
 | [COMPACTION_LEVEL_THRESHOLD](#table-compaction-configuration) | Number of segments to compact into one segment               |
 | [COMPACTION_PRESERVE_SEGMENTS](#table-compaction-configuration) | Number of latest segments that needs to be excluded from compaction |
 | [ALLOWED_COMPACTION_DAYS](#table-compaction-configuration)   | Segments generated within the configured time limit in days will be compacted, skipping others |
-| [streaming](#streaming)                                      | Whether the table is a streaming table                       |
+| [STREAMING](#streaming)                                      | Whether the table is a streaming table                       |
 | [LOCAL_DICTIONARY_ENABLE](#local-dictionary-configuration)   | Enable local dictionary generation                           |
 | [LOCAL_DICTIONARY_THRESHOLD](#local-dictionary-configuration) | Cardinality upto which the local dictionary can be generated |
-| [LOCAL_DICTIONARY_INCLUDE](#local-dictionary-configuration)  | Columns for which local dictionary needs to be generated.Useful when local dictionary need not be generated for all string/varchar/char columns |
-| [LOCAL_DICTIONARY_EXCLUDE](#local-dictionary-configuration)  | Columns for which local dictionary generation should be skipped.Useful when local dictionary need not be generated for few string/varchar/char columns |
+| [LOCAL_DICTIONARY_INCLUDE](#local-dictionary-configuration)  | Columns for which local dictionary needs to be generated. Useful when local dictionary need not be generated for all string/varchar/char columns |
+| [LOCAL_DICTIONARY_EXCLUDE](#local-dictionary-configuration)  | Columns for which local dictionary generation should be skipped. Useful when local dictionary need not be generated for few string/varchar/char columns |
 | [COLUMN_META_CACHE](#caching-minmax-value-for-required-columns) | Columns whose metadata can be cached in Driver for efficient pruning and improved query performance |
-| [CACHE_LEVEL](#caching-at-block-or-blocklet-level)           | Column metadata caching level.Whether to cache column metadata of block or blocklet |
-| [flat_folder](#support-flat-folder-same-as-hiveparquet)      | Whether to write all the carbondata files in a single folder.Not writing segments folder during incremental load |
+| [CACHE_LEVEL](#caching-at-block-or-blocklet-level)           | Column metadata caching level. Whether to cache column metadata of block or blocklet |
+| [FLAT_FOLDER](#support-flat-folder-same-as-hiveparquet)      | Whether to write all the carbondata files in a single folder.Not writing segments folder during incremental load |
 | [LONG_STRING_COLUMNS](#string-longer-than-32000-characters)  | Columns which are greater than 32K characters                |
 | [BUCKETNUMBER](#bucketing)                                   | Number of buckets to be created                              |
 | [BUCKETCOLUMNS](#bucketing)                                  | Columns which are to be placed in buckets                    |
@@ -110,9 +114,10 @@ CarbonData DDL statements are documented here,which includes:
 
      ```
      TBLPROPERTIES ('DICTIONARY_INCLUDE'='column1, column2')
-	```
-	 NOTE: Dictionary Include/Exclude for complex child columns is not supported.
-	
+     ```
+
+     **NOTE**: Dictionary Include/Exclude for complex child columns is not supported.
+
    - ##### Inverted Index Configuration
 
      By default inverted index is enabled, it might help to improve compression ratio and query speed, especially for low cardinality columns which are in reward position.
@@ -127,7 +132,7 @@ CarbonData DDL statements are documented here,which includes:
      This property is for users to specify which columns belong to the MDK(Multi-Dimensions-Key) index.
      * If users don't specify "SORT_COLUMN" property, by default MDK index be built by using all dimension columns except complex data type column. 
      * If this property is specified but with empty argument, then the table will be loaded without sort.
-	 * This supports only string, date, timestamp, short, int, long, and boolean data types.
+     * This supports only string, date, timestamp, short, int, long, byte and boolean data types.
      Suggested use cases : Only build MDK index for required columns,it might help to improve the data loading performance.
 
      ```
@@ -135,7 +140,8 @@ CarbonData DDL statements are documented here,which includes:
      OR
      TBLPROPERTIES ('SORT_COLUMNS'='')
      ```
-     NOTE: Sort_Columns for Complex datatype columns is not supported.
+
+     **NOTE**: Sort_Columns for Complex datatype columns is not supported.
 
    - ##### Sort Scope Configuration
    
@@ -146,10 +152,10 @@ CarbonData DDL statements are documented here,which includes:
      * BATCH_SORT: It increases the load performance but decreases the query performance if identified blocks > parallelism.
      * GLOBAL_SORT: It increases the query performance, especially high concurrent point query.
        And if you care about loading resources isolation strictly, because the system uses the spark GroupBy to sort data, the resource can be controlled by spark. 
-	
-	### Example:
 
-   ```
+    ### Example:
+
+    ```
     CREATE TABLE IF NOT EXISTS productSchema.productSalesTable (
       productNumber INT,
       productName STRING,
@@ -162,19 +168,30 @@ CarbonData DDL statements are documented here,which includes:
     STORED AS carbondata
     TBLPROPERTIES ('SORT_COLUMNS'='productName,storeCity',
                    'SORT_SCOPE'='NO_SORT')
-   ```
+    ```
 
    **NOTE:** CarbonData also supports "using carbondata". Find example code at [SparkSessionExample](https://github.com/apache/carbondata/blob/master/examples/spark2/src/main/scala/org/apache/carbondata/examples/SparkSessionExample.scala) in the CarbonData repo.
 
    - ##### Table Block Size Configuration
 
-     This command is for setting block size of this table, the default value is 1024 MB and supports a range of 1 MB to 2048 MB.
+     This property is for setting block size of this table, the default value is 1024 MB and supports a range of 1 MB to 2048 MB.
 
      ```
      TBLPROPERTIES ('TABLE_BLOCKSIZE'='512')
      ```
+
      **NOTE:** 512 or 512M both are accepted.
 
+   - ##### Table Blocklet Size Configuration
+
+     This property is for setting blocklet size in the carbondata file, the default value is 64 MB.
+     Blocklet is the minimum IO read unit, in case of point queries reduce blocklet size might improve the query performance.
+
+     Example usage:
+     ```
+     TBLPROPERTIES ('TABLE_BLOCKLET_SIZE'='8')
+     ```
+
    - ##### Table Compaction Configuration
    
      These properties are table level compaction configurations, if not specified, system level configurations in carbon.properties will be used.
@@ -196,7 +213,7 @@ CarbonData DDL statements are documented here,which includes:
      
    - ##### Streaming
 
-     CarbonData supports streaming ingestion for real-time data. You can create the ‘streaming’ table using the following table properties.
+     CarbonData supports streaming ingestion for real-time data. You can create the 'streaming' table using the following table properties.
 
      ```
      TBLPROPERTIES ('streaming'='true')
@@ -231,7 +248,13 @@ CarbonData DDL statements are documented here,which includes:
    
    * In case of multi-level complex dataType columns, primitive string/varchar/char columns are considered for local dictionary generation.
 
-   Local dictionary will have to be enabled explicitly during create table or by enabling the **system property** ***carbon.local.dictionary.enable***. By default, Local Dictionary will be disabled for the carbondata table.
+   System Level Properties for Local Dictionary: 
+   
+   
+   | Properties | Default value | Description |
+   | ---------- | ------------- | ----------- |
+   | carbon.local.dictionary.enable | false | By default, Local Dictionary will be disabled for the carbondata table. |
+   | carbon.local.dictionary.decoder.fallback | true | Page Level data will not be maintained for the blocklet. During fallback, actual data will be retrieved from the encoded page data using local dictionary. **NOTE:** Memory footprint decreases significantly as compared to when this property is set to false |
     
    Local Dictionary can be configured using the following properties during create table command: 
           
@@ -239,24 +262,24 @@ CarbonData DDL statements are documented here,which includes:
 | Properties | Default value | Description |
 | ---------- | ------------- | ----------- |
 | LOCAL_DICTIONARY_ENABLE | false | Whether to enable local dictionary generation. **NOTE:** If this property is defined, it will override the value configured at system level by '***carbon.local.dictionary.enable***'.Local dictionary will be generated for all string/varchar/char columns unless LOCAL_DICTIONARY_INCLUDE, LOCAL_DICTIONARY_EXCLUDE is configured. |
-| LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality of a column upto which carbondata can try to generate local dictionary (maximum - 100000) |
-| LOCAL_DICTIONARY_INCLUDE | string/varchar/char columns| Columns for which Local Dictionary has to be generated.**NOTE:** Those string/varchar/char columns which are added into DICTIONARY_INCLUDE option will not be considered for local dictionary generation.This property needs to be configured only when local dictionary needs to be generated for few columns, skipping others.This property takes effect only when **LOCAL_DICTIONARY_ENABLE** is true or **carbon.local.dictionary.enable** is true |
-| LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local Dictionary need not be generated.This property needs to be configured only when local dictionary needs to be skipped for few columns, generating for others.This property takes effect only when **LOCAL_DICTIONARY_ENABLE** is true or **carbon.local.dictionary.enable** is true |
+| LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality of a column upto which carbondata can try to generate local dictionary (maximum - 100000). **NOTE:** When LOCAL_DICTIONARY_THRESHOLD is defined for Complex columns, the count of distinct records of all child columns are summed up. |
+| LOCAL_DICTIONARY_INCLUDE | string/varchar/char columns| Columns for which Local Dictionary has to be generated.**NOTE:** Those string/varchar/char columns which are added into DICTIONARY_INCLUDE option will not be considered for local dictionary generation. This property needs to be configured only when local dictionary needs to be generated for few columns, skipping others. This property takes effect only when **LOCAL_DICTIONARY_ENABLE** is true or **carbon.local.dictionary.enable** is true |
+| LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local Dictionary need not be generated. This property needs to be configured only when local dictionary needs to be skipped for few columns, generating for others. This property takes effect only when **LOCAL_DICTIONARY_ENABLE** is true or **carbon.local.dictionary.enable** is true |
 
    **Fallback behavior:** 
 
    * When the cardinality of a column exceeds the threshold, it triggers a fallback and the generated dictionary will be reverted and data loading will be continued without dictionary encoding.
+   
+   * In case of complex columns, fallback is triggered when the summation value of all child columns' distinct records exceeds the defined LOCAL_DICTIONARY_THRESHOLD value.
 
    **NOTE:** When fallback is triggered, the data loading performance will decrease as encoded data will be discarded and the actual data is written to the temporary sort files.
 
    **Points to be noted:**
 
-   1. Reduce Block size:
+   * Reduce Block size:
    
       Number of Blocks generated is less in case of Local Dictionary as compression ratio is high. This may reduce the number of tasks launched during query, resulting in degradation of query performance if the pruned blocks are less compared to the number of parallel tasks which can be run. So it is recommended to configure smaller block size which in turn generates more number of blocks.
       
-   2. All the page-level data for a blocklet needs to be maintained in memory until all the pages encoded for local dictionary is processed in order to handle fallback. Hence the memory required for local dictionary based table is more and this memory increase is proportional to number of columns. 
-      
 ### Example:
 
    ```
@@ -287,19 +310,19 @@ CarbonData DDL statements are documented here,which includes:
       * If you want no column min/max values to be cached in the driver.
 
       ```
-      COLUMN_META_CACHE=’’
+      COLUMN_META_CACHE=''
       ```
 
       * If you want only col1 min/max values to be cached in the driver.
 
       ```
-      COLUMN_META_CACHE=’col1’
+      COLUMN_META_CACHE='col1'
       ```
 
       * If you want min/max values to be cached in driver for all the specified columns.
 
       ```
-      COLUMN_META_CACHE=’col1,col2,col3,…’
+      COLUMN_META_CACHE='col1,col2,col3,…'
       ```
 
       Columns to be cached can be specified either while creating table or after creation of the table.
@@ -308,13 +331,13 @@ CarbonData DDL statements are documented here,which includes:
       Syntax:
 
       ```
-      CREATE TABLE [dbName].tableName (col1 String, col2 String, col3 int,…) STORED BY ‘carbondata’ TBLPROPERTIES (‘COLUMN_META_CACHE’=’col1,col2,…’)
+      CREATE TABLE [dbName].tableName (col1 String, col2 String, col3 int,…) STORED BY 'carbondata' TBLPROPERTIES ('COLUMN_META_CACHE'='col1,col2,…')
       ```
 
       Example:
 
       ```
-      CREATE TABLE employee (name String, city String, id int) STORED BY ‘carbondata’ TBLPROPERTIES (‘COLUMN_META_CACHE’=’name’)
+      CREATE TABLE employee (name String, city String, id int) STORED BY 'carbondata' TBLPROPERTIES ('COLUMN_META_CACHE'='name')
       ```
 
       After creation of table or on already created tables use the alter table command to configure the columns to be cached.
@@ -322,13 +345,13 @@ CarbonData DDL statements are documented here,which includes:
       Syntax:
 
       ```
-      ALTER TABLE [dbName].tableName SET TBLPROPERTIES (‘COLUMN_META_CACHE’=’col1,col2,…’)
+      ALTER TABLE [dbName].tableName SET TBLPROPERTIES ('COLUMN_META_CACHE'='col1,col2,…')
       ```
 
       Example:
 
       ```
-      ALTER TABLE employee SET TBLPROPERTIES (‘COLUMN_META_CACHE’=’city’)
+      ALTER TABLE employee SET TBLPROPERTIES ('COLUMN_META_CACHE'='city')
       ```
 
    - ##### Caching at Block or Blocklet Level
@@ -340,13 +363,13 @@ CarbonData DDL statements are documented here,which includes:
       *Configuration for caching in driver at Block level (default value).*
 
       ```
-      CACHE_LEVEL= ‘BLOCK’
+      CACHE_LEVEL= 'BLOCK'
       ```
 
       *Configuration for caching in driver at Blocklet level.*
 
       ```
-      CACHE_LEVEL= ‘BLOCKLET’
+      CACHE_LEVEL= 'BLOCKLET'
       ```
 
       Cache level can be specified either while creating table or after creation of the table.
@@ -355,13 +378,13 @@ CarbonData DDL statements are documented here,which includes:
       Syntax:
 
       ```
-      CREATE TABLE [dbName].tableName (col1 String, col2 String, col3 int,…) STORED BY ‘carbondata’ TBLPROPERTIES (‘CACHE_LEVEL’=’Blocklet’)
+      CREATE TABLE [dbName].tableName (col1 String, col2 String, col3 int,…) STORED BY 'carbondata' TBLPROPERTIES ('CACHE_LEVEL'='Blocklet')
       ```
 
       Example:
 
       ```
-      CREATE TABLE employee (name String, city String, id int) STORED BY ‘carbondata’ TBLPROPERTIES (‘CACHE_LEVEL’=’Blocklet’)
+      CREATE TABLE employee (name String, city String, id int) STORED BY 'carbondata' TBLPROPERTIES ('CACHE_LEVEL'='Blocklet')
       ```
 
       After creation of table or on already created tables use the alter table command to configure the cache level.
@@ -369,26 +392,27 @@ CarbonData DDL statements are documented here,which includes:
       Syntax:
 
       ```
-      ALTER TABLE [dbName].tableName SET TBLPROPERTIES (‘CACHE_LEVEL’=’Blocklet’)
+      ALTER TABLE [dbName].tableName SET TBLPROPERTIES ('CACHE_LEVEL'='Blocklet')
       ```
 
       Example:
 
       ```
-      ALTER TABLE employee SET TBLPROPERTIES (‘CACHE_LEVEL’=’Blocklet’)
+      ALTER TABLE employee SET TBLPROPERTIES ('CACHE_LEVEL'='Blocklet')
       ```
 
    - ##### Support Flat folder same as Hive/Parquet
 
-       This feature allows all carbondata and index files to keep directy under tablepath. Currently all carbondata/carbonindex files written under tablepath/Fact/Part0/Segment_NUM folder and it is not same as hive/parquet folder structure. This feature makes all files written will be directly under tablepath, it does not maintain any segment folder structure.This is useful for interoperability between the execution engines and plugin with other execution engines like hive or presto becomes easier.
+       This feature allows all carbondata and index files to keep directy under tablepath. Currently all carbondata/carbonindex files written under tablepath/Fact/Part0/Segment_NUM folder and it is not same as hive/parquet folder structure. This feature makes all files written will be directly under tablepath, it does not maintain any segment folder structure. This is useful for interoperability between the execution engines and plugin with other execution engines like hive or presto becomes easier.
 
        Following table property enables this feature and default value is false.
        ```
         'flat_folder'='true'
        ```
+
        Example:
        ```
-       CREATE TABLE employee (name String, city String, id int) STORED BY ‘carbondata’ TBLPROPERTIES ('flat_folder'='true')
+       CREATE TABLE employee (name String, city String, id int) STORED BY 'carbondata' TBLPROPERTIES ('flat_folder'='true')
        ```
 
    - ##### String longer than 32000 characters
@@ -418,6 +442,41 @@ CarbonData DDL statements are documented here,which includes:
 
      **NOTE:** The LONG_STRING_COLUMNS can only be string/char/varchar columns and cannot be dictionary_include/sort_columns/complex columns.
 
+   - ##### Compression for table
+
+     Data compression is also supported by CarbonData.
+     By default, Snappy is used to compress the data. CarbonData also support ZSTD compressor.
+     User can specify the compressor in the table property:
+
+     ```
+     TBLPROPERTIES('carbon.column.compressor'='snappy')
+     ```
+     or
+     ```
+     TBLPROPERTIES('carbon.column.compressor'='zstd')
+     ```
+     If the compressor is configured, all the data loading and compaction will use that compressor.
+     If the compressor is not configured, the data loading and compaction will use the compressor from current system property.
+     In this scenario, the compressor for each load may differ if the system property is changed each time. This is helpful if you want to change the compressor for a table.
+     The corresponding system property is configured in carbon.properties file as below:
+     ```
+     carbon.column.compressor=snappy
+     ```
+     or
+     ```
+     carbon.column.compressor=zstd
+     ```
+
+   - ##### Bad Records Path
+     This property is used to specify the location where bad records would be written.
+     As the table path remains the same after rename therefore the user can use this property to
+     specify bad records path for the table at the time of creation, so that the same path can
+     be later viewed in table description for reference.
+
+     ```
+       TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords'')
+     ```
+
 ## CREATE TABLE AS SELECT
   This function allows user to create a Carbon table from any of the Parquet/Hive/Carbon table. This is beneficial when the user wants to create Carbon table from any other Parquet/Hive table and use the Carbon query engine to query and achieve better query results for cases where Carbon is faster than other file formats. Also this feature can be used for backing up the data.
 
@@ -457,7 +516,7 @@ CarbonData DDL statements are documented here,which includes:
   This function allows user to create external table by specifying location.
   ```
   CREATE EXTERNAL TABLE [IF NOT EXISTS] [db_name.]table_name 
-  STORED AS carbondata LOCATION ‘$FilesPath’
+  STORED AS carbondata LOCATION '$FilesPath'
   ```
 
 ### Create external table on managed table data location.
@@ -510,7 +569,7 @@ CarbonData DDL statements are documented here,which includes:
 
 ### Example
   ```
-  CREATE DATABASE carbon LOCATION “hdfs://name_cluster/dir1/carbonstore”;
+  CREATE DATABASE carbon LOCATION "hdfs://name_cluster/dir1/carbonstore";
   ```
 
 ## TABLE MANAGEMENT  
@@ -568,21 +627,23 @@ CarbonData DDL statements are documented here,which includes:
      ```
      ALTER TABLE carbon ADD COLUMNS (a1 INT, b1 STRING) TBLPROPERTIES('DEFAULT.VALUE.a1'='10')
      ```
-      NOTE: Add Complex datatype columns is not supported.
+      **NOTE:** Add Complex datatype columns is not supported.
 
 Users can specify which columns to include and exclude for local dictionary generation after adding new columns. These will be appended with the already existing local dictionary include and exclude columns of main table respectively.
-  ```
+     ```
      ALTER TABLE carbon ADD COLUMNS (a1 STRING, b1 STRING) TBLPROPERTIES('LOCAL_DICTIONARY_INCLUDE'='a1','LOCAL_DICTIONARY_EXCLUDE'='b1')
-  ```
+     ```
 
    - ##### DROP COLUMNS
    
      This command is used to delete the existing column(s) in a table.
+
      ```
      ALTER TABLE [db_name.]table_name DROP COLUMNS (col_name, ...)
      ```
 
      Examples:
+
      ```
      ALTER TABLE carbon DROP COLUMNS (b1)
      OR
@@ -590,12 +651,14 @@ Users can specify which columns to include and exclude for local dictionary gene
      
      ALTER TABLE carbon DROP COLUMNS (c1,d1)
      ```
-     NOTE: Drop Complex child column is not supported.
+
+     **NOTE:** Drop Complex child column is not supported.
 
    - ##### CHANGE DATA TYPE
    
      This command is used to change the data type from INT to BIGINT or decimal precision from lower to higher.
      Change of decimal data type from lower precision to higher precision will only be supported for cases where there is no data loss.
+
      ```
      ALTER TABLE [db_name.]table_name CHANGE col_name col_name changed_column_type
      ```
@@ -606,25 +669,31 @@ Users can specify which columns to include and exclude for local dictionary gene
      - **NOTE:** The allowed range is 38,38 (precision, scale) and is a valid upper case scenario which is not resulting in data loss.
 
      Example1:Changing data type of column a1 from INT to BIGINT.
+
      ```
      ALTER TABLE test_db.carbon CHANGE a1 a1 BIGINT
      ```
      
      Example2:Changing decimal precision of column a1 from 10 to 18.
+
      ```
      ALTER TABLE test_db.carbon CHANGE a1 a1 DECIMAL(18,2)
      ```
+
 - ##### MERGE INDEX
 
      This command is used to merge all the CarbonData index files (.carbonindex) inside a segment to a single CarbonData index merge file (.carbonindexmerge). This enhances the first query performance.
+
      ```
       ALTER TABLE [db_name.]table_name COMPACT 'SEGMENT_INDEX'
      ```
 
       Examples:
-      ```
+
+     ```
       ALTER TABLE test_db.carbon COMPACT 'SEGMENT_INDEX'
       ```
+
       **NOTE:**
 
       * Merge index is not supported on streaming table.
@@ -694,10 +763,11 @@ Users can specify which columns to include and exclude for local dictionary gene
   ```
   CREATE TABLE IF NOT EXISTS productSchema.productSalesTable (
                                 productNumber Int COMMENT 'unique serial number for product')
-  COMMENT “This is table comment”
+  COMMENT "This is table comment"
    STORED AS carbondata
    TBLPROPERTIES ('DICTIONARY_INCLUDE'='productNumber')
   ```
+
   You can also SET and UNSET table comment using ALTER command.
 
   Example to SET table comment:
@@ -743,8 +813,8 @@ Users can specify which columns to include and exclude for local dictionary gene
   PARTITIONED BY (productCategory STRING, productBatch STRING)
   STORED AS carbondata
   ```
-   NOTE: Hive partition is not supported on complex datatype columns.
-		
+   **NOTE:** Hive partition is not supported on complex datatype columns.
+
 
 #### Show Partitions
 
@@ -800,6 +870,7 @@ Users can specify which columns to include and exclude for local dictionary gene
   [TBLPROPERTIES ('PARTITION_TYPE'='HASH',
                   'NUM_PARTITIONS'='N' ...)]
   ```
+
   **NOTE:** N is the number of hash partitions
 
 

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/dml-of-carbondata.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/dml-of-carbondata.md b/src/site/markdown/dml-of-carbondata.md
index 42da655..393ebd3 100644
--- a/src/site/markdown/dml-of-carbondata.md
+++ b/src/site/markdown/dml-of-carbondata.md
@@ -46,7 +46,7 @@ CarbonData DML statements are documented here,which includes:
 | ------------------------------------------------------- | ------------------------------------------------------------ |
 | [DELIMITER](#delimiter)                                 | Character used to separate the data in the input csv file    |
 | [QUOTECHAR](#quotechar)                                 | Character used to quote the data in the input csv file       |
-| [COMMENTCHAR](#commentchar)                             | Character used to comment the rows in the input csv file.Those rows will be skipped from processing |
+| [COMMENTCHAR](#commentchar)                             | Character used to comment the rows in the input csv file. Those rows will be skipped from processing |
 | [HEADER](#header)                                       | Whether the input csv files have header row                  |
 | [FILEHEADER](#fileheader)                               | If header is not present in the input csv, what is the column names to be used for data read from input csv |
 | [MULTILINE](#multiline)                                 | Whether a row data can span across multiple lines.           |
@@ -56,12 +56,12 @@ CarbonData DML statements are documented here,which includes:
 | [COMPLEX_DELIMITER_LEVEL_2](#complex_delimiter_level_2) | Ending delimiter for complex type data in input csv file     |
 | [ALL_DICTIONARY_PATH](#all_dictionary_path)             | Path to read the dictionary data from all columns            |
 | [COLUMNDICT](#columndict)                               | Path to read the dictionary data from for particular column  |
-| [DATEFORMAT](#dateformat)                               | Format of date in the input csv file                         |
-| [TIMESTAMPFORMAT](#timestampformat)                     | Format of timestamp in the input csv file                    |
+| [DATEFORMAT](#dateformattimestampformat)                | Format of date in the input csv file                         |
+| [TIMESTAMPFORMAT](#dateformattimestampformat)           | Format of timestamp in the input csv file                    |
 | [SORT_COLUMN_BOUNDS](#sort-column-bounds)               | How to parititon the sort columns to make the evenly distributed |
 | [SINGLE_PASS](#single_pass)                             | When to enable single pass data loading                      |
 | [BAD_RECORDS_LOGGER_ENABLE](#bad-records-handling)      | Whether to enable bad records logging                        |
-| [BAD_RECORD_PATH](#bad-records-handling)                | Bad records logging path.Useful when bad record logging is enabled |
+| [BAD_RECORD_PATH](#bad-records-handling)                | Bad records logging path. Useful when bad record logging is enabled |
 | [BAD_RECORDS_ACTION](#bad-records-handling)             | Behavior of data loading when bad record is found            |
 | [IS_EMPTY_DATA_BAD_RECORD](#bad-records-handling)       | Whether empty data of a column to be considered as bad record or not |
 | [GLOBAL_SORT_PARTITIONS](#global_sort_partitions)       | Number of partition to use for shuffling of data during sorting |
@@ -176,7 +176,7 @@ CarbonData DML statements are documented here,which includes:
 
     Range bounds for sort columns.
 
-    Suppose the table is created with 'SORT_COLUMNS'='name,id' and the range for name is aaa~zzz, the value range for id is 0~1000. Then during data loading, we can specify the following option to enhance data loading performance.
+    Suppose the table is created with 'SORT_COLUMNS'='name,id' and the range for name is aaa to zzz, the value range for id is 0 to 1000. Then during data loading, we can specify the following option to enhance data loading performance.
     ```
     OPTIONS('SORT_COLUMN_BOUNDS'='f,250;l,500;r,750')
     ```
@@ -186,7 +186,7 @@ CarbonData DML statements are documented here,which includes:
     * SORT_COLUMN_BOUNDS will be used only when the SORT_SCOPE is 'local_sort'.
     * Carbondata will use these bounds as ranges to process data concurrently during the final sort percedure. The records will be sorted and written out inside each partition. Since the partition is sorted, all records will be sorted.
     * Since the actual order and literal order of the dictionary column are not necessarily the same, we do not recommend you to use this feature if the first sort column is 'dictionary_include'.
-    * The option works better if your CPU usage during loading is low. If your system is already CPU tense, better not to use this option. Besides, it depends on the user to specify the bounds. If user does not know the exactly bounds to make the data distributed evenly among the bounds, loading performance will still be better than before or at least the same as before.
+    * The option works better if your CPU usage during loading is low. If your current system CPU usage is high, better not to use this option. Besides, it depends on the user to specify the bounds. If user does not know the exactly bounds to make the data distributed evenly among the bounds, loading performance will still be better than before or at least the same as before.
     * Users can find more information about this option in the description of PR1953.
 
   - ##### SINGLE_PASS:
@@ -240,14 +240,6 @@ CarbonData DML statements are documented here,which includes:
   * Since Bad Records Path can be specified in create, load and carbon properties. 
     Therefore, value specified in load will have the highest priority, and value specified in carbon properties will have the least priority.
 
-   **Bad Records Path:**
-         This property is used to specify the location where bad records would be written.
-        
-
-   ```
-   TBLPROPERTIES('BAD_RECORDS_PATH'='/opt/badrecords'')
-   ```
-
   Example:
 
   ```

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/documentation.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/documentation.md b/src/site/markdown/documentation.md
index 537a9d3..1b6726a 100644
--- a/src/site/markdown/documentation.md
+++ b/src/site/markdown/documentation.md
@@ -25,7 +25,7 @@ Apache CarbonData is a new big data file format for faster interactive query usi
 
 ## Getting Started
 
-**File Format Concepts:** Start with the basics of understanding the [CarbonData file format](./file-structure-of-carbondata.md#carbondata-file-format) and its [storage structure](./file-structure-of-carbondata.md).This will help to understand other parts of the documentation, including deployment, programming and usage guides. 
+**File Format Concepts:** Start with the basics of understanding the [CarbonData file format](./file-structure-of-carbondata.md#carbondata-file-format) and its [storage structure](./file-structure-of-carbondata.md). This will help to understand other parts of the documentation, including deployment, programming and usage guides. 
 
 **Quick Start:** [Run an example program](./quick-start-guide.md#installing-and-configuring-carbondata-to-run-locally-with-spark-shell) on your local machine or [study some examples](https://github.com/apache/carbondata/tree/master/examples/spark2/src/main/scala/org/apache/carbondata/examples).
 

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/faq.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/faq.md b/src/site/markdown/faq.md
index 8ec7290..3ac9a0a 100644
--- a/src/site/markdown/faq.md
+++ b/src/site/markdown/faq.md
@@ -28,6 +28,7 @@
 * [Why aggregate query is not fetching data from aggregate table?](#why-aggregate-query-is-not-fetching-data-from-aggregate-table)
 * [Why all executors are showing success in Spark UI even after Dataload command failed at Driver side?](#why-all-executors-are-showing-success-in-spark-ui-even-after-dataload-command-failed-at-driver-side)
 * [Why different time zone result for select query output when query SDK writer output?](#why-different-time-zone-result-for-select-query-output-when-query-sdk-writer-output)
+* [How to check LRU cache memory footprint?](#how-to-check-lru-cache-memory-footprint)
 
 # TroubleShooting
 
@@ -56,12 +57,12 @@ By default **carbon.badRecords.location** specifies the following location ``/op
 ## How to enable Bad Record Logging?
 While loading data we can specify the approach to handle Bad Records. In order to analyse the cause of the Bad Records the parameter ``BAD_RECORDS_LOGGER_ENABLE`` must be set to value ``TRUE``. There are multiple approaches to handle Bad Records which can be specified  by the parameter ``BAD_RECORDS_ACTION``.
 
-- To pad the incorrect values of the csv rows with NULL value and load the data in CarbonData, set the following in the query :
+- To pass the incorrect values of the csv rows with NULL value and load the data in CarbonData, set the following in the query :
 ```
 'BAD_RECORDS_ACTION'='FORCE'
 ```
 
-- To write the Bad Records without padding incorrect values with NULL in the raw csv (set in the parameter **carbon.badRecords.location**), set the following in the query :
+- To write the Bad Records without passing incorrect values with NULL in the raw csv (set in the parameter **carbon.badRecords.location**), set the following in the query :
 ```
 'BAD_RECORDS_ACTION'='REDIRECT'
 ```
@@ -198,7 +199,7 @@ select cntry,sum(gdp) from gdp21,pop1 where cntry=ctry group by cntry;
 ```
 
 ## Why all executors are showing success in Spark UI even after Dataload command failed at Driver side?
-Spark executor shows task as failed after the maximum number of retry attempts, but loading the data having bad records and BAD_RECORDS_ACTION (carbon.bad.records.action) is set as “FAIL” will attempt only once but will send the signal to driver as failed instead of throwing the exception to retry, as there is no point to retry if bad record found and BAD_RECORDS_ACTION is set to fail. Hence the Spark executor displays this one attempt as successful but the command has actually failed to execute. Task attempts or executor logs can be checked to observe the failure reason.
+Spark executor shows task as failed after the maximum number of retry attempts, but loading the data having bad records and BAD_RECORDS_ACTION (carbon.bad.records.action) is set as "FAIL" will attempt only once but will send the signal to driver as failed instead of throwing the exception to retry, as there is no point to retry if bad record found and BAD_RECORDS_ACTION is set to fail. Hence the Spark executor displays this one attempt as successful but the command has actually failed to execute. Task attempts or executor logs can be checked to observe the failure reason.
 
 ## Why different time zone result for select query output when query SDK writer output? 
 SDK writer is an independent entity, hence SDK writer can generate carbondata files from a non-cluster machine that has different time zones. But at cluster when those files are read, it always takes cluster time-zone. Hence, the value of timestamp and date datatype fields are not original value.
@@ -212,7 +213,22 @@ cluster timezone is Asia/Shanghai
 TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
 ```
 
+## How to check LRU cache memory footprint?
+To observe the LRU cache memory footprint in the logs, configure the below properties in log4j.properties file.
+```
+log4j.logger.org.apache.carbondata.core.memory.UnsafeMemoryManager = DEBUG
+log4j.logger.org.apache.carbondata.core.cache.CarbonLRUCache = DEBUG
+```
+These properties will enable the DEBUG log for the CarbonLRUCache and UnsafeMemoryManager which will print the information of memory consumed using which the LRU cache size can be decided. **Note:** Enabling the DEBUG log will degrade the query performance.
 
+**Example:**
+```
+18/09/26 15:05:28 DEBUG UnsafeMemoryManager: pool-44-thread-1 Memory block (org.apache.carbondata.core.memory.MemoryBlock@21312095) is created with size 10. Total memory used 413Bytes, left 536870499Bytes
+18/09/26 15:05:29 DEBUG CarbonLRUCache: main Required size for entry /home/target/store/default/stored_as_carbondata_table/Fact/Part0/Segment_0/0_1537954529044.carbonindexmerge :: 181 Current cache size :: 0
+18/09/26 15:05:30 DEBUG UnsafeMemoryManager: main Freeing memory of size: 105available memory:  536870836
+18/09/26 15:05:30 DEBUG UnsafeMemoryManager: main Freeing memory of size: 76available memory:  536870912
+18/09/26 15:05:30 INFO CarbonLRUCache: main Removed entry from InMemory lru cache :: /home/target/store/default/stored_as_carbondata_table/Fact/Part0/Segment_0/0_1537954529044.carbonindexmerge
+```
 
 ## Getting tablestatus.lock issues When loading data
 

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/file-structure-of-carbondata.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/file-structure-of-carbondata.md b/src/site/markdown/file-structure-of-carbondata.md
index 2b43105..8eacd38 100644
--- a/src/site/markdown/file-structure-of-carbondata.md
+++ b/src/site/markdown/file-structure-of-carbondata.md
@@ -48,7 +48,7 @@ The CarbonData files are stored in the location specified by the ***carbon.store
 
 ![File Directory Structure](../../src/site/images/2-1_1.png)
 
-1. ModifiedTime.mdt records the timestamp of the metadata with the modification time attribute of the file. When the drop table and create table are used, the modification time of the file is updated.This is common to all databases and hence is kept in parallel to databases
+1. ModifiedTime.mdt records the timestamp of the metadata with the modification time attribute of the file. When the drop table and create table are used, the modification time of the file is updated. This is common to all databases and hence is kept in parallel to databases
 2. The **default** is the database name and contains the user tables.default is used when user doesn't specify any database name;else user configured database name will be the directory name. user_table is the table name.
 3. Metadata directory stores schema files, tablestatus and dictionary files (including .dict, .dictmeta and .sortindex). There are three types of metadata data information files.
 4. data and index files are stored under directory named **Fact**. The Fact directory has a Part0 partition directory, where 0 is the partition number.

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/how-to-contribute-to-apache-carbondata.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/how-to-contribute-to-apache-carbondata.md b/src/site/markdown/how-to-contribute-to-apache-carbondata.md
index f64c948..8d6c891 100644
--- a/src/site/markdown/how-to-contribute-to-apache-carbondata.md
+++ b/src/site/markdown/how-to-contribute-to-apache-carbondata.md
@@ -48,7 +48,7 @@ alternatively, on the developer mailing list(dev@carbondata.apache.org).
 
 If there’s an existing JIRA issue for your intended contribution, please comment about your
 intended work. Once the work is understood, a committer will assign the issue to you.
-(If you don’t have a JIRA role yet, you’ll be added to the “contributor” role.) If an issue is
+(If you don’t have a JIRA role yet, you’ll be added to the "contributor" role.) If an issue is
 currently assigned, please check with the current assignee before reassigning.
 
 For moderate or large contributions, you should not start coding or writing a design doc unless
@@ -171,7 +171,7 @@ Our GitHub mirror automatically provides pre-commit testing coverage using Jenki
 Please make sure those tests pass,the contribution cannot be merged otherwise.
 
 #### LGTM
-Once the reviewer is happy with the change, they’ll respond with an LGTM (“looks good to me!”).
+Once the reviewer is happy with the change, they’ll respond with an LGTM ("looks good to me!").
 At this point, the committer will take over, possibly make some additional touch ups,
 and merge your changes into the codebase.
 

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/introduction.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/introduction.md b/src/site/markdown/introduction.md
index 434ccfa..e6c3372 100644
--- a/src/site/markdown/introduction.md
+++ b/src/site/markdown/introduction.md
@@ -18,15 +18,15 @@ CarbonData has
 
 ## CarbonData Features & Functions
 
-CarbonData has rich set of featues to support various use cases in Big Data analytics.The below table lists the major features supported by CarbonData.
+CarbonData has rich set of features to support various use cases in Big Data analytics. The below table lists the major features supported by CarbonData.
 
 
 
 ### Table Management
 
 - ##### DDL (Create, Alter,Drop,CTAS)
-
-​	CarbonData provides its own DDL to create and manage carbondata tables.These DDL conform to 			Hive,Spark SQL format and support additional properties and configuration to take advantages of CarbonData functionalities.
+  
+  CarbonData provides its own DDL to create and manage carbondata tables. These DDL conform to Hive,Spark SQL format and support additional properties and configuration to take advantages of CarbonData functionalities.
 
 - ##### DML(Load,Insert)
 
@@ -46,7 +46,7 @@ CarbonData has rich set of featues to support various use cases in Big Data anal
 
 - ##### Compaction
 
-  CarbonData manages incremental loads as segments.Compaction help to compact the growing number of segments and also to improve query filter pruning.
+  CarbonData manages incremental loads as segments. Compaction helps to compact the growing number of segments and also to improve query filter pruning.
 
 - ##### External Tables
 
@@ -56,11 +56,11 @@ CarbonData has rich set of featues to support various use cases in Big Data anal
 
 - ##### Pre-Aggregate
 
-  CarbonData has concept of datamaps to assist in pruning of data while querying so that performance is faster.Pre Aggregate tables are kind of datamaps which can improve the query performance by order of magnitude.CarbonData will automatically pre-aggregae the incremental data and re-write the query to automatically fetch from the most appropriate pre-aggregate table to serve the query faster.
+  CarbonData has concept of datamaps to assist in pruning of data while querying so that performance is faster.Pre Aggregate tables are kind of datamaps which can improve the query performance by order of magnitude.CarbonData will automatically pre-aggregate the incremental data and re-write the query to automatically fetch from the most appropriate pre-aggregate table to serve the query faster.
 
 - ##### Time Series
 
-  CarbonData has built in understanding of time order(Year, month,day,hour, minute,second).Time series is a pre-aggregate table which can automatically roll-up the data to the desired level during incremental load and serve the query from the most appropriate pre-aggregate table.
+  CarbonData has built in understanding of time order(Year, month,day,hour, minute,second). Time series is a pre-aggregate table which can automatically roll-up the data to the desired level during incremental load and serve the query from the most appropriate pre-aggregate table.
 
 - ##### Bloom filter
 
@@ -72,7 +72,7 @@ CarbonData has rich set of featues to support various use cases in Big Data anal
 
 - ##### MV (Materialized Views)
 
-  MVs are kind of pre-aggregate tables which can support efficent query re-write and processing.CarbonData provides MV which can rewrite query to fetch from any table(including non-carbondata tables).Typical usecase is to store the aggregated data of a non-carbondata fact table into carbondata and use mv to rewrite the query to fetch from carbondata.
+  MVs are kind of pre-aggregate tables which can support efficent query re-write and processing.CarbonData provides MV which can rewrite query to fetch from any table(including non-carbondata tables). Typical usecase is to store the aggregated data of a non-carbondata fact table into carbondata and use mv to rewrite the query to fetch from carbondata.
 
 ### Streaming
 
@@ -84,17 +84,17 @@ CarbonData has rich set of featues to support various use cases in Big Data anal
 
 - ##### CarbonData writer
 
-  CarbonData supports writing data from non-spark application using SDK.Users can use SDK to generate carbondata files from custom applications.Typical usecase is to write the streaming application plugged in to kafka and use carbondata as sink(target) table for storing.
+  CarbonData supports writing data from non-spark application using SDK.Users can use SDK to generate carbondata files from custom applications. Typical usecase is to write the streaming application plugged in to kafka and use carbondata as sink(target) table for storing.
 
 - ##### CarbonData reader
 
-  CarbonData supports reading of data from non-spark application using SDK.Users can use the SDK to read the carbondata files from their application and do custom processing.
+  CarbonData supports reading of data from non-spark application using SDK. Users can use the SDK to read the carbondata files from their application and do custom processing.
 
 ### Storage
 
 - ##### S3
 
-  CarbonData can write to S3, OBS or any cloud storage confirming to S3 protocol.CarbonData uses the HDFS api to write to cloud object stores.
+  CarbonData can write to S3, OBS or any cloud storage confirming to S3 protocol. CarbonData uses the HDFS api to write to cloud object stores.
 
 - ##### HDFS
 

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/language-manual.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/language-manual.md b/src/site/markdown/language-manual.md
index 123cae3..79aad00 100644
--- a/src/site/markdown/language-manual.md
+++ b/src/site/markdown/language-manual.md
@@ -32,8 +32,10 @@ CarbonData has its own parser, in addition to Spark's SQL Parser, to parse and p
   - Materialized Views (MV)
   - [Streaming](./streaming-guide.md)
 - Data Manipulation Statements
-  - [DML:](./dml-of-carbondata.md) [Load](./dml-of-carbondata.md#load-data), [Insert](./ddl-of-carbondata.md#insert-overwrite), [Update](./dml-of-carbondata.md#update), [Delete](./dml-of-carbondata.md#delete)
+  - [DML:](./dml-of-carbondata.md) [Load](./dml-of-carbondata.md#load-data), [Insert](./dml-of-carbondata.md#insert-data-into-carbondata-table), [Update](./dml-of-carbondata.md#update), [Delete](./dml-of-carbondata.md#delete)
   - [Segment Management](./segment-management-on-carbondata.md)
+- [CarbonData as Spark's Datasource](./carbon-as-spark-datasource-guide.md)
 - [Configuration Properties](./configuration-parameters.md)
 
 
+

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/lucene-datamap-guide.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/lucene-datamap-guide.md b/src/site/markdown/lucene-datamap-guide.md
index 86b00e2..aa9c8d4 100644
--- a/src/site/markdown/lucene-datamap-guide.md
+++ b/src/site/markdown/lucene-datamap-guide.md
@@ -47,7 +47,7 @@ It will show all DataMaps created on main table.
 
 ## Lucene DataMap Introduction
   Lucene is a high performance, full featured text search engine. Lucene is integrated to carbon as
-  an index datamap and managed along with main tables by CarbonData.User can create lucene datamap 
+  an index datamap and managed along with main tables by CarbonData. User can create lucene datamap 
   to improve query performance on string columns which has content of more length. So, user can 
   search tokenized word or pattern of it using lucene query on text content.
   
@@ -95,7 +95,7 @@ As a technique for query acceleration, Lucene indexes cannot be queried directly
 Queries are to be made on main table. when a query with TEXT_MATCH('name:c10') or 
 TEXT_MATCH_WITH_LIMIT('name:n10',10)[the second parameter represents the number of result to be 
 returned, if user does not specify this value, all results will be returned without any limit] is 
-fired, two jobs are fired.The first job writes the temporary files in folder created at table level 
+fired, two jobs are fired. The first job writes the temporary files in folder created at table level 
 which contains lucene's seach results and these files will be read in second job to give faster 
 results. These temporary files will be cleared once the query finishes.
 

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/performance-tuning.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/performance-tuning.md b/src/site/markdown/performance-tuning.md
index f56a63b..6c87ce9 100644
--- a/src/site/markdown/performance-tuning.md
+++ b/src/site/markdown/performance-tuning.md
@@ -140,7 +140,7 @@
 
 | Parameter | Default Value | Description/Tuning |
 |-----------|-------------|--------|
-|carbon.number.of.cores.while.loading|Default: 2.This value should be >= 2|Specifies the number of cores used for data processing during data loading in CarbonData. |
+|carbon.number.of.cores.while.loading|Default: 2. This value should be >= 2|Specifies the number of cores used for data processing during data loading in CarbonData. |
 |carbon.sort.size|Default: 100000. The value should be >= 100.|Threshold to write local file in sort step when loading data|
 |carbon.sort.file.write.buffer.size|Default:  50000.|DataOutputStream buffer. |
 |carbon.merge.sort.reader.thread|Default: 3 |Specifies the number of cores used for temp file merging during data loading in CarbonData.|
@@ -165,15 +165,15 @@
 |----------------------------------------------|-----------------------------------|---------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | carbon.sort.intermediate.files.limit | spark/carbonlib/carbon.properties | Data loading | During the loading of data, local temp is used to sort the data. This number specifies the minimum number of intermediate files after which the  merge sort has to be initiated. | Increasing the parameter to a higher value will improve the load performance. For example, when we increase the value from 20 to 100, it increases the data load performance from 35MB/S to more than 50MB/S. Higher values of this parameter consumes  more memory during the load. |
 | carbon.number.of.cores.while.loading | spark/carbonlib/carbon.properties | Data loading | Specifies the number of cores used for data processing during data loading in CarbonData. | If you have more number of CPUs, then you can increase the number of CPUs, which will increase the performance. For example if we increase the value from 2 to 4 then the CSV reading performance can increase about 1 times |
-| carbon.compaction.level.threshold | spark/carbonlib/carbon.properties | Data loading and Querying | For minor compaction, specifies the number of segments to be merged in stage 1 and number of compacted segments to be merged in stage 2. | Each CarbonData load will create one segment, if every load is small in size it will generate many small file over a period of time impacting the query performance. Configuring this parameter will merge the small segment to one big segment which will sort the data and improve the performance. For Example in one telecommunication scenario, the performance improves about 2 times after minor compaction. |
+| carbon.compaction.level.threshold | spark/carbonlib/carbon.properties | Data loading and Querying | For minor compaction, specifies the number of segments to be merged in stage 1 and number of compacted segments to be merged in stage 2. | Each CarbonData load will create one segment, if every load is small in size it will generate many small files over a period of time impacting the query performance. Configuring this parameter will merge the small segment to one big segment which will sort the data and improve the performance. For Example in one telecommunication scenario, the performance improves about 2 times after minor compaction. |
 | spark.sql.shuffle.partitions | spark/conf/spark-defaults.conf | Querying | The number of task started when spark shuffle. | The value can be 1 to 2 times as much as the executor cores. In an aggregation scenario, reducing the number from 200 to 32 reduced the query time from 17 to 9 seconds. |
 | spark.executor.instances/spark.executor.cores/spark.executor.memory | spark/conf/spark-defaults.conf | Querying | The number of executors, CPU cores, and memory used for CarbonData query. | In the bank scenario, we provide the 4 CPUs cores and 15 GB for each executor which can get good performance. This 2 value does not mean more the better. It needs to be configured properly in case of limited resources. For example, In the bank scenario, it has enough CPU 32 cores each node but less memory 64 GB each node. So we cannot give more CPU but less memory. For example, when 4 cores and 12GB for each executor. It sometimes happens GC during the query which impact the query performance very much from the 3 second to more than 15 seconds. In this scenario need to increase the memory or decrease the CPU cores. |
 | carbon.detail.batch.size | spark/carbonlib/carbon.properties | Data loading | The buffer size to store records, returned from the block scan. | In limit scenario this parameter is very important. For example your query limit is 1000. But if we set this value to 3000 that means we get 3000 records from scan but spark will only take 1000 rows. So the 2000 remaining are useless. In one Finance test case after we set it to 100, in the limit 1000 scenario the performance increase about 2 times in comparison to if we set this value to 12000. |
 | carbon.use.local.dir | spark/carbonlib/carbon.properties | Data loading | Whether use YARN local directories for multi-table load disk load balance | If this is set it to true CarbonData will use YARN local directories for multi-table load disk load balance, that will improve the data load performance. |
 | carbon.use.multiple.temp.dir | spark/carbonlib/carbon.properties | Data loading | Whether to use multiple YARN local directories during table data loading for disk load balance | After enabling 'carbon.use.local.dir', if this is set to true, CarbonData will use all YARN local directories during data load for disk load balance, that will improve the data load performance. Please enable this property when you encounter disk hotspot problem during data loading. |
-| carbon.sort.temp.compressor | spark/carbonlib/carbon.properties | Data loading | Specify the name of compressor to compress the intermediate sort temporary files during sort procedure in data loading. | The optional values are 'SNAPPY','GZIP','BZIP2','LZ4','ZSTD' and empty. By default, empty means that Carbondata will not compress the sort temp files. This parameter will be useful if you encounter disk bottleneck. |
-| carbon.load.skewedDataOptimization.enabled | spark/carbonlib/carbon.properties | Data loading | Whether to enable size based block allocation strategy for data loading. | When loading, carbondata will use file size based block allocation strategy for task distribution. It will make sure that all the executors process the same size of data -- It's useful if the size of your input data files varies widely, say 1MB~1GB. |
-| carbon.load.min.size.enabled | spark/carbonlib/carbon.properties | Data loading | Whether to enable node minumun input data size allocation strategy for data loading.| When loading, carbondata will use node minumun input data size allocation strategy for task distribution. It will make sure the node load the minimum amount of data -- It's useful if the size of your input data files very small, say 1MB~256MB,Avoid generating a large number of small files. |
+| carbon.sort.temp.compressor | spark/carbonlib/carbon.properties | Data loading | Specify the name of compressor to compress the intermediate sort temporary files during sort procedure in data loading. | The optional values are 'SNAPPY','GZIP','BZIP2','LZ4','ZSTD', and empty. By default, empty means that Carbondata will not compress the sort temp files. This parameter will be useful if you encounter disk bottleneck. |
+| carbon.load.skewedDataOptimization.enabled | spark/carbonlib/carbon.properties | Data loading | Whether to enable size based block allocation strategy for data loading. | When loading, carbondata will use file size based block allocation strategy for task distribution. It will make sure that all the executors process the same size of data -- It's useful if the size of your input data files varies widely, say 1MB to 1GB. |
+| carbon.load.min.size.enabled | spark/carbonlib/carbon.properties | Data loading | Whether to enable node minumun input data size allocation strategy for data loading.| When loading, carbondata will use node minumun input data size allocation strategy for task distribution. It will make sure the nodes load the minimum amount of data -- It's useful if the size of your input data files very small, say 1MB to 256MB,Avoid generating a large number of small files. |
 
   Note: If your CarbonData instance is provided only for query, you may specify the property 'spark.speculation=true' which is in conf directory of spark.
 

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/preaggregate-datamap-guide.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/preaggregate-datamap-guide.md b/src/site/markdown/preaggregate-datamap-guide.md
index 3a3efc2..eff601d 100644
--- a/src/site/markdown/preaggregate-datamap-guide.md
+++ b/src/site/markdown/preaggregate-datamap-guide.md
@@ -251,7 +251,7 @@ pre-aggregate tables. To further improve the query performance, compaction on pr
 can be triggered to merge the segments and files in the pre-aggregate tables. 
 
 ## Data Management with pre-aggregate tables
-In current implementation, data consistence need to be maintained for both main table and pre-aggregate
+In current implementation, data consistency needs to be maintained for both main table and pre-aggregate
 tables. Once there is pre-aggregate table created on the main table, following command on the main 
 table
 is not supported:

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/quick-start-guide.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/quick-start-guide.md b/src/site/markdown/quick-start-guide.md
index 37c398c..fd535ae 100644
--- a/src/site/markdown/quick-start-guide.md
+++ b/src/site/markdown/quick-start-guide.md
@@ -16,7 +16,7 @@
 -->
 
 # Quick Start
-This tutorial provides a quick introduction to using CarbonData.To follow along with this guide, first download a packaged release of CarbonData from the [CarbonData website](https://dist.apache.org/repos/dist/release/carbondata/).Alternatively it can be created following [Building CarbonData](https://github.com/apache/carbondata/tree/master/build) steps.
+This tutorial provides a quick introduction to using CarbonData. To follow along with this guide, first download a packaged release of CarbonData from the [CarbonData website](https://dist.apache.org/repos/dist/release/carbondata/).Alternatively it can be created following [Building CarbonData](https://github.com/apache/carbondata/tree/master/build) steps.
 
 ##  Prerequisites
 * CarbonData supports Spark versions upto 2.2.1.Please download Spark package from [Spark website](https://spark.apache.org/downloads.html)
@@ -35,7 +35,7 @@ This tutorial provides a quick introduction to using CarbonData.To follow along
 
 ## Integration
 
-CarbonData can be integrated with Spark and Presto Execution Engines.The below documentation guides on Installing and Configuring with these execution engines.
+CarbonData can be integrated with Spark and Presto Execution Engines. The below documentation guides on Installing and Configuring with these execution engines.
 
 ### Spark
 
@@ -293,31 +293,31 @@ hdfs://<host_name>:port/user/hive/warehouse/carbon.store
 
 ## Installing and Configuring CarbonData on Presto
 
-**NOTE:** **CarbonData tables cannot be created nor loaded from Presto.User need to create CarbonData Table and load data into it
+**NOTE:** **CarbonData tables cannot be created nor loaded from Presto. User need to create CarbonData Table and load data into it
 either with [Spark](#installing-and-configuring-carbondata-to-run-locally-with-spark-shell) or [SDK](./sdk-guide.md).
 Once the table is created,it can be queried from Presto.**
 
 
 ### Installing Presto
 
- 1. Download the 0.187 version of Presto using:
-    `wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.187/presto-server-0.187.tar.gz`
+ 1. Download the 0.210 version of Presto using:
+    `wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.210/presto-server-0.210.tar.gz`
 
- 2. Extract Presto tar file: `tar zxvf presto-server-0.187.tar.gz`.
+ 2. Extract Presto tar file: `tar zxvf presto-server-0.210.tar.gz`.
 
  3. Download the Presto CLI for the coordinator and name it presto.
 
   ```
-    wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.187/presto-cli-0.187-executable.jar
+    wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.210/presto-cli-0.210-executable.jar
 
-    mv presto-cli-0.187-executable.jar presto
+    mv presto-cli-0.210-executable.jar presto
 
     chmod +x presto
   ```
 
 ### Create Configuration Files
 
-  1. Create `etc` folder in presto-server-0.187 directory.
+  1. Create `etc` folder in presto-server-0.210 directory.
   2. Create `config.properties`, `jvm.config`, `log.properties`, and `node.properties` files.
   3. Install uuid to generate a node.id.
 
@@ -363,10 +363,15 @@ Once the table is created,it can be queried from Presto.**
   coordinator=true
   node-scheduler.include-coordinator=false
   http-server.http.port=8086
-  query.max-memory=50GB
-  query.max-memory-per-node=2GB
+  query.max-memory=5GB
+  query.max-total-memory-per-node=5GB
+  query.max-memory-per-node=3GB
+  memory.heap-headroom-per-node=1GB
   discovery-server.enabled=true
-  discovery.uri=<coordinator_ip>:8086
+  discovery.uri=http://localhost:8086
+  task.max-worker-threads=4
+  optimizer.dictionary-aggregation=true
+  optimizer.optimize-hash-generation = false
   ```
 The options `node-scheduler.include-coordinator=false` and `coordinator=true` indicate that the node is the coordinator and tells the coordinator not to do any of the computation work itself and to use the workers.
 
@@ -383,7 +388,7 @@ Then, `query.max-memory=<30GB * number of nodes>`.
   ```
   coordinator=false
   http-server.http.port=8086
-  query.max-memory=50GB
+  query.max-memory=5GB
   query.max-memory-per-node=2GB
   discovery.uri=<coordinator_ip>:8086
   ```
@@ -405,12 +410,12 @@ Then, `query.max-memory=<30GB * number of nodes>`.
 ### Start Presto Server on all nodes
 
 ```
-./presto-server-0.187/bin/launcher start
+./presto-server-0.210/bin/launcher start
 ```
 To run it as a background process.
 
 ```
-./presto-server-0.187/bin/launcher run
+./presto-server-0.210/bin/launcher run
 ```
 To run it in foreground.
 

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/s3-guide.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/s3-guide.md b/src/site/markdown/s3-guide.md
index a2e5f07..1121164 100644
--- a/src/site/markdown/s3-guide.md
+++ b/src/site/markdown/s3-guide.md
@@ -15,7 +15,7 @@
     limitations under the License.
 -->
 
-# S3 Guide (Alpha Feature 1.4.1)
+# S3 Guide
 
 Object storage is the recommended storage format in cloud as it can support storing large data 
 files. S3 APIs are widely used for accessing object stores. This can be 


Mime
View raw message