carbondata-commits mailing list archives

From kunalkap...@apache.org
Subject carbondata git commit: [Documentation] Editorial review
Date Fri, 30 Nov 2018 12:09:42 GMT
Repository: carbondata
Updated Branches:
  refs/heads/master c55279c5c -> 4705d1a20


[Documentation] Editorial review

Corrected spelling mistakes and grammar

This closes #2965


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/4705d1a2
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/4705d1a2
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/4705d1a2

Branch: refs/heads/master
Commit: 4705d1a20ac594ac115e7dc189fb80c633ec2e9b
Parents: c55279c
Author: sgururajshetty <sgururajshetty@gmail.com>
Authored: Thu Nov 29 18:44:22 2018 +0530
Committer: kunal642 <kunalkapoor642@gmail.com>
Committed: Fri Nov 30 17:38:53 2018 +0530

----------------------------------------------------------------------
 docs/configuration-parameters.md     | 4 ++--
 docs/ddl-of-carbondata.md            | 4 ++--
 docs/dml-of-carbondata.md            | 6 +++---
 docs/file-structure-of-carbondata.md | 3 +--
 4 files changed, 8 insertions(+), 9 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/4705d1a2/docs/configuration-parameters.md
----------------------------------------------------------------------
diff --git a/docs/configuration-parameters.md b/docs/configuration-parameters.md
index a41a3d5..4aa2929 100644
--- a/docs/configuration-parameters.md
+++ b/docs/configuration-parameters.md
@@ -69,9 +69,9 @@ This section provides the details of all the configurations required for
the Car
 | carbon.options.bad.records.logger.enable | false | CarbonData can identify the records
that are not conformant to schema and isolate them as bad records. Enabling this configuration
will make CarbonData to log such bad records. **NOTE:** If the input data contains many bad
records, logging them will slow down the over all data loading throughput. The data load operation
status would depend on the configuration in ***carbon.bad.records.action***. |
 | carbon.bad.records.action | FAIL | CarbonData in addition to identifying the bad records,
can take certain actions on such data. This configuration can have four types of actions for
bad records namely FORCE, REDIRECT, IGNORE and FAIL. If set to FORCE then it auto-corrects
the data by storing the bad records as NULL. If set to REDIRECT then bad records are written
to the raw CSV instead of being loaded. If set to IGNORE then bad records are neither loaded
nor written to the raw CSV. If set to FAIL then data loading fails if any bad records are
found. |
 | carbon.options.is.empty.data.bad.record | false | Based on the business scenarios, empty(""
or '' or ,,) data can be valid or invalid. This configuration controls how empty data should
be treated by CarbonData. If false, then empty ("" or '' or ,,) data will not be considered
as bad record and vice versa. |
-| carbon.options.bad.record.path | (none) | Specifies the HDFS path where bad records are
to be stored. By default the value is Null. This path must to be configured by the user if
***carbon.options.bad.records.logger.enable*** is **true** or ***carbon.bad.records.action***
is **REDIRECT**. |
+| carbon.options.bad.record.path | (none) | Specifies the HDFS path where bad records are
to be stored. By default the value is Null. This path must be configured by the user if ***carbon.options.bad.records.logger.enable***
is **true** or ***carbon.bad.records.action*** is **REDIRECT**. |
 | carbon.blockletgroup.size.in.mb | 64 | Please refer to [file-structure-of-carbondata](./file-structure-of-carbondata.md#carbondata-file-format)
to understand the storage format of CarbonData. The data are read as a group of blocklets
which are called blocklet groups. This parameter specifies the size of each blocklet group.
Higher value results in better sequential IO access. The minimum value is 16MB, any value
lesser than 16MB will reset to the default value (64MB). **NOTE:** Configuring a higher value
might lead to poor performance as an entire blocklet group will have to be read into memory before
processing. For filter queries with limit, it is **not advisable** to have a bigger blocklet
size. For aggregation queries which need to return a larger number of rows, a bigger blocklet size
is advisable. |
-| carbon.sort.file.write.buffer.size | 16384 | CarbonData sorts and writes data to intermediate
files to limit the memory usage. This configuration determines the buffer size to be used
for reading and writing such files. **NOTE:** This configuration is useful to tune IO and
derive optimal performance. Based on the OS and underlying harddisk type, these values can
significantly affect the overall performance. It is ideal to tune the buffersize equivalent
to the IO buffer size of the OS. Recommended range is between 10240 and 10485760 bytes. |
+| carbon.sort.file.write.buffer.size | 16384 | CarbonData sorts and writes data to intermediate
files to limit the memory usage. This configuration determines the buffer size to be used
for reading and writing such files. **NOTE:** This configuration is useful to tune IO and
derive optimal performance. Based on the OS and underlying harddisk type, these values can
significantly affect the overall performance. It is ideal to tune the buffer size equivalent
to the IO buffer size of the OS. Recommended range is between 10240 and 10485760 bytes. |
 | carbon.sort.intermediate.files.limit | 20 | CarbonData sorts and writes data to intermediate
files to limit the memory usage. Before writing the target carbondata file, the records in
these intermediate files needs to be merged to reduce the number of intermediate files. This
configuration determines the minimum number of intermediate files after which merge sort
is applied on them to sort the data. **NOTE:** Intermediate merging happens on a separate thread
in the background. Number of threads used is determined by ***carbon.merge.sort.reader.thread***.
Configuring a low value will cause more time to be spent in merging these intermediate merged
files which can cause more IO. Configuring a high value would leave idle threads unused
for intermediate sort merges. Recommended range is between 2 and 50. |
 | carbon.merge.sort.reader.thread | 3 | CarbonData sorts and writes data to intermediate
files to limit the memory usage. When the intermediate files reach ***carbon.sort.intermediate.files.limit***,
the files will be merged in another thread pool. This value will control the size of the pool.
Each thread will read the intermediate files and do merge sort and finally write the records
to another file. **NOTE:** Refer to ***carbon.sort.intermediate.files.limit*** for operation
description. Configuring a smaller number of threads can make merging lag behind the loading
process, whereas configuring a larger number of threads can cause thread contention with threads
in other data loading steps. Hence configure a fraction of ***carbon.number.of.cores.while.loading***.
|
 | carbon.merge.sort.prefetch | true | CarbonData writes every ***carbon.sort.size*** number
of records to intermediate temp files during data loading to ensure memory footprint is within
limits. These intermediate temp files will have to be sorted using merge sort before writing
into CarbonData format. This configuration enables pre fetching of data from these temp files
in order to optimize IO and speed up data loading process. |
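
For context, the bad-records and sort-related parameters in the hunk above are typically configured together in carbon.properties. A minimal sketch, assuming a hypothetical HDFS path; values shown are the documented defaults except the REDIRECT action and its mandatory path:

```
# Log non-conformant records and redirect them to a raw CSV instead of failing the load
carbon.options.bad.records.logger.enable=true
carbon.bad.records.action=REDIRECT
# Mandatory when the logger is enabled or the action is REDIRECT
carbon.options.bad.record.path=hdfs://hacluster/tmp/carbon/badrecords
# Intermediate sort-file tuning (defaults shown)
carbon.sort.file.write.buffer.size=16384
carbon.sort.intermediate.files.limit=20
carbon.merge.sort.reader.thread=3
```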

http://git-wip-us.apache.org/repos/asf/carbondata/blob/4705d1a2/docs/ddl-of-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/ddl-of-carbondata.md b/docs/ddl-of-carbondata.md
index acdf8db..965f11c 100644
--- a/docs/ddl-of-carbondata.md
+++ b/docs/ddl-of-carbondata.md
@@ -450,7 +450,7 @@ CarbonData DDL statements are documented here,which includes:
    - ##### Compression for table
 
      Data compression is also supported by CarbonData.
-     By default, Snappy is used to compress the data. CarbonData also support ZSTD compressor.
+     By default, Snappy is used to compress the data. CarbonData also supports ZSTD compressor.
      User can specify the compressor in the table property:
 
      ```
@@ -557,7 +557,7 @@ CarbonData DDL statements are documented here,which includes:
 
 ### Create external table on Non-Transactional table data location.
   Non-Transactional table data location will have only carbondata and carbonindex files,
there will not be a metadata folder (table status and schema).
-  Our SDK module currently support writing data in this format.
+  Our SDK module currently supports writing data in this format.
 
   **Example:**
   ```
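
As a sketch of the ZSTD compressor support mentioned in the first hunk above (the table and column names are made up, and the property name is an assumption drawn from the surrounding DDL documentation):

```
CREATE TABLE IF NOT EXISTS sample_table (
  id INT,
  name STRING
)
STORED AS carbondata
TBLPROPERTIES ('carbon.column.compressor'='zstd')
```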

http://git-wip-us.apache.org/repos/asf/carbondata/blob/4705d1a2/docs/dml-of-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/dml-of-carbondata.md b/docs/dml-of-carbondata.md
index 393ebd3..65654a4 100644
--- a/docs/dml-of-carbondata.md
+++ b/docs/dml-of-carbondata.md
@@ -58,7 +58,7 @@ CarbonData DML statements are documented here,which includes:
 | [COLUMNDICT](#columndict)                               | Path to read the dictionary data
from for particular column  |
 | [DATEFORMAT](#dateformattimestampformat)                | Format of date in the input csv
file                         |
 | [TIMESTAMPFORMAT](#dateformattimestampformat)           | Format of timestamp in the input
csv file                    |
-| [SORT_COLUMN_BOUNDS](#sort-column-bounds)               | How to parititon the sort columns
to make the evenly distributed |
+| [SORT_COLUMN_BOUNDS](#sort-column-bounds)               | How to partition the sort columns
to make the evenly distributed |
 | [SINGLE_PASS](#single_pass)                             | When to enable single pass data
loading                      |
 | [BAD_RECORDS_LOGGER_ENABLE](#bad-records-handling)      | Whether to enable bad records
logging                        |
 | [BAD_RECORD_PATH](#bad-records-handling)                | Bad records logging path. Useful
when bad record logging is enabled |
@@ -83,7 +83,7 @@ CarbonData DML statements are documented here,which includes:
     ```
 
   - ##### COMMENTCHAR:
-    Comment Characters can be provided in the load command if user want to comment lines.
+    Comment Characters can be provided in the load command if user wants to comment lines.
     ```
     OPTIONS('COMMENTCHAR'='#')
     ```
@@ -184,7 +184,7 @@ CarbonData DML statements are documented here,which includes:
 
     **NOTE:**
     * SORT_COLUMN_BOUNDS will be used only when the SORT_SCOPE is 'local_sort'.
-    * Carbondata will use these bounds as ranges to process data concurrently during the
final sort percedure. The records will be sorted and written out inside each partition. Since
the partition is sorted, all records will be sorted.
+    * Carbondata will use these bounds as ranges to process data concurrently during the
final sort procedure. The records will be sorted and written out inside each partition. Since
the partition is sorted, all records will be sorted.
     * Since the actual order and literal order of the dictionary column are not necessarily
the same, we do not recommend you to use this feature if the first sort column is 'dictionary_include'.
     * The option works better if your CPU usage during loading is low. If your current system
CPU usage is high, better not to use this option. Besides, it depends on the user to specify
the bounds. If the user does not know the exact bounds to make the data distributed evenly among
the bounds, loading performance will still be better than before or at least the same as before.
     * Users can find more information about this option in the description of PR1953.
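
A hedged illustration of the two load options touched in the hunks above, COMMENTCHAR and SORT_COLUMN_BOUNDS (the path, table name, and bound values are illustrative; for a single sort column the bounds are values separated by semicolons):

```
LOAD DATA INPATH 'hdfs://hacluster/tmp/sample.csv' INTO TABLE sample_table
OPTIONS('COMMENTCHAR'='#',
        'SORT_COLUMN_BOUNDS'='600000;1200000;1800000')
```

Lines in the input CSV beginning with '#' are skipped, and the three bounds split the sort-column range into four partitions that can be sorted concurrently during local sort.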

http://git-wip-us.apache.org/repos/asf/carbondata/blob/4705d1a2/docs/file-structure-of-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/file-structure-of-carbondata.md b/docs/file-structure-of-carbondata.md
index 9e656bb..9313593 100644
--- a/docs/file-structure-of-carbondata.md
+++ b/docs/file-structure-of-carbondata.md
@@ -122,8 +122,7 @@ Compared with V2: The blocklet data volume of V2 format defaults to 120,000
line
 
 #### Footer format
 
-Footer records each carbondata
-All blocklet data distribution information and statistical related metadata information (minmax,
startkey/endkey) inside the file.
+Footer records each carbondata, all blocklet data distribution information and statistical
related metadata information (minmax, startkey/endkey) inside the file.
 
 ![Footer format](../docs/images/2-3_4.png?raw=true)
 

