carbondata-commits mailing list archives

From jack...@apache.org
Subject carbondata git commit: [CARBONDATA-1252]Updated load section of configuration-parameters.md for BAD_RECORD_PATH
Date Wed, 02 Aug 2017 16:19:48 GMT
Repository: carbondata
Updated Branches:
  refs/heads/master 414ea7730 -> d327cb2bd


[CARBONDATA-1252]Updated load section of configuration-parameters.md for BAD_RECORD_PATH

This closes #1207


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/d327cb2b
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/d327cb2b
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/d327cb2b

Branch: refs/heads/master
Commit: d327cb2bd56dd04cecc53988c8f88c4fd9cbe334
Parents: 414ea77
Author: vandana <vandana.yadav759@gmail.com>
Authored: Fri Jul 28 15:43:26 2017 +0530
Committer: Jacky Li <jacky.likun@qq.com>
Committed: Thu Aug 3 00:19:31 2017 +0800

----------------------------------------------------------------------
 docs/configuration-parameters.md    |  5 +++-
 docs/dml-operation-on-carbondata.md | 39 +++++++++++++++++++++++++++++++-
 2 files changed, 42 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/d327cb2b/docs/configuration-parameters.md
----------------------------------------------------------------------
diff --git a/docs/configuration-parameters.md b/docs/configuration-parameters.md
index c85a522..133b75b 100644
--- a/docs/configuration-parameters.md
+++ b/docs/configuration-parameters.md
@@ -58,7 +58,10 @@ This section provides the details of all the configurations required for CarbonD
 | carbon.merge.sort.prefetch | true | Enable prefetch of data during merge sort while reading data from sort temp files in data loading. |  |
 | carbon.update.persist.enable | true | Enabling this parameter considers persistent data. Enabling this will reduce the execution time of UPDATE operation. |  |
 | carbon.load.global.sort.partitions | 0 | The Number of partitions to use when shuffling data for sort. If user don't configurate or configurate it less than 1, it uses the number of map tasks as reduce tasks. In general, we recommend 2-3 tasks per CPU core in your cluster.
-
+| carbon.options.bad.records.logger.enable | false | Whether to create logs with details about bad records. | |
+| carbon.bad.records.action | fail | This property can have four types of actions for bad records FORCE, REDIRECT, IGNORE and FAIL. If set to FORCE then it auto-corrects the data by storing the bad records as NULL. If set to REDIRECT then bad records are written to the raw CSV instead of being loaded. If set to IGNORE then bad records are neither loaded nor written to the raw CSV. If set to FAIL then data loading fails if any bad records are found. | |
+| carbon.options.is.empty.data.bad.record | false | If false, then empty ("" or '' or ,,) data will not be considered as bad record and vice versa. | |
+| carbon.options.bad.record.path |  | Specifies the HDFS path where bad records are stored. By default the value is Null. This path must to be configured by the user if bad record logger is enabled or bad record action redirect. | |
 
 
 * **Compaction Configuration**
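
As a reading aid for the `carbon.bad.records.action` row added in the hunk above, the four actions can be modelled in a few lines of Python. This is a toy sketch and not CarbonData code: the function name `apply_bad_record_action`, the list-of-lists row format, and the `is_bad` predicate are all hypothetical. It also assumes that FORCE nulls only the offending fields, which the description ("storing the bad records as NULL") does not spell out, and it does not model the rule that a load fails when every record is bad.

```python
VALID_ACTIONS = {"FORCE", "REDIRECT", "IGNORE", "FAIL"}


def apply_bad_record_action(rows, is_bad, action="FAIL"):
    """Toy model of the four bad-record actions (not CarbonData code).

    rows   : list of rows, each a list of field values
    is_bad : hypothetical predicate is_bad(column_index, value) -> bool
    Returns (loaded_rows, redirected_rows).
    """
    action = action.upper()
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown bad records action: {action}")

    loaded, redirected = [], []
    for row in rows:
        bad_cols = {i for i, value in enumerate(row) if is_bad(i, value)}
        if not bad_cols:
            loaded.append(list(row))
        elif action == "FAIL":
            # FAIL: the whole load is aborted on the first bad record.
            raise RuntimeError("data load failed: bad record found")
        elif action == "FORCE":
            # FORCE (assumption): keep the row, store bad fields as NULL.
            loaded.append([None if i in bad_cols else v
                           for i, v in enumerate(row)])
        elif action == "REDIRECT":
            # REDIRECT: written to a separate raw CSV instead of being loaded.
            redirected.append(list(row))
        # IGNORE: the row is neither loaded nor redirected.
    return loaded, redirected
```

With an input of two rows where the second has a non-numeric first field, REDIRECT loads only the first row and returns the second in the redirected list, while IGNORE drops the second row entirely.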

http://git-wip-us.apache.org/repos/asf/carbondata/blob/d327cb2b/docs/dml-operation-on-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/dml-operation-on-carbondata.md b/docs/dml-operation-on-carbondata.md
index e205972..c4c3465 100644
--- a/docs/dml-operation-on-carbondata.md
+++ b/docs/dml-operation-on-carbondata.md
@@ -149,7 +149,7 @@ You can use the following options to load data:
    
    * If this option is set to TRUE, then high.cardinality.identify.enable property will be disabled during data load.
    
-### Example:
+  ### Example:
 
 ```
 LOAD DATA local inpath '/opt/rawdata/data.csv' INTO table carbontable
@@ -164,6 +164,43 @@ options('DELIMITER'=',', 'QUOTECHAR'='"','COMMENTCHAR'='#',
 )
 ```
 
+- **BAD RECORDS HANDLING:** Methods of handling bad records are as follows:
+
+    * Load all of the data before dealing with the errors.
+
+    * Clean or delete bad records before loading data or stop the loading when bad records are found.
+
+    ```
+    OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='true', 'BAD_RECORD_PATH'='hdfs://hacluster/tmp/carbon', 'BAD_RECORDS_ACTION'='REDIRECT', 'IS_EMPTY_DATA_BAD_RECORD'='false')
+    ```
+
+    NOTE:
+
+    * If the REDIRECT option is used, Carbon will add all bad records in to a separate CSV file. However, this file must not be used for subsequent data loading because the content may not exactly match the source record. You are advised to cleanse the original source record for further data ingestion. This option is used to remind you which records are bad records.
+
+    * In loaded data, if all records are bad records, the BAD_RECORDS_ACTION is invalid and the load operation fails.
+
+    * The maximum number of characters per column is 100000. If there are more than 100000 characters in a column, data loading will fail.
+
+### Example:
+
+```
+LOAD DATA INPATH 'filepath.csv'
+INTO TABLE tablename
+OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='true',
+'BAD_RECORD_PATH'='hdfs://hacluster/tmp/carbon',
+'BAD_RECORDS_ACTION'='REDIRECT',
+'IS_EMPTY_DATA_BAD_RECORD'='false');
+```
+
+ **Bad Records Management Options:**
+
+ | Options                   | Default Value | Description |
+ |---------------------------|---------------|-------------|
+ | BAD_RECORDS_LOGGER_ENABLE | false         | Whether to create logs with details about bad records. |
+ | BAD_RECORDS_ACTION        | FAIL          | Following are the four types of action for bad records:  FORCE: Auto-corrects the data by storing the bad records as NULL.  REDIRECT: Bad records are written to the raw CSV instead of being loaded.  IGNORE: Bad records are neither loaded nor written to the raw CSV.  FAIL: Data loading fails if any bad records are found.  NOTE: In loaded data, if all records are bad records, the BAD_RECORDS_ACTION is invalid and the load operation fails. |
+ | IS_EMPTY_DATA_BAD_RECORD  | false         | If false, then empty ("" or '' or ,,) data will not be considered as bad record and vice versa. |
+ | BAD_RECORD_PATH           | -             | Specifies the HDFS path where bad records are stored. By default the value is Null. This path must to be configured by the user if bad record logger is enabled or bad record action redirect. |
 
 ## INSERT DATA INTO A CARBONDATA TABLE
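
The constraint stated for `BAD_RECORD_PATH` in this commit (the path must be configured when the bad record logger is enabled or the action is REDIRECT) can be checked before a `LOAD DATA` statement is issued. The following Python sketch only assembles the `OPTIONS(...)` clause as a string; the helper name `build_bad_record_options` and its keyword arguments are hypothetical and not part of any CarbonData API, and the sketch does not escape quotes inside the path.

```python
VALID_ACTIONS = ("FORCE", "REDIRECT", "IGNORE", "FAIL")


def build_bad_record_options(logger_enable=False, action="FAIL",
                             is_empty_data_bad_record=False,
                             bad_record_path=None):
    """Build an OPTIONS(...) clause for bad records handling.

    Hypothetical helper: the defaults mirror the table in this commit
    (false / FAIL / false / no path).
    """
    action = action.upper()
    if action not in VALID_ACTIONS:
        raise ValueError(f"BAD_RECORDS_ACTION must be one of {VALID_ACTIONS}")

    # Rule from the docs: the path is required when the logger is enabled
    # or when bad records are redirected.
    if (logger_enable or action == "REDIRECT") and not bad_record_path:
        raise ValueError("BAD_RECORD_PATH must be set when the logger is "
                         "enabled or the action is REDIRECT")

    pairs = [("BAD_RECORDS_LOGGER_ENABLE", str(logger_enable).lower())]
    if bad_record_path:
        pairs.append(("BAD_RECORD_PATH", bad_record_path))
    pairs.append(("BAD_RECORDS_ACTION", action))
    pairs.append(("IS_EMPTY_DATA_BAD_RECORD",
                  str(is_empty_data_bad_record).lower()))

    body = ", ".join(f"'{key}'='{value}'" for key, value in pairs)
    return f"OPTIONS({body})"
```

Calling it with `logger_enable=True`, `action="redirect"` and the path `hdfs://hacluster/tmp/carbon` reproduces the clause used in the example of this commit; omitting the path in that case raises `ValueError` instead of producing a statement that would be rejected at load time.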
 

