From: jackylk
To: issues@carbondata.incubator.apache.org
Reply-To: issues@carbondata.incubator.apache.org
Subject: [GitHub] incubator-carbondata pull request #510: Document update for UID
Message-Id: <20170110122847.1C3CDDFA98@git1-us-west.apache.org>
Date: Tue, 10 Jan 2017 12:28:47 +0000 (UTC)

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/510#discussion_r95354038

--- Diff: docs/DML-Operations-on-Carbon.md ---
@@ -1,197 +1,308 @@
-
-
-* [LOAD DATA](#LOAD DATA)
-* [SHOW SEGMENTS](#SHOW SEGMENTS)
-* [DELETE SEGMENT BY ID](#DELETE SEGMENT BY ID)
-* [DELETE SEGMENT BY DATE](#DELETE SEGMENT BY DATE)
-
-***
-
-# LOAD DATA
- This command loads the user data in raw format into the Carbon-specific data format store; this way Carbon provides good performance while querying the data. Please visit [Data Management](Carbondata-Management.md) for more details on LOAD.
-
-### Syntax
-
-  ```ruby
-  LOAD DATA [LOCAL] INPATH 'folder_path' INTO TABLE [db_name.]table_name
-  OPTIONS(property_name=property_value, ...)
-  ```
-
-### Parameter Description
-
-| Parameter | Description | Optional |
-| ------------- | -----| -------- |
-| folder_path | Path of raw csv data folder or file. | NO |
-| db_name | Database name, if it is not specified then it uses the current database. | YES |
-| table_name | The name of the table in the provided database. | NO |
-| OPTIONS | Extra options provided to LOAD. | YES |
-
-
-### Usage Guideline
-Following are the options that can be used in LOAD DATA:
-- **DELIMITER:** Delimiters can be provided in the load command.
-
-  ```ruby
-  OPTIONS('DELIMITER'=',')
-  ```
-- **QUOTECHAR:** Quote characters can be provided in the load command.
-
-  ```ruby
-  OPTIONS('QUOTECHAR'='"')
-  ```
-- **COMMENTCHAR:** Comment characters can be provided in the load command if the user wants to comment out lines.
-
-  ```ruby
-  OPTIONS('COMMENTCHAR'='#')
-  ```
-- **FILEHEADER:** Headers can be provided in the LOAD DATA command if headers are missing in the source files.
-
-  ```ruby
-  OPTIONS('FILEHEADER'='column1,column2')
-  ```
-- **MULTILINE:** CSV with new line character in quotes.
-
-  ```ruby
-  OPTIONS('MULTILINE'='true')
-  ```
-- **ESCAPECHAR:** Escape char can be provided if the user wants strict validation of escape characters in the CSV.
-
-  ```ruby
-  OPTIONS('ESCAPECHAR'='\')
-  ```
-- **COMPLEX_DELIMITER_LEVEL_1:** Split the complex type data column in a row (e.g., a$b$c --> Array = {a,b,c}).
-
-  ```ruby
-  OPTIONS('COMPLEX_DELIMITER_LEVEL_1'='$')
-  ```
-- **COMPLEX_DELIMITER_LEVEL_2:** Split the complex type nested data column in a row. Applies the level_1 delimiter and applies level_2 based on the complex data type (e.g., a:b$c:d --> Array<Array> = {{a,b},{c,d}}).
-
-  ```ruby
-  OPTIONS('COMPLEX_DELIMITER_LEVEL_2'=':')
-  ```
-- **ALL_DICTIONARY_PATH:** Path of all dictionary files.
-
-  ```ruby
-  OPTIONS('ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary')
-  ```
-- **COLUMNDICT:** Dictionary file path for the specified column.
-
-  ```ruby
-  OPTIONS('COLUMNDICT'='column1:dictionaryFilePath1, column2:dictionaryFilePath2')
-  ```
-  Note: ALL_DICTIONARY_PATH and COLUMNDICT can't be used together.
-- **DATEFORMAT:** Date format for the specified column.
-
-  ```ruby
-  OPTIONS('DATEFORMAT'='column1:dateFormat1, column2:dateFormat2')
-  ```
-  Note: Date formats are specified by date pattern strings. The date pattern letters in Carbon are
-  the same as in Java [SimpleDateFormat](http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html).
-
-**Example:**
-
-  ```ruby
-  LOAD DATA local inpath '/opt/rawdata/data.csv' INTO table carbontable
-  options('DELIMITER'=',', 'QUOTECHAR'='"', 'COMMENTCHAR'='#',
-  'FILEHEADER'='empno,empname,
-  designation,doj,workgroupcategory,
-  workgroupcategoryname,deptno,deptname,projectcode,
-  projectjoindate,projectenddate,attendance,utilization,salary',
-  'MULTILINE'='true', 'ESCAPECHAR'='\',
-  'COMPLEX_DELIMITER_LEVEL_1'='$',
-  'COMPLEX_DELIMITER_LEVEL_2'=':',
-  'ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary',
-  'DATEFORMAT'='projectjoindate:yyyy-MM-dd'
-  )
-  ```
-
-***
-
-# SHOW SEGMENTS
-This command shows the segments of a carbon table to the user.
-
-  ```ruby
-  SHOW SEGMENTS FOR TABLE [db_name.]table_name LIMIT number_of_segments;
-  ```
-
-### Parameter Description
-
-| Parameter | Description | Optional |
-| ------------- | -----| --------- |
-| db_name | Database name, if it is not specified then it uses the current database. | YES |
-| table_name | The name of the table in the provided database. | NO |
-| number_of_segments | Limit the output to this number. | YES |
-
-**Example:**
-
-  ```ruby
-  SHOW SEGMENTS FOR TABLE CarbonDatabase.CarbonTable LIMIT 2;
-  ```
-
-***
-
-# DELETE SEGMENT BY ID
-
-This command deletes a segment by using the segment ID.
-
-  ```ruby
-  DELETE SEGMENT segment_id1,segment_id2 FROM TABLE [db_name.]table_name;
-  ```
-
-### Parameter Description
-
-| Parameter | Description | Optional |
-| ------------- | -----| --------- |
-| segment_id | Segment Id of the load. | NO |
-| db_name | Database name, if it is not specified then it uses the current database. | YES |
-| table_name | The name of the table in the provided database. | NO |
-
-**Example:**
-
-  ```ruby
-  DELETE SEGMENT 0 FROM TABLE CarbonDatabase.CarbonTable;
-  DELETE SEGMENT 0.1,5,8 FROM TABLE CarbonDatabase.CarbonTable;
-  ```
-  Note: Here 0.1 is the compacted segment sequence id.
-
-***
-
-# DELETE SEGMENT BY DATE
-This command deletes the Carbon segment(s) from the store based on the date provided by the user in the DML command. Segments created before the specified date will be removed from the store.
-
-  ```ruby
-  DELETE SEGMENTS FROM TABLE [db_name.]table_name WHERE STARTTIME BEFORE [DATE_VALUE];
-  ```
-
-### Parameter Description
-
-| Parameter | Description | Optional |
-| ------------- | -----| ------ |
-| DATE_VALUE | Valid segment load start time value. All the segments before this specified date will be deleted. | NO |
-| db_name | Database name, if it is not specified then it uses the current database. | YES |
-| table_name | The name of the table in the provided database. | NO |
-
-**Example:**
-
-  ```ruby
-  DELETE SEGMENTS FROM TABLE CarbonDatabase.CarbonTable WHERE STARTTIME BEFORE '2017-06-01 12:05:06';
-  ```
-
-***
\ No newline at end of file
+
+
+* [LOAD DATA](#LOAD DATA)
+* [SHOW SEGMENTS](#SHOW SEGMENTS)
+* [DELETE SEGMENT BY ID](#DELETE SEGMENT BY ID)
+* [DELETE SEGMENT BY DATE](#DELETE SEGMENT BY DATE)
+* [UPDATE CARBON TABLE](#UPDATE CARBON TABLE)
+* [DELETE RECORDS from CARBON TABLE](#DELETE RECORDS from CARBON TABLE)
+
+***
+
+# LOAD DATA
+ This command loads the user data in raw format into the Carbon-specific data format store; this way Carbon provides good performance while querying the data. Please visit [Data Management](Carbondata-Management.md) for more details on LOAD.
+
+### Syntax
+
+  ```ruby
+  LOAD DATA [LOCAL] INPATH 'folder_path' INTO TABLE [db_name.]table_name
+  OPTIONS(property_name=property_value, ...)
+  ```
+
+### Parameter Description
+
+| Parameter | Description | Optional |
+| ------------- | -----| -------- |
+| folder_path | Path of raw csv data folder or file. | NO |
+| db_name | Database name, if it is not specified then it uses the current database. | YES |
+| table_name | The name of the table in the provided database. | NO |
+| OPTIONS | Extra options provided to LOAD. | YES |
+
+
+### Usage Guideline
+Following are the options that can be used in LOAD DATA:
+- **DELIMITER:** Delimiters can be provided in the load command.
+
+  ```ruby
+  OPTIONS('DELIMITER'=',')
+  ```
+- **QUOTECHAR:** Quote characters can be provided in the load command.
+
+  ```ruby
+  OPTIONS('QUOTECHAR'='"')
+  ```
+- **COMMENTCHAR:** Comment characters can be provided in the load command if the user wants to comment out lines.
+
+  ```ruby
+  OPTIONS('COMMENTCHAR'='#')
+  ```
+- **FILEHEADER:** Headers can be provided in the LOAD DATA command if headers are missing in the source files.
+
+  ```ruby
+  OPTIONS('FILEHEADER'='column1,column2')
+  ```
+- **MULTILINE:** CSV with new line character in quotes.
+
+  ```ruby
+  OPTIONS('MULTILINE'='true')
+  ```
+- **ESCAPECHAR:** Escape char can be provided if the user wants strict validation of escape characters in the CSV.
+
+  ```ruby
+  OPTIONS('ESCAPECHAR'='\')
+  ```
+- **COMPLEX_DELIMITER_LEVEL_1:** Split the complex type data column in a row (e.g., a$b$c --> Array = {a,b,c}).
+
+  ```ruby
+  OPTIONS('COMPLEX_DELIMITER_LEVEL_1'='$')
+  ```
+- **COMPLEX_DELIMITER_LEVEL_2:** Split the complex type nested data column in a row. Applies the level_1 delimiter and applies level_2 based on the complex data type (e.g., a:b$c:d --> Array<Array> = {{a,b},{c,d}}).
+
+  ```ruby
+  OPTIONS('COMPLEX_DELIMITER_LEVEL_2'=':')
+  ```
+- **ALL_DICTIONARY_PATH:** Path of all dictionary files.
+
+  ```ruby
+  OPTIONS('ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary')
+  ```
+- **COLUMNDICT:** Dictionary file path for the specified column.
+
+  ```ruby
+  OPTIONS('COLUMNDICT'='column1:dictionaryFilePath1, column2:dictionaryFilePath2')
+  ```
+  Note: ALL_DICTIONARY_PATH and COLUMNDICT can't be used together.
+- **DATEFORMAT:** Date format for the specified column.
+
+  ```ruby
+  OPTIONS('DATEFORMAT'='column1:dateFormat1, column2:dateFormat2')
+  ```
+  Note: Date formats are specified by date pattern strings. The date pattern letters in Carbon are
+  the same as in Java [SimpleDateFormat](http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html).
+
+**Example:**
+
+  ```ruby
+  LOAD DATA local inpath '/opt/rawdata/data.csv' INTO table carbontable
+  options('DELIMITER'=',', 'QUOTECHAR'='"', 'COMMENTCHAR'='#',
+  'FILEHEADER'='empno,empname,
+  designation,doj,workgroupcategory,
+  workgroupcategoryname,deptno,deptname,projectcode,
+  projectjoindate,projectenddate,attendance,utilization,salary',
+  'MULTILINE'='true', 'ESCAPECHAR'='\',
+  'COMPLEX_DELIMITER_LEVEL_1'='$',
+  'COMPLEX_DELIMITER_LEVEL_2'=':',
+  'ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary',
+  'DATEFORMAT'='projectjoindate:yyyy-MM-dd'
+  )
+  ```
+
+***
+
+# SHOW SEGMENTS
+This command shows the segments of a carbon table to the user.
+
+  ```ruby
+  SHOW SEGMENTS FOR TABLE [db_name.]table_name LIMIT number_of_segments;
+  ```
+
+### Parameter Description
+
+| Parameter | Description | Optional |
+| ------------- | -----| --------- |
+| db_name | Database name, if it is not specified then it uses the current database. | YES |
+| table_name | The name of the table in the provided database. | NO |
+| number_of_segments | Limit the output to this number. | YES |
+
+**Example:**
+
+  ```ruby
+  SHOW SEGMENTS FOR TABLE CarbonDatabase.CarbonTable LIMIT 2;
+  ```
+
+***
+
+# DELETE SEGMENT BY ID
+
+This command deletes a segment by using the segment ID.
+
+  ```ruby
+  DELETE SEGMENT segment_id1,segment_id2 FROM TABLE [db_name.]table_name;
+  ```
+
+### Parameter Description
+
+| Parameter | Description | Optional |
+| ------------- | -----| --------- |
+| segment_id | Segment Id of the load. | NO |
+| db_name | Database name, if it is not specified then it uses the current database. | YES |
+| table_name | The name of the table in the provided database. | NO |
+
+**Example:**
+
+  ```ruby
+  DELETE SEGMENT 0 FROM TABLE CarbonDatabase.CarbonTable;
+  DELETE SEGMENT 0.1,5,8 FROM TABLE CarbonDatabase.CarbonTable;
+  ```
+  Note: Here 0.1 is the compacted segment sequence id.
+
+***
+
+# DELETE SEGMENT BY DATE
+This command deletes the Carbon segment(s) from the store based on the date provided by the user in the DML command. Segments created before the specified date will be removed from the store.
+
+  ```ruby
+  DELETE SEGMENTS FROM TABLE [db_name.]table_name WHERE STARTTIME BEFORE [DATE_VALUE];
+  ```
+
+### Parameter Description
+
+| Parameter | Description | Optional |
+| ------------- | -----| ------ |
+| DATE_VALUE | Valid segment load start time value. All the segments before this specified date will be deleted. | NO |
+| db_name | Database name, if it is not specified then it uses the current database. | YES |
+| table_name | The name of the table in the provided database. | NO |
+
+**Example:**
+
+  ```ruby
+  DELETE SEGMENTS FROM TABLE CarbonDatabase.CarbonTable WHERE STARTTIME BEFORE '2017-06-01 12:05:06';
+  ```
+
+***
+
+# UPDATE CARBON TABLE
+This command updates the carbon table based on the column expression and optional filter conditions.
+
+The client node where the UPDATE command is executed should be part of the cluster.
+### Syntax
+Syntax 1:
+  ```ruby
+  UPDATE <CARBON TABLE>
+  SET (column_name1, column_name2, ... column_name n) = (column1_expression, column2_expression, column3_expression ... column n_expression)
+  [ WHERE { <filter_condition> } ];
+  ```
+Syntax 2:
+  ```ruby
+  UPDATE <CARBON TABLE>
+  SET (column_name1, column_name2) = (select sourceColumn1, sourceColumn2 from sourceTable [ WHERE { <filter_condition> } ] )
+  [ WHERE { <filter_condition> } ];
+  ```
+
+### Parameter Description
+
+| Parameter | Description |
+| ------------- | -----|
+| CARBON TABLE | The name of the Carbon table in which you want to perform the update operation. |
+| column_name | The destination columns to be updated. |
+| sourceColumn | The source table column values to be updated in the destination table. |
+| sourceTable | The table from which the records are updated into the destination Carbon table. |
+
+### Usage Guidelines
+Following are the conditions to use UPDATE:
--- End diff --

Here, please organize the descriptions for syntax 1 and syntax 2 separately. For each section, we basically want to describe in which cases the update will fail, and provide a brief reason why it should fail.
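For reference when reorganizing that section, a minimal sketch of one worked example per syntax might look as follows; it reuses the carbontable columns from the LOAD DATA example above, while other_table and the literal values are hypothetical, not taken from the PR:

  ```ruby
  -- Syntax 1: set destination columns from constant or column expressions
  UPDATE carbontable
  SET (salary, deptname) = (salary + 1000, 'marketing')
  WHERE empno = 10;

  -- Syntax 2: set destination columns from a sub-query on a source table
  UPDATE carbontable
  SET (deptname, deptno) = (SELECT deptname, deptno FROM other_table WHERE other_table.deptno = 1)
  WHERE carbontable.deptno = 1;
  ```

A short per-syntax note on the failure cases could then follow each example, e.g. presumably the statement is rejected when the sub-query in syntax 2 returns a different number of columns than the destination column list given in SET.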