Mailing-List: contact issues-help@carbondata.incubator.apache.org; run by ezmlm
Reply-To: dev@carbondata.incubator.apache.org
Delivered-To: mailing list issues@carbondata.incubator.apache.org
Date: Wed, 9 Nov 2016 15:49:58 +0000 (UTC)
From: "MAKAMRAGHUVARDHAN (JIRA)"
To: issues@carbondata.incubator.apache.org
Subject: [jira] [Updated] (CARBONDATA-400) [Bad Records] Load data fails and displays the string value in beeline as an exception
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
archived-at: Wed, 09 Nov 2016 15:50:04 -0000

[ https://issues.apache.org/jira/browse/CARBONDATA-400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

MAKAMRAGHUVARDHAN updated CARBONDATA-400:
-----------------------------------------

Description:

Steps
1. Create the table:
CREATE TABLE String_test2 (string_col string) STORED BY 'org.apache.carbondata.format';
2. Load data with the parameter 'BAD_RECORDS_ACTION'='FORCE' from a CSV file that contains a string value exceeding the column size limit:
LOAD DATA INPATH 'hdfs://hacluster/Carbon/Priyal/string5.csv' INTO TABLE String_test2 OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_LOGGER_ENABLE'='TRUE', 'BAD_RECORDS_ACTION'='FORCE', 'FILEHEADER'='string_col');

Actual Result: The data load fails and the raw string value is displayed in beeline as part of the exception trace.
Expected Result: A clear error message should be displayed; the exception trace should not be printed on the console.

The exception thrown on the console is shown below.
Error: com.univocity.parsers.common.TextParsingException: Error processing input: Length of parsed input (100001) exceeds the maximum number of characters defined in your parser settings (100000).
Hint: Number of characters processed may have exceeded limit of 100000 characters per column. Use settings.setMaxCharsPerColumn(int) to define the maximum number of characters a column can have
Ensure your configuration is correct, with delimiters, quotes and escape sequences that match the input format you are trying to parse
Parser Configuration: CsvParserSettings:
	Column reordering enabled=true
	Empty value=null
	Header extraction enabled=false
	Headers=null
	Ignore leading whitespaces=true
	Ignore trailing whitespaces=true
	Input buffer size=128
	Input reading on separate thread=false
	Line separator detection enabled=false
	Maximum number of characters per column=100000
	Maximum number of columns=20480
	Null value=
	Number of records to read=all
	Parse unescaped quotes=true
	Row processor=none
	Selected fields=none
	Skip empty lines=true
Format configuration: CsvFormat:
	Comment character=#
	Field delimiter=,
	Line separator (normalized)=\n
	Line separator sequence=\n
	Quote character="
	Quote escape character=quote escape
	Quote escape escape character=\0, line=0, char=100002.
Content parsed: [hellohowareyouwelcomehellohellohellohellohellohellohellohelloheellooabcdefghijklmnopqrstuvwxyzabcqwertuyioplkjhgfdsazxcvbnmpoiuytrewqasdfghjklmnbvcxzasdghskhdgkhdbkshkjchskdhfssudkdjdudusdjhdshdshsjddshjdkdhgdhdshdhdududushdudududududududududududududududududuudududududududuudududududududududududududududududududududududududududuhellohowareyouwelcomehellohellohellohellohellohellohellohelloheellooabcdefghijklmnopqrstuvwxyzabcqwertuyioplkjhgfdsazxcvbnmpoiuytrewqasdfghjklmnbvcxzasdghskhdgkhdbkshkjchskdhfssudkdjdudusdjhdshdshsjddshjdkdhgdhdshdhdududushdudududududududududududududududududuudududududududuududududududududuu

was:
Steps
1. Create the table:
CREATE TABLE String_test2 (string_col string) STORED BY 'org.apache.carbondata.format';
2. Load data with the parameter 'BAD_RECORDS_ACTION'='FORCE' from a CSV file that contains a string value exceeding the column size limit:
LOAD DATA INPATH 'hdfs://hacluster/Carbon/Priyal/string5.csv' INTO TABLE String_test2 OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_LOGGER_ENABLE'='TRUE', 'BAD_RECORDS_ACTION'='FORCE', 'FILEHEADER'='string_col');
Actual Result: The data load fails and the raw string value is displayed in beeline as part of the exception trace.
Expected Result: A valid exception should be displayed.

> [Bad Records] Load data fails and displays the string value in beeline as an exception
> ---------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-400
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-400
>             Project: CarbonData
>          Issue Type: Bug
>          Components: data-load
>    Affects Versions: 0.1.0-incubating
>         Environment: 3-node cluster
>            Reporter: MAKAMRAGHUVARDHAN
>            Priority: Minor
>
> Steps
> 1. Create the table:
> CREATE TABLE String_test2 (string_col string) STORED BY 'org.apache.carbondata.format';
> 2. Load data with the parameter 'BAD_RECORDS_ACTION'='FORCE' from a CSV file that contains a string value exceeding the column size limit:
> LOAD DATA INPATH 'hdfs://hacluster/Carbon/Priyal/string5.csv' INTO TABLE String_test2 OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_LOGGER_ENABLE'='TRUE', 'BAD_RECORDS_ACTION'='FORCE', 'FILEHEADER'='string_col');
> Actual Result: The data load fails and the raw string value is displayed in beeline as part of the exception trace.
> Expected Result: A clear error message should be displayed; the exception trace should not be printed on the console.
> The exception thrown on the console is shown below.
> Error: com.univocity.parsers.common.TextParsingException: Error processing input: Length of parsed input (100001) exceeds the maximum number of characters defined in your parser settings (100000).
> Hint: Number of characters processed may have exceeded limit of 100000 characters per column. Use settings.setMaxCharsPerColumn(int) to define the maximum number of characters a column can have
> Ensure your configuration is correct, with delimiters, quotes and escape sequences that match the input format you are trying to parse
> Parser Configuration: CsvParserSettings:
> 	Column reordering enabled=true
> 	Empty value=null
> 	Header extraction enabled=false
> 	Headers=null
> 	Ignore leading whitespaces=true
> 	Ignore trailing whitespaces=true
> 	Input buffer size=128
> 	Input reading on separate thread=false
> 	Line separator detection enabled=false
> 	Maximum number of characters per column=100000
> 	Maximum number of columns=20480
> 	Null value=
> 	Number of records to read=all
> 	Parse unescaped quotes=true
> 	Row processor=none
> 	Selected fields=none
> 	Skip empty lines=true
> Format configuration: CsvFormat:
> 	Comment character=#
> 	Field delimiter=,
> 	Line separator (normalized)=\n
> 	Line separator sequence=\n
> 	Quote character="
> 	Quote escape character=quote escape
> 	Quote escape escape character=\0, line=0, char=100002.
> Content parsed: [hellohowareyouwelcomehellohellohellohellohellohellohellohelloheellooabcdefghijklmnopqrstuvwxyzabcqwertuyioplkjhgfdsazxcvbnmpoiuytrewqasdfghjklmnbvcxzasdghskhdgkhdbkshkjchskdhfssudkdjdudusdjhdshdshsjddshjdkdhgdhdshdhdududushdudududududududududududududududududuudududududududuudududududududududududududududududududududududududududuhellohowareyouwelcomehellohellohellohellohellohellohellohelloheellooabcdefghijklmnopqrstuvwxyzabcqwertuyioplkjhgfdsazxcvbnmpoiuytrewqasdfghjklmnbvcxzasdghskhdgkhdbkshkjchskdhfssudkdjdudusdjhdshdshsjddshjdkdhgdhdshdhdududushdudududududududududududududududududuudududududududuududududududududuu

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
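Editor's appendix: the 100000-character ceiling in the trace above is univocity's per-column limit (the trace itself points at settings.setMaxCharsPerColumn). The sketch below is a minimal Python illustration, with hypothetical names (it is not CarbonData or univocity code), of that kind of length guard and of the cleaner error surfacing the report asks for: reject the oversized field with a short diagnostic instead of echoing the raw column content to the console.

```python
# Illustrative sketch only: names below (ColumnTooLongError, parse_field,
# load_record) are hypothetical, not CarbonData or univocity identifiers.
MAX_CHARS_PER_COLUMN = 100_000  # matches the default limit seen in the trace


class ColumnTooLongError(Exception):
    """Raised when a parsed field exceeds the configured column limit."""


def parse_field(raw: str, limit: int = MAX_CHARS_PER_COLUMN) -> str:
    """Accept a field only if it fits within the per-column limit."""
    if len(raw) > limit:
        # Report lengths only -- never the oversized content itself,
        # which is exactly what the bug report says should not happen.
        raise ColumnTooLongError(
            f"Length of parsed input ({len(raw)}) exceeds the maximum "
            f"number of characters defined in parser settings ({limit})"
        )
    return raw


def load_record(raw: str) -> str:
    """Load one record, turning a limit violation into a short message."""
    try:
        return parse_field(raw)
    except ColumnTooLongError as err:
        return f"Bad record rejected: {err}"


print(load_record("hello"))            # small value loads unchanged
print(load_record("x" * 100_001))      # oversized value -> short diagnostic
```

Under this scheme the beeline console would show only the one-line "Bad record rejected" diagnostic; raising the limit (univocity's setMaxCharsPerColumn) remains a separate, deliberate configuration choice.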