From: jackylk@apache.org
To: commits@carbondata.incubator.apache.org
Date: Fri, 07 Apr 2017 09:55:24 -0000
Subject: [21/49] incubator-carbondata git commit: [CARBONDATA-400] * Problem: When the number of characters in a column exceeds 100000 characters whole string appears in beeline with exception.

[CARBONDATA-400] * Problem: When a column value exceeds 100000 characters, the whole string is shown in beeline along with the exception.

Analysis: In the univocity CSV parser settings, the maximum number of characters per column is 100000. When a value exceeds that limit, a TextParsingException is thrown during data load, and its message, which contains the complete string, is shown as the error in beeline.

Fix: A concise error message is now displayed in beeline, while the complete error message and the parser settings details are written to the logs.

Impact area: Data loading with more than 100000 characters in a single column.
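The message-trimming step behind the Fix can be tried in isolation. This is a minimal standalone sketch, not the committed code: it mirrors the split-on-"Hint" and split-on-"TextParsingException: " logic of `GlobalDictionaryUtil.trimErrorMessage`, but the object name `TrimErrorMessageSketch`, the fallback to the untrimmed input, the added `trim`, and the sample message text are all assumptions made for illustration.

```scala
// Hypothetical standalone sketch of the trimming idea in
// GlobalDictionaryUtil.trimErrorMessage; not the committed implementation.
object TrimErrorMessageSketch {

  // Keep only the text between "TextParsingException: " and the first "Hint".
  // Assumption for this sketch: when the marker is missing, return the input
  // unchanged instead of null (the committed version returns null, so its
  // caller would build "generate global dictionary failed, null").
  def trimErrorMessage(input: String): String =
    if (input == null) null
    else {
      // headOption guards against split returning an empty array,
      // e.g. when the input is exactly "Hint".
      val beforeHint = input.split("Hint").headOption.getOrElse("")
      val parts = beforeHint.split("TextParsingException: ")
      if (parts.length > 1) parts(1).trim else input
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical, abbreviated message; real Spark/univocity text differs.
    val sparkMessage =
      "Job aborted due to stage failure: " +
        "com.univocity.parsers.common.TextParsingException: " +
        "Length of parsed input (100001) exceeds the maximum number of characters " +
        "defined in your parser settings (100000). " +
        "Hint: Number of characters processed may have exceeded limit"
    println("generate global dictionary failed, " + trimErrorMessage(sparkMessage))
  }
}
```

Keeping only the text between the two markers gives beeline a short summary, while the full `SparkException` is still logged through `LOGGER.error` as in the diff.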
Project: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/commit/c99cf06d
Tree: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/tree/c99cf06d
Diff: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/diff/c99cf06d

Branch: refs/heads/12-dev
Commit: c99cf06de469ac97c3d7269377394d76190554e7
Parents: fa7421a
Author: Akash R Nilugal
Authored: Mon Dec 5 15:16:10 2016 +0530
Committer: ravipesala
Committed: Thu Apr 6 15:33:53 2017 +0530

----------------------------------------------------------------------
 .../spark/rdd/CarbonGlobalDictionaryRDD.scala |  3 +++
 .../spark/util/GlobalDictionaryUtil.scala     | 25 ++++++++++++++++++++++---
 .../processing/csvload/CSVInputFormat.java    |  1 +
 3 files changed, 26 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/c99cf06d/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonGlobalDictionaryRDD.scala
----------------------------------------------------------------------
diff --git a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonGlobalDictionaryRDD.scala b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonGlobalDictionaryRDD.scala
index 74531d3..ea71ea1 100644
--- a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonGlobalDictionaryRDD.scala
+++ b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonGlobalDictionaryRDD.scala
@@ -27,6 +27,7 @@ import scala.collection.mutable.ArrayBuffer
 import scala.util.control.Breaks.{break, breakable}
 
 import au.com.bytecode.opencsv.CSVReader
+import com.univocity.parsers.common.TextParsingException
 import org.apache.commons.lang3.{ArrayUtils, StringUtils}
 import org.apache.spark._
 import org.apache.spark.rdd.RDD
@@ -307,6 +308,8 @@ class CarbonBlockDistinctValuesCombineRDD(
         }
         CarbonTimeStatisticsFactory.getLoadStatisticsInstance.recordLoadCsvfilesToDfTime()
       } catch {
+        case txe: TextParsingException =>
+          throw txe
         case ex: Exception =>
           LOGGER.error(ex)
           throw ex


http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/c99cf06d/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
----------------------------------------------------------------------
diff --git a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
index 5cb493c..aeb387a 100644
--- a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
+++ b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
@@ -32,7 +32,7 @@ import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.Path
 import org.apache.hadoop.io.NullWritable
 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
-import org.apache.spark.Accumulator
+import org.apache.spark.{Accumulator, SparkException}
 import org.apache.spark.rdd.{NewHadoopRDD, RDD}
 import org.apache.spark.sql._
 import org.apache.spark.sql.types.{StringType, StructField, StructType}
@@ -784,9 +784,28 @@ object GlobalDictionaryUtil {
       }
     } catch {
       case ex: Exception =>
-        LOGGER.error(ex, "generate global dictionary failed")
-        throw ex
+        ex match {
+          case spx: SparkException =>
+            LOGGER.error(spx, "generate global dictionary failed")
+            throw new Exception("generate global dictionary failed, " +
+                                trimErrorMessage(spx.getMessage))
+          case _ =>
+            LOGGER.error(ex, "generate global dictionary failed")
+            throw ex
+        }
+    }
+  }
+
+  // Get proper error message of TextParsingException
+  def trimErrorMessage(input: String): String = {
+    var errorMessage: String = null
+    if (input != null) {
+      if (input.split("Hint").length > 0 &&
+          input.split("Hint")(0).split("TextParsingException: ").length > 1) {
+        errorMessage = input.split("Hint")(0).split("TextParsingException: ")(1)
+      }
     }
+    errorMessage
   }
 
   /**


http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/c99cf06d/processing/src/main/java/org/apache/carbondata/processing/csvload/CSVInputFormat.java
----------------------------------------------------------------------
diff --git a/processing/src/main/java/org/apache/carbondata/processing/csvload/CSVInputFormat.java b/processing/src/main/java/org/apache/carbondata/processing/csvload/CSVInputFormat.java
index 7545fe5..1f7d403 100644
--- a/processing/src/main/java/org/apache/carbondata/processing/csvload/CSVInputFormat.java
+++ b/processing/src/main/java/org/apache/carbondata/processing/csvload/CSVInputFormat.java
@@ -219,6 +219,7 @@ public class CSVInputFormat extends FileInputFormat