From: jackylk
To: issues@carbondata.incubator.apache.org
Reply-To: issues@carbondata.incubator.apache.org
Subject: [GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...
Content-Type: text/plain
Message-Id: <20170111033102.9194ADFA98@git1-us-west.apache.org>
Date: Wed, 11 Jan 2017 03:31:02 +0000 (UTC)
archived-at: Wed, 11 Jan 2017 03:31:08 -0000

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/518#discussion_r95506643

    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
    @@ -301,4 +304,45 @@ object CommonUtil {
           LOGGER.info(s"mapreduce.input.fileinputformat.split.maxsize: ${ newSplitSize.toString }")
         }
       }
    +
    +  def getCsvHeaderColumns(carbonLoadModel: CarbonLoadModel): Array[String] = {
    +    val delimiter = if (StringUtils.isEmpty(carbonLoadModel.getCsvDelimiter)) {
    +      CarbonCommonConstants.COMMA
    +    } else {
    +      CarbonUtil.delimiterConverter(carbonLoadModel.getCsvDelimiter)
    +    }
    +    var csvFile: String = null
    +    var csvHeader: String = carbonLoadModel.getCsvHeader
    +    val csvColumns = if (StringUtils.isBlank(csvHeader)) {
    +      // read header from csv file
    +      csvFile = carbonLoadModel.getFactFilePath.split(",")(0)
    +      csvHeader = CarbonUtil.readHeader(csvFile)
    +      if (StringUtils.isBlank(csvHeader)) {
    +        throw new CarbonDataLoadingException("First line of the csv is not valid.")
    +      }
    +      csvHeader.toLowerCase().split(delimiter).map(_.replaceAll("\"", "").trim)
    +    } else {
    +      csvHeader.toLowerCase.split(CarbonCommonConstants.COMMA).map(_.trim)
    +    }
    +
    +    if (!CarbonDataProcessorUtil.isHeaderValid(carbonLoadModel.getTableName, csvColumns,
    +        carbonLoadModel.getCarbonDataLoadSchema)) {
    +      if (csvFile == null) {
    +        LOGGER.error("CSV header provided in DDL is not proper." +
    +                     " Column names in schema and CSV header are not the same.")
    +        throw new CarbonDataLoadingException(
    +          "CSV header provided in DDL is not proper. Column names in schema and CSV header are " +
    +          "not the same.")
    +      } else {
    +        LOGGER.error(
    +          "CSV File provided is not proper. Column names in schema and csv header are not same. "
    --- End diff --

    Better to say "CSV header in the input file ($csvFile) is not proper."
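
For illustration, a minimal sketch of how the suggested wording could be applied, assuming the message is built in one place and reused for both the log call and the exception. `HeaderMessages` and its members are hypothetical names introduced here, not part of CarbonData, and `Option[String]` stands in for the nullable `csvFile` variable in the diff.

```scala
// Hypothetical helper, not CarbonData code: builds the error text for an
// invalid CSV header so the same string can be logged and thrown.
object HeaderMessages {
  // Shared tail of both messages, taken from the wording in the diff.
  private val mismatch = "Column names in schema and CSV header are not the same."

  // Header came from the DDL, so there is no file to name.
  val ddlHeaderMessage: String =
    s"CSV header provided in DDL is not proper. $mismatch"

  // Header was read from the input file; include its path as the review suggests.
  def fileHeaderMessage(csvFile: String): String =
    s"CSV header in the input file ($csvFile) is not proper. $mismatch"

  // None corresponds to `csvFile == null` in the diff (header supplied in DDL).
  def invalidHeaderMessage(csvFile: Option[String]): String =
    csvFile.map(fileHeaderMessage).getOrElse(ddlHeaderMessage)
}
```

Under these assumptions the caller would compute `HeaderMessages.invalidHeaderMessage(Option(csvFile))` once and pass the result to both `LOGGER.error` and `CarbonDataLoadingException`, which keeps the logged text and the exception text from drifting apart.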