Date: Tue, 29 Sep 2015 21:15:06 +0000 (UTC)
From: "mac champion (JIRA)"
To: crunch-dev@incubator.apache.org
Reply-To: dev@crunch.apache.org
Subject: [jira] [Commented] (CRUNCH-564) Add support for using escape character same as open/close quote character

    [ https://issues.apache.org/jira/browse/CRUNCH-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935877#comment-14935877 ]

mac champion commented on CRUNCH-564:
-------------------------------------

+1, Thanks Nathan.

I think some of the confusion around the CSV format stems from all of the (unnecessary?) configuration options it has. I put them there for flexibility, but maybe they're just distracting.
Now that I think back, I can't really imagine a case where someone would want to specify a custom escape character or open and close quote character.

> Add support for using escape character same as open/close quote character
> -------------------------------------------------------------------------
>
>                 Key: CRUNCH-564
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-564
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Muhammad
>            Assignee: Josh Wills
>            Priority: Trivial
>              Labels: csv, csvparser
>
> As a user, I would like to use CSVInputFormat to handle CSV files that follow RFC 4180: http://www.ietf.org/rfc/rfc4180.txt.
> Many developers use the Apache StringEscapeUtils.escapeCsv() method to escape their CSVs. The method escapes the CSV following RFC 4180.
> https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringEscapeUtils.html
> The CSVLineReader throws an exception in such a case. We can enhance the code to support CSVs that use the same character for escaping and for quoting.
> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/io/text/csv/CSVLineReader.java#L152
> I would appreciate a comment if someone has knowingly rejected the idea because of a technical limitation or a problem with allowing the escape and quote characters to be the same. By the way, Apache HAWQ seems to get around this issue somehow and reads such CSVs all right.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
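
For illustration only, here is a minimal sketch of the RFC 4180 convention the issue asks for, where the quote character doubles as its own escape (a `""` inside a quoted field stands for one literal `"`). The class `Rfc4180Sketch` and the method `parseLine` are hypothetical names invented for this example; this is not Crunch's CSVLineReader or CSVInputFormat code and makes no claim about how the fix was or should be implemented there. It assumes a comma delimiter, a double-quote quote character, and a record that fits on a single line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone sketch, not part of Crunch.
public class Rfc4180Sketch {

    /**
     * Splits one single-line CSV record into fields. Inside a quoted field,
     * two consecutive quote characters are read as one literal quote.
     */
    public static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    // Look ahead: a second quote means "escaped quote",
                    // anything else means the quoted field has ended.
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"');
                        i++; // consume the second quote of the pair
                    } else {
                        inQuotes = false;
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }

        if (inQuotes) {
            throw new IllegalArgumentException("Unterminated quoted field");
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Input record: a,"say ""hi""",c
        System.out.println(parseLine("a,\"say \"\"hi\"\"\",c"));
        // prints [a, say "hi", c]
    }
}
```

The one-character lookahead is what lets a single character serve as both quote and escape: a quote followed by another quote is data, and a quote followed by anything else closes the field. The sketch deliberately leaves out quoted fields that span several lines, which a line reader for splittable input also has to handle.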