crunch-dev mailing list archives

From "Nathan Barry (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-564) Add support for using escape character same as open/close quote character
Date Tue, 29 Sep 2015 20:19:04 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935788#comment-14935788 ]

Nathan Barry commented on CRUNCH-564:
-------------------------------------

I'm guessing you want to escape double quotes by writing two double quotes in a row?

Such as:
{code}
"this line","has "" 2 double quotes in it"
"this line","has none"
"this line","has roast beef"
{code}

If so, I believe the CSVLineReader will handle that case automatically, without setting the
double quote as the escape character. What happens if you set the escape character to backslash?
Does the CSV file parse properly then?
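
For reference, here is a minimal sketch of the doubled-quote convention in question. It is not the CSVLineReader code; the class and method names are hypothetical, and it assumes a single-line record with comma as the delimiter and double quote as the quote character. Inside a quoted field, two double quotes in a row are read as one literal double quote.

```java
import java.util.ArrayList;
import java.util.List;

public class Rfc4180LineSketch {

    // Hypothetical helper, not part of Crunch: splits one CSV record into
    // fields, reading "" inside a quoted field as one literal double quote.
    public static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote: keep one literal quote
                        i++;                 // and skip the second one
                    } else {
                        inQuotes = false;    // lone quote closes the field
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // The first sample line from the comment above.
        System.out.println(parseLine("\"this line\",\"has \"\" 2 double quotes in it\""));
    }
}
```

With this approach the quote character never has to be configured as a separate escape character; the parser only looks one character ahead when it sees a quote inside a quoted field.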

> Add support for using escape character same as open/close quote character
> -------------------------------------------------------------------------
>
>                 Key: CRUNCH-564
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-564
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Muhammad
>            Assignee: Josh Wills
>            Priority: Trivial
>              Labels: csv, csvparser
>
> As a user I would like to use CSVInputFormat to handle CSV files that follow RFC 4180 (http://www.ietf.org/rfc/rfc4180.txt).
> Many developers use the Apache StringEscapeUtils.escapeCsv() method to escape their CSVs. The method escapes the CSV following RFC 4180.
> https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringEscapeUtils.html
> The CSVLineReader throws an exception in such a case. We can enhance the code to support CSVs that use the same character for escaping and for quoting.
> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/io/text/csv/CSVLineReader.java#L152
> I would appreciate a comment if someone has knowingly rejected the idea due to some technical limitation or a problem with allowing the escape and quote characters to be the same. By the way, Apache HAWQ seems to get around this issue somehow and reads such CSVs fine.
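
To make the escaping described in the issue concrete, the following is a hand-written approximation of RFC 4180 style field escaping. It does not call StringEscapeUtils, and the class and method names are hypothetical. It assumes a field needs quoting when it contains a comma, a double quote, or a line break, and that embedded double quotes are doubled.

```java
public class CsvEscapeSketch {

    // Hypothetical helper: quotes a field only when needed and doubles any
    // embedded double quotes, in the style described by RFC 4180.
    public static String escapeField(String value) {
        boolean needsQuotes = value.indexOf('"') >= 0
                || value.indexOf(',') >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return value; // plain fields are returned unchanged
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(escapeField("has \" 2 double quotes in it"));
        System.out.println(escapeField("has none"));
    }
}
```

Output produced this way uses the double quote both as the enclosing character and as the escape for itself, which is the combination the issue asks CSVInputFormat to accept.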



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
