crunch-dev mailing list archives

From "Muhammad (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CRUNCH-564) Add support for using escape character same as open/close quote character
Date Tue, 29 Sep 2015 14:32:04 GMT
Muhammad created CRUNCH-564:
-------------------------------

             Summary: Add support for using escape character same as open/close quote character
                 Key: CRUNCH-564
                 URL: https://issues.apache.org/jira/browse/CRUNCH-564
             Project: Crunch
          Issue Type: Improvement
          Components: Core
            Reporter: Muhammad
            Assignee: Josh Wills
            Priority: Trivial


As a user, I would like to use CSVInputFormat to handle CSV files that follow RFC 4180: http://www.ietf.org/rfc/rfc4180.txt

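Many developers use the StringEscapeUtils.escapeCsv() method from Apache Commons Lang to
escape their CSVs. The method escapes values according to RFC 4180, in which a double quote
inside a quoted field is escaped by preceding it with another double quote.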

https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringEscapeUtils.html

CSVLineReader throws an exception in such a case. We can enhance the code to support
CSVs that use the same character for escaping and for quoting.

https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/io/text/csv/CSVLineReader.java#L152
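
To illustrate the requested behavior, here is a minimal sketch of RFC 4180 style handling in which the quote character also acts as its own escape. It is not the CSVLineReader code and not a proposed patch; the class name Rfc4180Sketch and the methods escape and parseLine are hypothetical, and the sketch assumes a double quote as the quote character, a comma as the delimiter, and a record that fits on a single line.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Crunch): the quote character escapes itself. */
final class Rfc4180Sketch {
    private static final char QUOTE = '"';
    private static final char DELIMITER = ',';

    /** Quotes a field only when needed and doubles any embedded quote characters. */
    static String escape(String field) {
        boolean needsQuoting = field.indexOf(QUOTE) >= 0
                || field.indexOf(DELIMITER) >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuoting) {
            return field;
        }
        return QUOTE + field.replace("\"", "\"\"") + QUOTE;
    }

    /** Splits one CSV line into fields, treating a doubled quote as a literal quote. */
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == QUOTE) {
                    if (i + 1 < line.length() && line.charAt(i + 1) == QUOTE) {
                        current.append(QUOTE); // doubled quote: keep one literal quote
                        i++;                   // skip the second quote of the pair
                    } else {
                        inQuotes = false;      // lone quote: closes the quoted field
                    }
                } else {
                    current.append(c);         // delimiters inside quotes are data
                }
            } else if (c == QUOTE) {
                inQuotes = true;               // opening quote
            } else if (c == DELIMITER) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("Unterminated quoted field");
        }
        fields.add(current.toString());
        return fields;
    }
}
```

The key point is the one-character lookahead: when a quote is seen inside a quoted field, the next character decides whether it is an escaped quote or the end of the field, so the two roles can share one character without ambiguity.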

I would appreciate a comment if someone has knowingly rejected this idea because of a technical
limitation or a problem with allowing the escape and quote characters to be the same. By the way,
Apache HAWQ seems to get around this issue somehow and reads such CSVs correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
