crunch-dev mailing list archives

From "Muhammad (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-564) Add support for using escape character same as open/close quote character
Date Tue, 29 Sep 2015 21:43:06 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935936#comment-14935936
] 

Muhammad commented on CRUNCH-564:
---------------------------------

It appears to work with \ as the escape character. I'll update if I run into issues.

On configuration options: I thought you required every option to be provided, because if I do not
provide CSV_BUFFER_SIZE it crashes with an NPE. The following code snippet is what fails.

{code}
final String bufferValue = this.configuration.get(CSVFileSource.CSV_BUFFER_SIZE);
if ("".equals(bufferValue)) {
  bufferSize = CSVLineReader.DEFAULT_BUFFER_SIZE;
} else {
  bufferSize = Integer.parseInt(bufferValue);
}
{code}

If I do not provide CSV_INPUT_FILE_ENCODING it crashes as well. In both cases the cause is that
{code}  this.configuration.get(CSVFileSource.CSV_INPUT_FILE_ENCODING/CSV_BUFFER_SIZE)
{code}
returns *null* rather than an empty string, so execution falls into the *else* clause.
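
For illustration only, here is a minimal sketch of a lookup that treats both a missing key and an empty string as "use the default". It is not the Crunch code: a plain Map stands in for the Hadoop Configuration (whose get(name) returns null for an unset property), and the class name, the resolveBufferSize helper, the "csv.buffersize" key, and the default value are all hypothetical placeholders.

```java
import java.util.HashMap;
import java.util.Map;

public class BufferSizeLookup {
    // Placeholder default; the real value is CSVLineReader.DEFAULT_BUFFER_SIZE.
    static final int DEFAULT_BUFFER_SIZE = 64 * 1024;

    // Hypothetical helper: a missing (null) or empty value selects the default
    // instead of falling through to Integer.parseInt.
    static int resolveBufferSize(Map<String, String> conf, String key) {
        String value = conf.get(key); // null when the key was never set
        if (value == null || value.isEmpty()) {
            return DEFAULT_BUFFER_SIZE;
        }
        return Integer.parseInt(value);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Key not set: the default is used rather than parsing null.
        System.out.println(resolveBufferSize(conf, "csv.buffersize"));
        // Key set explicitly: the configured value is parsed.
        conf.put("csv.buffersize", "4096");
        System.out.println(resolveBufferSize(conf, "csv.buffersize"));
    }
}
```

With a real Hadoop Configuration, the overloads that take a default, such as getInt(name, defaultValue) and get(name, defaultValue), might serve the same purpose without a separate null check.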

I'm using {code}org.apache.mrunit:mrunit:1.1.0:hadoop2{code} and 
{code}org.apache.hadoop:hadoop-mapreduce-client-jobclient:2.6.0{code}

> Add support for using escape character same as open/close quote character
> -------------------------------------------------------------------------
>
>                 Key: CRUNCH-564
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-564
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Muhammad
>            Assignee: Josh Wills
>            Priority: Trivial
>              Labels: csv, csvparser
>
> As a user I would like to use CSVInputFormat to handle CSV files that follow RFC 4180:
> http://www.ietf.org/rfc/rfc4180.txt
> Many developers use the Apache StringEscapeUtils.escapeCsv() method to escape their CSVs.
> The method escapes the CSV following RFC 4180.
> https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringEscapeUtils.html
> The CSVLineReader throws an exception in such a case. We can enhance the code to support
> CSVs that use the same character for escape and quote.
> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/io/text/csv/CSVLineReader.java#L152
> I would appreciate a comment if someone has knowingly rejected the idea due to some
> technical limitation or a problem with allowing escape and quote to be the same character. By the
> way, Apache HAWQ seems to get around this issue somehow and reads such CSVs all right.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
