flink-issues mailing list archives

From "Fabian Hueske (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-4785) Flink string parser doesn't handle string fields containing two consecutive double quotes
Date Fri, 14 Oct 2016 15:57:20 GMT

    [ https://issues.apache.org/jira/browse/FLINK-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575710#comment-15575710 ]

Fabian Hueske commented on FLINK-4785:
--------------------------------------

The {{StringParser}} can parse strings that are enclosed by a quote character (usually a
double quote). This is required to parse strings that contain the field delimiter character.
Otherwise, we could not parse a line like this:

{quote}
12,2.45,"I am a string with a field delimiter, right?",12
{quote}

The problem is that the {{StringParser}} does not support escaping the quote character. In
CSV files that use a double quote as the quote character, a literal quote inside a quoted
field is usually escaped by doubling it, like this:

{quote}
12,2.45,"Bill said to Bob: ""Hi!"".",12
{quote}

When we rewrote the {{StringParser}} a while back, we decided not to support doubled double
quotes for three reasons: no users had requested it, leaving it out simplified the parser
logic, and it kept the configuration options of the CsvInputFormat (of which there are
already quite a few) concise.
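
To illustrate the doubled-quote convention described above, here is a minimal sketch of a line splitter that treats two consecutive quote characters inside a quoted field as one literal quote. It is not the {{StringParser}} implementation and does not use any Flink API; the class name {{QuotedFieldSketch}} and the method {{splitLine}} are hypothetical and exist only for this example. It assumes a single-line record and is lenient with malformed input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Flink.
final class QuotedFieldSketch {

    // Splits one line into fields. Inside a quoted field, a doubled quote
    // character is read as a single literal quote, and the delimiter is
    // treated as ordinary text.
    static List<String> splitLine(String line, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        // Doubled quote: emit one literal quote, skip the second.
                        current.append(quote);
                        i++;
                    } else {
                        // Single quote: the quoted section ends here.
                        inQuotes = false;
                    }
                } else {
                    current.append(c);
                }
            } else if (c == quote && current.length() == 0) {
                // A quote at the start of a field opens a quoted section.
                inQuotes = true;
            } else if (c == delimiter) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // The last field has no trailing delimiter.
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        String line = "12,2.45,\"Bill said to Bob: \"\"Hi!\"\".\",12";
        for (String field : splitLine(line, ',', '"')) {
            System.out.println(field);
        }
    }
}
```

The only addition over plain quote handling is the one-character lookahead when a quote is seen inside a quoted section, so the extra parser logic is small, although it would still need a decision on whether to expose it as a configuration option.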

> Flink string parser doesn't handle string fields containing two consecutive double quotes
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-4785
>                 URL: https://issues.apache.org/jira/browse/FLINK-4785
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1.2
>            Reporter: Flavio Pompermaier
>              Labels: csv
>
> To reproduce the error run https://github.com/okkam-it/flink-examples/blob/master/src/main/java/it/okkam/datalinks/batch/flink/datasourcemanager/importers/Csv2RowExample.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
