flink-user mailing list archives

From: Max Michels <...@data-artisans.com>
Subject: Re: Quotes in fields of CsvInputFormat
Date: Tue, 09 Dec 2014 11:17:29 GMT

Hi Malte,

Typically, double quotes are used to identify strings and thus are not
interpreted literally. Any data in a field after a double quoted string is
regarded as invalid trailing data.

You could replace double quotes with single quotes:

A|ggg
B|'hhh' xx
C|xxx

This results in the expected >'hhh' xx< for the second line.
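
A minimal sketch of that replacement, assuming the file can be preprocessed in plain Java before it is passed to readCsvFile. QuoteRewriter and its methods are hypothetical names used for illustration; they are not part of Flink.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Flink: rewrites raw lines before they are parsed.
class QuoteRewriter {

    // Replace every double quote with a single quote so the CSV parser
    // no longer treats the field as an enclosed string.
    static String toSingleQuotes(String line) {
        return line.replace('"', '\'');
    }

    // Apply the replacement to each line, preserving order.
    static List<String> rewriteAll(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            out.add(toSingleQuotes(line));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("A|ggg", "B|\"hhh\" xx", "C|xxx");
        for (String line : rewriteAll(input)) {
            System.out.println(line);
        }
    }
}
```

Note that this changes the data: any double quote that is meant literally elsewhere in the file is rewritten as well.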

Best regards,
Max

On Fri, Dec 5, 2014 at 4:44 PM, Malte Schwarzer <ms@mieo.de> wrote:

> Hi Stephan,
>
> The result should be >"hhh" xx< as the field value. Enclosures should be
> disabled, but there seems to be no method to do that.
>
>
> Malte
>
> From: Stephan Ewen <sewen@apache.org>
> Reply-To: <user@flink.incubator.apache.org>
> Date: Friday, 5 December 2014 16:28
> To: <user@flink.incubator.apache.org>
> Subject: Re: Quotes in fields of CsvInputFormat
>
> Hi!
>
> The parser interprets the quotes as quotes for the field. That means the
> second field (the string) stops after the "hhh" and the xx is considered
> invalid trailing data.
>
> What do you expect as the result of parsing that line?
>
> Stephan
>
>
> On Fri, Dec 5, 2014 at 4:16 PM, Malte Schwarzer <ms@mieo.de> wrote:
>
>> Hi,
>>
>> I’m trying to import a CSV file, but the parser seems to have problems with
>> quotes at the beginning of a field. Is there a way to set or disable
>> enclosures for the CSV input?
>>
>> This is my code:
>>
>> DataSet<Tuple2<String, String>> res = env.readCsvFile(inputCsvFilename)
>>                 .fieldDelimiter('|')
>>                 .types(String.class, String.class);
>>
>> CSV:
>>
>> A|ggg
>> B|"hhh" xx
>> C|xxx
>>
>> As a result I’m receiving a ParseException for line B:
>>
>> org.apache.flink.api.common.io.ParseException: Line could not be parsed:
>> 'B|"hhh" xx'
>>
>>
>> Thanks,
>> Malte
>>
>
>
