flink-user mailing list archives

From Tamara Mendt <tammyme...@gmail.com>
Subject Re: Read CSV Parse Quoted Strings Function
Date Mon, 24 Aug 2015 09:55:48 GMT
Thank you Maximilian,

I agree and would be happy to fix this issue.
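
To make the escaping approach described below concrete, here is a minimal sketch in Python. It is not Flink code and not the fix proposed in FLINK-2567. The function name `parse_line` and the choice of backslash as the escape character are assumptions made purely for illustration. The point it tries to show is that, once quotes inside a field are escaped, a single left-to-right pass can decide where a quoted field ends without looking ahead.

```python
def parse_line(line, delimiter=",", quote='"', escape="\\"):
    """Split one CSV line into fields in a single pass.

    Hypothetical helper for illustration only (not part of Flink).
    Assumes quotes inside a quoted field are written as escape + quote.
    """
    fields = []
    current = []        # characters of the field being built
    in_quotes = False   # True while inside a quoted field
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            # An escaped quote is kept as a literal quote character.
            if ch == escape and i + 1 < len(line) and line[i + 1] == quote:
                current.append(quote)
                i += 2
                continue
            if ch == quote:
                # Any unescaped quote closes the field; no look-ahead needed.
                in_quotes = False
            else:
                current.append(ch)
        else:
            if ch == quote and not current:
                # A quote at the start of a field opens a quoted field.
                in_quotes = True
            elif ch == delimiter:
                fields.append("".join(current))
                current = []
            else:
                current.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated quoted field")
    fields.append("".join(current))
    return fields


# The tweet from the original report, with its inner quotes escaped.
row = parse_line(
    r'1,"RT @sportsguy33: New Time Warner slogan: \"Time Warner, where we '
    r'make you long for the days before cable.\"",en'
)
```

With the inner quotes escaped, the second field of `row` comes back with the slogan's quotes and comma intact. Without escaping, the parser would close the field at the first inner quote, which is the failure described in the original report.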



On Mon, Aug 24, 2015 at 11:50 AM, Maximilian Michels <mxm@apache.org> wrote:

> Hi Tamara,
> Quoted strings should not contain the quoting character. The way to work
> around this is to escape the quote characters. However, there is currently
> no option to escape quotes, which effectively rules out any use of quote
> characters within quoted fields. This should be fixed. I opened a JIRA
> issue for this: https://issues.apache.org/jira/browse/FLINK-2567
> As for your idea for parsing quoted fields, I personally prefer escaping
> the quoting characters. In quoted fields, Flink currently allows all
> characters except quotes. If unescaped quotes were allowed as well, we
> would have to read ahead, potentially through the entire file, to know
> whether we can close a quote. Additionally, we would need to keep track of
> how many quotes have been opened and closed.
> While your proposal is a very convenient feature, I think we should rather
> implement explicit quoting for performance and clarity reasons.
> Cheers,
> Max
> On Mon, Aug 24, 2015 at 10:40 AM, Tamara Mendt <tammymendt@gmail.com>
> wrote:
>> Hi all,
>> When using the parseQuotedStrings function of the CsvReader class, I
>> have noticed that if the quote character also appears inside the
>> string, the parsing fails.
>> For example, if there is a field of this form:
>> "RT @sportsguy33: New Time Warner slogan: "Time Warner, where we make you
>> long for the days before cable.""
>> I think it is not so uncommon to have a case like this and it should not
>> fail, but rather the string should be parsed as:
>> RT @sportsguy33: New Time Warner slogan: "Time Warner, where we make you
>> long for the days before cable."
>> I have found the part of the Flink code that raises this exception and
>> can fix it, but I wanted to check first whether others agree that this
>> is an issue.
>> Cheers,
>> Tamara

Tamara Mendt
