flink-dev mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: Introduction
Date Thu, 06 Nov 2014 14:32:25 GMT
Hi Johannes!

Welcome :-)

Right now, the parsers are used only in the CSV formats, so you can adjust
them to that format's needs.


On Thu, Nov 6, 2014 at 3:17 PM, Kirschnick, Johannes <
johannes.kirschnick@tu-berlin.de> wrote:

> Hello,
> like some of my colleagues before me, I would like to introduce myself
> to the list.
> I am a PhD student from Berlin who wants to work with Flink.
> As suggested by the getting started guide I had a look at some starter
> issues and found the issue about comments in CSV lines
> https://issues.apache.org/jira/browse/FLINK-1208
> While looking into it, I noticed that the current CSV parser does not
> correctly read escaped fields.
> There is of course a debate as to how to escape values in CSV files,
> but the common convention is to use " as the escape character.
> So the following line will not parse:
> 1997,Ford,E350,"Super, ""luxurious"" truck"
> I had a look into why that is and if I could propose a fix for it.
> Being a novice to the codebase I noticed that the CSV Parser uses the
> parsers from
> org.apache.flink.types.parser.*
> So the question I have is:
> Are these parsers used only for CSV files, so that introducing the
> escaping mechanism would just work, or are they used in a lot of other
> places, so that CSV would need special handling instead?
> In other words, would fixing the escaping actually break a lot of other
> things?
> Johannes
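
For readers of the archive, the escaping problem described in the quoted message can be illustrated with a minimal standalone sketch. It is written in Python and is not Flink's parser code from org.apache.flink.types.parser; the function name split_csv_line and its parameters are hypothetical. It assumes the convention mentioned above, where " encloses a field and a doubled "" inside a quoted field stands for one literal quote.

```python
import csv


def split_csv_line(line, delimiter=",", quote='"'):
    """Split one CSV line into fields, honoring quoted fields.

    Hypothetical helper for illustration only; not part of Flink.
    """
    fields = []
    buf = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == quote:
                if i + 1 < len(line) and line[i + 1] == quote:
                    # Doubled quote inside a quoted field: one literal quote.
                    buf.append(quote)
                    i += 1
                else:
                    # Single quote: the quoted section ends here.
                    in_quotes = False
            else:
                # Delimiters inside quotes are ordinary characters.
                buf.append(ch)
        elif ch == quote:
            in_quotes = True
        elif ch == delimiter:
            fields.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    fields.append("".join(buf))
    return fields


line = '1997,Ford,E350,"Super, ""luxurious"" truck"'

# A naive split on the delimiter cuts the quoted field in two.
naive = line.split(",")

# The quote-aware split keeps the fourth field intact.
aware = split_csv_line(line)

# Python's standard csv module follows the same convention by default.
reference = next(csv.reader([line]))
```

With the example line from the message, the naive split yields five pieces, while the quote-aware split and the csv module both yield four fields, the last being: Super, "luxurious" truck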
