flink-user mailing list archives

From ebru <b20926...@cs.hacettepe.edu.tr>
Subject Re: Dataset read csv file problem
Date Mon, 27 Nov 2017 07:59:17 GMT
Thank you Fabian, we’ve implemented a custom CsvInputFormat.
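
For anyone who hits the same problem, here is a minimal sketch of the record-splitting idea such a custom format needs. It is an illustration only, not the code we actually wrote. `QuotedCsvSplitter` and `splitRecords` are hypothetical names and not part of Flink. The sketch assumes the whole input is scanned sequentially from the start, which matches Fabian's point that record boundaries cannot be found when reading begins in the middle of the file.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not a Flink class): splits CSV text into records,
 * treating line delimiters inside quoted fields as ordinary characters.
 */
public class QuotedCsvSplitter {

    public static List<String> splitRecords(String text, char quote, char lineDelimiter) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == quote) {
                // Toggle the quoted state. A doubled quote ("") toggles twice,
                // so the state after it is unchanged.
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == lineDelimiter && !inQuotes) {
                // An unquoted line delimiter ends the current record.
                records.add(current.toString());
                current.setLength(0);
            } else {
                // Everything else, including quoted line delimiters, is kept.
                current.append(c);
            }
        }
        // Emit a trailing record that has no final line delimiter.
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }
}
```

Calling `QuotedCsvSplitter.splitRecords(text, '"', '\n')` returns one string per record with the quotes still in place, so field splitting can happen afterwards. Because the quoted state depends on everything read so far, a format built on this idea would have to read each file as a single split.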


> On 24 Nov 2017, at 15:35, Fabian Hueske <fhueske@gmail.com> wrote:
> 
> Hi Ebru,
> 
> this case is not supported by Flink's CsvInputFormat. The problem is that such a file cannot be read in parallel, because it is not possible to identify record boundaries if you start reading in the middle of the file.
> We have a new CsvInputFormat under development that follows the RFC 4180 standard and will have a parameter to support row delimiters that are encapsulated in a String field.
> 
> Until that is available, the only solution is to implement a custom InputFormat.
> 
> Best, Fabian
> 
> 2017-11-24 11:40 GMT+01:00 ebru <b20926247@cs.hacettepe.edu.tr>:
> Hello all,
> 
> We are trying to read CSV files in which some fields contain the \n character, which is also the line delimiter. We used the parseQuotedStrings('\"') method, but it only ignores field delimiters inside quotes, so we couldn’t parse the fields that contain the \n character. How can we solve this problem?
> 
> -Ebru
> 

