flink-user mailing list archives

From Cliff Resnick <cre...@gmail.com>
Subject CSV writer/parser inconsistency when using the Table API?
Date Fri, 22 Dec 2017 20:34:01 GMT
I've been trying out the Table API for some ETL using a two-stage job:
CsvTableSink (DataSet) -> CsvInputFormat (Stream). I ran into an issue
where the first stage writes rows with trailing null values, which is
valid output, but those rows cause a parse error in the second stage.

Looking at RowCsvInputFormatTest.java, I noticed that it expects input
lines with a trailing delimiter, e.g. "a|b|c|". Meanwhile, the CsvTableSink
writes rows in the form "a|b|c", with no trailing delimiter. As long as 'c'
is present, RowCsvInputFormat parses this input successfully. However, if
'c' is defined as a number and is missing, e.g. the row is "a|b|", the
Number parser will fail on the empty string.
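
To make the failure concrete, here is a minimal sketch in plain Java. It
is not Flink's actual RowCsvInputFormat code; the class and method names
(TrailingFieldDemo, splitFields, parseStrict, parseLenient) are made up
for illustration, and I am assuming the third column is an Integer. It
only shows why a strict numeric parse breaks on "a|b|" while "a|b|7" is
fine, and what a null-tolerant parse might look like.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; this is NOT Flink's RowCsvInputFormat code.
class TrailingFieldDemo {

    // Split a line on a single-character delimiter, keeping empty fields.
    // (String.split with default arguments would drop a trailing empty field.)
    static List<String> splitFields(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == delimiter) {
                fields.add(line.substring(start, i));
                start = i + 1;
            }
        }
        fields.add(line.substring(start)); // last field, possibly empty
        return fields;
    }

    // Strict numeric parse: an empty field throws NumberFormatException.
    static Integer parseStrict(String field) {
        return Integer.valueOf(field);
    }

    // Hypothetical lenient variant: treat an empty field as null.
    static Integer parseLenient(String field) {
        return field.isEmpty() ? null : Integer.valueOf(field);
    }

    public static void main(String[] args) {
        for (String line : new String[] {"a|b|7", "a|b|"}) {
            String last = splitFields(line, '|').get(2);
            try {
                System.out.println(line + " -> strict: " + parseStrict(last));
            } catch (NumberFormatException e) {
                System.out.println(line + " -> strict parse failed on empty field");
            }
            System.out.println(line + " -> lenient: " + parseLenient(last));
        }
    }
}
```

The lenient variant is only one possible way to reconcile the two sides;
whether the fix belongs in the sink or in the input format is exactly
what I am unsure about.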

Is there something I am missing, or is there, in fact, an inconsistency
between the TableSink and the InputFormat?
