flink-issues mailing list archives

From fhueske <...@git.apache.org>
Subject [GitHub] incubator-flink pull request: enable CSV Reader to ignore invalid ...
Date Fri, 14 Nov 2014 20:11:13 GMT
Github user fhueske commented on the pull request:

    Thanks for your PR!
    I think there are a few issues with your approach. For example, in a CSV file whose
first field is a String, a line that starts with a comment prefix such as '#' or '//' will
not be skipped. Also, your changes to the DataSourceTask affect all InputFormats, which
is definitely not desired.
    IMO, it is necessary to explicitly specify a comment string and check for it at the beginning
of each line.
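
    The comment-string check suggested above could look roughly like the following. This is
only a minimal sketch: `CommentLineFilter`, `isComment`, and `dropComments` are hypothetical
names chosen for illustration and are not part of Flink's CSV reader or any InputFormat.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not a Flink class.
final class CommentLineFilter {
    // The comment string is configured explicitly by the user, e.g. "#" or "//".
    private final String commentPrefix;

    CommentLineFilter(String commentPrefix) {
        if (commentPrefix == null || commentPrefix.isEmpty()) {
            throw new IllegalArgumentException("comment prefix must be non-empty");
        }
        this.commentPrefix = commentPrefix;
    }

    // A line is a comment only if the prefix appears at the very beginning.
    boolean isComment(String line) {
        return line.startsWith(commentPrefix);
    }

    // Returns the input lines without comment lines, preserving order.
    List<String> dropComments(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (!isComment(line)) {
                kept.add(line);
            }
        }
        return kept;
    }
}
```

    Because the check is independent of field types, a line such as `// header` is dropped even
when the first field is a String, while `a//b,1` is kept since the prefix is not at the start.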
    Skipping invalid lines is also a good feature in my opinion. It would be good to inform
the user about invalid lines, for example by counting the number of invalid lines for each
split and emitting a log statement.
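
    One way to count invalid lines per split and report them is sketched below. It assumes,
purely for illustration, that each line should hold a single integer. `LenientSplitReader`,
`readSplit`, and `getInvalidLineCount` are hypothetical names, and `java.util.logging` stands
in for whatever logging framework the project actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical reader for illustration only; not a Flink class.
final class LenientSplitReader {
    private static final Logger LOG =
            Logger.getLogger(LenientSplitReader.class.getName());

    // Number of lines skipped in the most recently read split.
    private long invalidLines = 0;

    // Parses every line of one split, skipping lines that do not parse.
    List<Integer> readSplit(String splitName, List<String> lines) {
        invalidLines = 0; // reset so the count is per split
        List<Integer> values = new ArrayList<>();
        for (String line : lines) {
            try {
                values.add(Integer.parseInt(line.trim()));
            } catch (NumberFormatException e) {
                invalidLines++; // skip the line but remember that it was invalid
            }
        }
        // Emit one log statement per split instead of one per invalid line.
        if (invalidLines > 0) {
            LOG.warning("Skipped " + invalidLines
                    + " invalid line(s) in split " + splitName);
        }
        return values;
    }

    long getInvalidLineCount() {
        return invalidLines;
    }
}
```

    Logging once per split keeps the output small even when a file contains many invalid lines.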

If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
