flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1208) Skip comment lines in CSV input format. Allow user to specify comment character.
Date Wed, 19 Nov 2014 20:52:33 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218482#comment-14218482 ]

ASF GitHub Bot commented on FLINK-1208:

Github user fhueske commented on a diff in the pull request:

    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/GenericCsvInputFormat.java
    @@ -269,6 +283,21 @@ public void open(FileInputSplit split) throws IOException {
     	protected boolean parseRecord(Object[] holders, byte[] bytes, int offset, int numBytes) throws ParseException {
    +		if (commentPrefix != null) {
    +			//check record for comments
    +			String s = new String(bytes, offset, numBytes).trim();
    --- End diff --
    We try to avoid object instantiations as much as possible and parse the data directly as a `byte[]`. That effort is wasted if we convert the line to a String just to check for comments.
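
    A minimal sketch of a byte-level check that avoids the per-record String, assuming the comment prefix has been converted to a `byte[]` once, outside the parse loop. The class and method names (`CommentPrefixCheck`, `startsWithCommentPrefix`) are hypothetical and not part of the pull request; only the leading-whitespace part of `trim()` is mirrored, since trailing whitespace does not affect a prefix test.

```java
// Hypothetical helper, not taken from the Flink code base.
final class CommentPrefixCheck {

    /**
     * Returns true if the record stored in bytes[offset, offset + numBytes)
     * starts with commentPrefix once leading whitespace is skipped.
     * Works directly on the byte array and allocates no objects.
     */
    static boolean startsWithCommentPrefix(byte[] bytes, int offset, int numBytes, byte[] commentPrefix) {
        final int end = offset + numBytes;
        int pos = offset;

        // Skip leading bytes with an unsigned value <= ' ', similar to what
        // String.trim() strips. The mask keeps bytes >= 0x80 (negative in
        // Java) from being treated as whitespace.
        while (pos < end && (bytes[pos] & 0xFF) <= ' ') {
            pos++;
        }

        // The remaining record is shorter than the prefix: cannot match.
        if (end - pos < commentPrefix.length) {
            return false;
        }

        // Compare byte by byte against the pre-encoded prefix.
        for (int i = 0; i < commentPrefix.length; i++) {
            if (bytes[pos + i] != commentPrefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

    In `parseRecord`, the `commentPrefix != null` branch could then call such a helper with `bytes`, `offset`, and `numBytes` as they are, provided the prefix bytes are computed once (for example when the prefix is set) using the same charset as the input. Note that an empty prefix would match every record in this sketch, so a caller would need to rule that out.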

> Skip comment lines in CSV input format. Allow user to specify comment character.
> --------------------------------------------------------------------------------
>                 Key: FLINK-1208
>                 URL: https://issues.apache.org/jira/browse/FLINK-1208
>             Project: Flink
>          Issue Type: Improvement
>          Components: Java API, Scala API
>    Affects Versions: 0.8-incubating
>            Reporter: Aljoscha Krettek
>            Assignee: Felix Neutatz
>            Priority: Minor
>              Labels: starter
> The current skipFirstLine is limited. Skipping arbitrary lines that start with a certain character would be much more flexible while still easy to implement.

This message was sent by Atlassian JIRA
