flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1208) Skip comment lines in CSV input format. Allow user to specify comment character.
Date Sat, 22 Nov 2014 23:29:12 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222248#comment-14222248 ]

ASF GitHub Bot commented on FLINK-1208:
---------------------------------------

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/201#discussion_r20760689
  
    --- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
    @@ -130,6 +216,21 @@ public OUT readRecord(OUT reuse, byte[] bytes, int offset, int numBytes) {
     			numBytes--;
     		}
     		
    +		if (commentPrefix != null && commentPrefix.length <= numBytes) {
    +			//check record for comments
    +			Boolean isComment = true;
    +			for (int i = 0; i < commentPrefix.length; i++) {
    +				if (commentPrefix[i] != bytes[offset + i]) {
    +					isComment = false;
    +					break;
    +				}
    +			}
    +			if (isComment) {
    +				this.commentCount++;
    +				return nextRecord(reuse);
    --- End diff --
    
    My intention was to capture the null in the DelimitedInputFormat before it is given to the DataSourceTask.
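
    For illustration only, here is a minimal, self-contained sketch of the prefix check from the diff, paired with a loop that skips comment records and returns null once no records remain. It is not the Flink implementation: the class CommentPrefixSketch and the methods startsWithPrefix and nextNonComment are hypothetical names, and the String[] of records merely stands in for the input split.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Flink: it only illustrates the
// byte-wise comment-prefix check discussed in the diff above.
public class CommentPrefixSketch {

    /** Returns true if the record in bytes[offset, offset + numBytes) starts with prefix. */
    public static boolean startsWithPrefix(byte[] bytes, int offset, int numBytes, byte[] prefix) {
        // A missing prefix, or one longer than the record, can never match.
        if (prefix == null || prefix.length > numBytes) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (prefix[i] != bytes[offset + i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns the first non-comment record at or after index start,
     * or null when only comments (or nothing) remain. Assumption: a plain
     * array stands in for the records of an input split.
     */
    public static String nextNonComment(String[] records, int start, byte[] prefix) {
        for (int i = start; i < records.length; i++) {
            byte[] raw = records[i].getBytes(StandardCharsets.UTF_8);
            if (!startsWithPrefix(raw, 0, raw.length, prefix)) {
                return records[i];
            }
        }
        return null; // the caller has to handle the end-of-input case
    }

    public static void main(String[] args) {
        byte[] prefix = "#".getBytes(StandardCharsets.UTF_8);
        String[] records = {"# header comment", "1,2,3", "# trailing comment"};
        System.out.println(nextNonComment(records, 0, prefix)); // prints "1,2,3"
        System.out.println(nextNonComment(records, 2, prefix)); // prints "null"
    }
}
```

    The second call shows the case the comment above is concerned with: when the last record is a comment, the skip logic yields null, and something has to deal with that null before it reaches the consumer.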


> Skip comment lines in CSV input format. Allow user to specify comment character.
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-1208
>                 URL: https://issues.apache.org/jira/browse/FLINK-1208
>             Project: Flink
>          Issue Type: Improvement
>          Components: Java API, Scala API
>    Affects Versions: 0.8-incubating
>            Reporter: Aljoscha Krettek
>            Assignee: Felix Neutatz
>            Priority: Minor
>              Labels: starter
>
> The current skipFirstLine is limited. Skipping arbitrary lines that start with a certain character would be much more flexible while still easy to implement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)