flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5907) RowCsvInputFormat bug on parsing tsv
Date Mon, 27 Feb 2017 15:08:47 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885909#comment-15885909 ]

ASF GitHub Bot commented on FLINK-5907:
---------------------------------------

Github user KurtYoung commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3417#discussion_r103226821
  
    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/GenericCsvInputFormat.java ---
    @@ -358,24 +358,27 @@ protected boolean parseRecord(Object[] holders, byte[] bytes, int offset, int nu
     		for (int field = 0, output = 0; field < fieldIncluded.length; field++) {
     			
     			// check valid start position
    -			if (startPos >= limit) {
    +			if (startPos > limit || (startPos == limit && field != fieldIncluded.length - 1)) {
     				if (lenient) {
     					return false;
     				} else {
     					throw new ParseException("Row too short: " + new String(bytes, offset, numBytes));
     				}
     			}
    -			
    +
     			if (fieldIncluded[field]) {
     				// parse field
     				@SuppressWarnings("unchecked")
     				FieldParser<Object> parser = (FieldParser<Object>) this.fieldParsers[output];
     				Object reuse = holders[output];
     				startPos = parser.resetErrorStateAndParse(bytes, startPos, limit, this.fieldDelim, reuse);
     				holders[output] = parser.getLastResult();
    -				
    +
     				// check parse result
    -				if (startPos < 0) {
    +				if (startPos < 0 ||
    +						(startPos == limit
    --- End diff --
    
    done
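
The first changed condition in the diff can be illustrated with a minimal standalone sketch. This is not Flink code: the class `TrailingEmptyFieldSketch` and the helper `acceptsRow` are hypothetical names, and the field scan is a simplified stand-in for the real field parsers. It only models the "valid start position" check, under the assumption that a field parser returns the position just past the delimiter, or `limit` when no delimiter follows. The second condition in the diff is truncated in this message and is not reproduced here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not part of Flink: models only the start-position check.
final class TrailingEmptyFieldSketch {

    // Returns true if the row passes the start-position check for every field.
    // revisedCheck = false applies the old condition (startPos >= limit),
    // revisedCheck = true applies the condition introduced by the diff.
    static boolean acceptsRow(String line, char delim, int numFields, boolean revisedCheck) {
        byte[] bytes = line.getBytes(StandardCharsets.UTF_8);
        int limit = bytes.length;
        int startPos = 0;

        for (int field = 0; field < numFields; field++) {
            boolean tooShort = revisedCheck
                    ? startPos > limit || (startPos == limit && field != numFields - 1)
                    : startPos >= limit;
            if (tooShort) {
                return false; // corresponds to "Row too short" in the diff
            }

            // Simplified field scan (assumption): stop at the next delimiter
            // and continue just past it, or at limit if none is left.
            int pos = startPos;
            while (pos < limit && bytes[pos] != (byte) delim) {
                pos++;
            }
            startPos = pos < limit ? pos + 1 : limit;
        }
        return true;
    }

    public static void main(String[] args) {
        String row = "a\tb\t"; // three fields, the last one empty
        System.out.println("old check accepts:     " + acceptsRow(row, '\t', 3, false));
        System.out.println("revised check accepts: " + acceptsRow(row, '\t', 3, true));
    }
}
```

With a row that ends in a delimiter, `startPos` equals `limit` when the last field is reached, so the old condition reports the row as too short while the revised one lets an empty final field through. Because this sketch omits the second condition, it does not distinguish a trailing empty field from a row that is simply missing its last delimiter.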


> RowCsvInputFormat bug on parsing tsv
> ------------------------------------
>
>                 Key: FLINK-5907
>                 URL: https://issues.apache.org/jira/browse/FLINK-5907
>             Project: Flink
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 1.2.0
>            Reporter: Flavio Pompermaier
>            Assignee: Kurt Young
>              Labels: csv, parsing
>         Attachments: test.tsv
>
>
> The following snippet reproduces the problem (using the attached file as input):
> {code:language=java}
> char fieldDelim = '\t';
>     TypeInformation<?>[] fieldTypes = new TypeInformation<?>[51];
>     for (int i = 0; i < fieldTypes.length; i++) {
>       fieldTypes[i] = BasicTypeInfo.STRING_TYPE_INFO;
>     }
>     int[] fieldMask = new int[fieldTypes.length];
>     for (int i = 0; i < fieldMask.length; i++) {
>       fieldMask[i] = i;
>     }
>     RowCsvInputFormat csvIF = new RowCsvInputFormat(new Path(testCsv), fieldTypes, "\n", fieldDelim + "",
>         fieldMask, true);
>     csvIF.setNestedFileEnumeration(true);
>     DataSet<Row> csv = env.createInput(csvIF);
>     csv.print();
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
