hadoop-common-issues mailing list archives

From "Andrew Ash (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13064) LineReader reports incorrect number of bytes read resulting in correctness issues using LineRecordReader
Date Fri, 29 Apr 2016 21:27:13 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264777#comment-15264777 ]

Andrew Ash commented on HADOOP-13064:
-------------------------------------

[~jellis] Those two do look closely related. Were you testing with version 2.7.1 by chance?
Can you check whether your test passes in 2.7.2, which contains fixes for both of those tickets?

> LineReader reports incorrect number of bytes read resulting in correctness issues using LineRecordReader
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13064
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13064
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.7.1
>            Reporter: Joe Ellis
>            Priority: Critical
>         Attachments: LineReaderTest.java
>
>
> The specific issue we were seeing with LineReader is that when we pass in '\r\n' as the line delimiter, the number of bytes it claims to have read is less than the number it actually read. We narrowed this down to cases where the delimiter is split across the internal buffer boundary: if fillBuffer fills the buffer with "row\r" and the next call fills it with "\n", the number of bytes reported is 4 rather than 5.
> This results in correctness issues in LineRecordReader: if this off-by-one error occurs enough times while reading a split, the reader continues to read records past its split boundary, so records appear to come from multiple splits.
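
The byte-counting behaviour described above can be illustrated with a small, self-contained sketch. It is not Hadoop's LineReader and not the attached LineReaderTest.java: `TinyLineReader`, its constructor parameters, and `appendDelimPrefix` are hypothetical names made up for this example. It assumes ASCII data and a delimiter such as "\r\n", for which restarting a failed partial match from the current byte is safe. The point it shows is that the count of delimiter bytes matched so far has to survive a buffer refill, so that a "\r" at the end of one fill and a "\n" at the start of the next are both included in the reported total.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a buffered, delimiter-aware reader.
// This is NOT Hadoop's LineReader; it assumes ASCII data and a delimiter
// such as "\r\n" for which restarting a failed partial match is safe.
class TinyLineReader {
    private final InputStream in;
    private final byte[] buffer;
    private final byte[] delim;
    private int bufferLength = 0; // number of valid bytes in buffer
    private int bufferPos = 0;    // index of the next unread byte

    TinyLineReader(InputStream in, int bufferSize, byte[] delim) {
        this.in = in;
        this.buffer = new byte[bufferSize];
        this.delim = delim;
    }

    // Reads one record into out (without the delimiter) and returns the
    // number of bytes consumed, delimiter included. Returns 0 at end of input.
    int readLine(StringBuilder out) throws IOException {
        out.setLength(0);
        int consumed = 0;
        int matched = 0; // delimiter bytes matched so far; kept across refills
        while (true) {
            if (bufferPos >= bufferLength) {
                bufferLength = in.read(buffer); // refill the internal buffer
                bufferPos = 0;
                if (bufferLength <= 0) {
                    bufferLength = 0;
                    // End of input: a dangling partial delimiter is record data.
                    appendDelimPrefix(out, matched);
                    return consumed;
                }
            }
            byte b = buffer[bufferPos++];
            consumed++; // every byte taken from the stream is counted
            if (b == delim[matched]) {
                if (++matched == delim.length) {
                    return consumed;
                }
            } else {
                // The partial match failed, so those bytes were record data.
                appendDelimPrefix(out, matched);
                matched = 0;
                if (b == delim[0]) {
                    matched = 1;
                } else {
                    out.append((char) b);
                }
            }
        }
    }

    private void appendDelimPrefix(StringBuilder out, int count) {
        for (int i = 0; i < count; i++) {
            out.append((char) delim[i]);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "row\r\nnext\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] crlf = "\r\n".getBytes(StandardCharsets.US_ASCII);
        // A 4-byte buffer makes the first fill end with "row\r".
        TinyLineReader reader =
            new TinyLineReader(new ByteArrayInputStream(data), 4, crlf);
        StringBuilder line = new StringBuilder();
        int n = reader.readLine(line);
        System.out.println(line + " consumed " + n + " bytes");
    }
}
```

With a 4-byte buffer the first fill holds "row\r" and the second begins with "\n", which is the case from the report. A caller that tracks its position in a split by summing these return values stays accurate only if each call counts both delimiter bytes.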



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


