hadoop-common-issues mailing list archives

From "Vinay (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9622) bzip2 codec can drop records when reading data in splits
Date Thu, 21 Nov 2013 03:14:35 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828448#comment-13828448 ]

Vinay commented on HADOOP-9622:
-------------------------------

Thanks, Jason, for the patch for this tricky issue.
The patch looks good to me.

One small nit:
There are already two TestLineRecordReader test classes, in the mapred and mapreduce.lib.input
packages of the hadoop-mapreduce-client-jobclient project. It would be better to move the new
tests into these classes instead of creating additional test classes.

> bzip2 codec can drop records when reading data in splits
> --------------------------------------------------------
>
>                 Key: HADOOP-9622
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9622
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 2.0.4-alpha, 0.23.8
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: HADOOP-9622-2.patch, HADOOP-9622-testcase.patch, HADOOP-9622.patch, blockEndingInCR.txt.bz2, blockEndingInCRThenLF.txt.bz2
>
>
> BZip2Codec.BZip2CompressionInputStream can drop records when reading data in splits, depending on where record delimiters fall relative to compression block boundaries.
> Thanks to [~knoguchi] for discovering this problem while working on PIG-3251.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
