hadoop-common-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9622) bzip2 codec can drop records when reading data in splits
Date Thu, 21 Nov 2013 14:59:36 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828993#comment-13828993 ] 

Jason Lowe commented on HADOOP-9622:
------------------------------------

bq. There are already two test classes named TestLineRecordReader, in the mapred and mapreduce.lib.input
packages of the hadoop-mapreduce-client-jobclient project. It would be better to move the included
tests into these classes instead of creating additional classes.

I'd much rather keep the unit tests for LineRecordReader in the same package as the code;
that way, when the code is updated, Jenkins will run the tests and catch errors.  If we move
these unit tests to the jobclient module, a patch that touches only LineRecordReader in the
core module won't run them, since they're in a different module.

Instead, I'd rather rename the TestLineRecordReader tests in the jobclient module to something
like TestLineRecordReaderJobs.  Those tests are really integration tests rather than unit
tests, since each one runs a full job rather than exercising LineRecordReader in
isolation.

> bzip2 codec can drop records when reading data in splits
> --------------------------------------------------------
>
>                 Key: HADOOP-9622
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9622
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 2.0.4-alpha, 0.23.8
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: HADOOP-9622-2.patch, HADOOP-9622-testcase.patch, HADOOP-9622.patch, blockEndingInCR.txt.bz2, blockEndingInCRThenLF.txt.bz2
>
>
> Bzip2Codec.BZip2CompressionInputStream can cause records to be dropped when reading them
> in splits, depending on where record delimiters occur relative to compression block boundaries.
> Thanks to [~knoguchi] for discovering this problem while working on PIG-3251.
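The issue is about records going missing when a file is read in splits. As a rough, self-contained illustration of the invariant a split-aware line reader has to preserve (every record is read by exactly one split), here is a small Java model of the convention LineRecordReader follows for uncompressed input: a split that does not start at offset 0 throws away its first, possibly partial, line, and every split keeps reading while the next line starts at or before the split's end. `SplitReadModel`, `lineEnd`, `readSplit`, and `readInSplits` are hypothetical names; this is a sketch, not Hadoop code, and it ignores compression entirely. Delimiters are LF, CR, or CRLF, since the attachment names suggest the problem inputs involve a compression block ending in CR, or in CR followed by LF.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of how a LineRecordReader-style reader divides
 * uncompressed input among splits. Illustrative only; not Hadoop code.
 */
public class SplitReadModel {

    /** Returns the index just past the line starting at pos (LF, CR, or CRLF ends a line). */
    static int lineEnd(byte[] data, int pos) {
        while (pos < data.length) {
            byte b = data[pos++];
            if (b == '\n') {
                return pos;
            }
            if (b == '\r') {
                // CRLF counts as one delimiter; a lone CR also ends the line.
                if (pos < data.length && data[pos] == '\n') {
                    pos++;
                }
                return pos;
            }
        }
        return pos; // last line without a trailing delimiter
    }

    /** Line content with its trailing delimiter stripped. */
    static String content(byte[] data, int from, int to) {
        int end = to;
        while (end > from && (data[end - 1] == '\n' || data[end - 1] == '\r')) {
            end--;
        }
        return new String(data, from, end - from, StandardCharsets.US_ASCII);
    }

    /** Records owned by the split [start, end). */
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // The previous split already read the line that crosses (or begins at) start.
            pos = lineEnd(data, start);
        }
        // "<= end": a line beginning exactly at end belongs to this split.
        while (pos <= end && pos < data.length) {
            int next = lineEnd(data, pos);
            records.add(content(data, pos, next));
            pos = next;
        }
        return records;
    }

    /** Reads the whole input as consecutive splits of splitSize bytes. */
    static List<String> readInSplits(byte[] data, int splitSize) {
        List<String> all = new ArrayList<>();
        for (int start = 0; start < data.length; start += splitSize) {
            int end = Math.min(start + splitSize, data.length);
            all.addAll(readSplit(data, start, end));
        }
        return all;
    }

    public static void main(String[] args) {
        // Mix of LF, CRLF, and lone-CR delimiters, plus empty lines.
        byte[] data = "alpha\nbeta\r\ngamma\r\rdelta\n\nepsilon".getBytes(StandardCharsets.US_ASCII);
        List<String> expected = readSplit(data, 0, data.length);
        for (int size = 1; size <= data.length; size++) {
            List<String> actual = readInSplits(data, size);
            if (!actual.equals(expected)) {
                throw new AssertionError("split size " + size + ": " + actual + " != " + expected);
            }
        }
        System.out.println("every split size yields " + expected);
    }
}
```

The `pos <= end` check (rather than `pos < end`) is what lets a split claim a line that begins exactly at its end offset, which the next split then discards as its first line. With bzip2, the position compared against the split end comes from the compressed stream rather than from decompressed bytes, so presumably it is the interplay between that position and block boundaries that breaks the exactly-once property; the model above deliberately leaves that part out.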



--
This message was sent by Atlassian JIRA
(v6.1#6144)
