hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5670) CombineFileRecordReader should report progress when moving to the next file
Date Tue, 21 Jan 2014 20:56:22 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877847#comment-13877847

Jason Lowe commented on MAPREDUCE-5670:

Thanks for picking this up, Chen!  Comments on the patch:

* org.apache.hadoop.mapred.lib.db.TestCombineFileRecordReader is in the wrong package, should
be org.apache.hadoop.mapred.lib.TestCombineFileRecordReader
* There's an unused Mockito import in one of the tests, and it seems like Mockito could save
quite a bit of test code.  A mocked reporter with verify(mockReporter).progress() to verify
the progress method was called would accomplish the same with much less code.  Did that not
work for some reason?  The unused import implies that approach was probably attempted, so
curious if that's the case why it didn't work out.
* The original bug involves a bunch of empty files, so the test doesn't need to write any
data to the test files and all the lengths should be zero.  Bonus points if the test calls
next() on the record reader, verifies it returns false and verifies progress was called the
correct number of times.
* There's an unnecessary try...finally block in testInitNextRecordReader

> CombineFileRecordReader should report progress when moving to the next file
> ---------------------------------------------------------------------------
>                 Key: MAPREDUCE-5670
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5670
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.9
>            Reporter: Jason Lowe
>            Assignee: Chen He
>            Priority: Minor
>         Attachments: MR-5670.patch
> If a combine split consists of many "empty" files (i.e.: no record found by the underlying
record reader) then theoretically a task can timeout due to lack of reported progress.

This message was sent by Atlassian JIRA

View raw message