hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Joseph Evans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5069) add concrete common implementations of CombineFileInputFormat
Date Tue, 23 Apr 2013 14:35:17 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639107#comment-13639107

Robert Joseph Evans commented on MAPREDUCE-5069:

I have a few minor comments.  Nothing major and nothing that I am very tied to so if you want
to push back on them I would be fine checking the code in as is.
I understand why you wrote CombineFileRecordReaderWrapper instead of extending CombineFileRecordReader,
but I am not really sure the name is that clear.  CombineFileRecordReaderWrapper is not a
wrapper for a CombineFileRecordReader, but is a wrapper for a FileRecordReader that can be
used by CombineFileRecordReader.  I am not really sure of a clearer name for it though so
I think it is OK.

I think I would prefer to have checkFileSplit be turned into asserts, or possibly remove it
entirely.  If we have gotten that far we should be controlled by the CombineFileRecordReader,
and we are just validating that it has set the values that it should have set already and
does currently set.  I just don't see many situations out side of testing where we would want
to check this.  Which is why I would change them to asserts so that for tests we can validate
that everything is correct, but there is no overhead when we are not testing.  

I am also not too happy with having Random used in tests.  Random makes it very hard to reproduce
an error, I have had too many of them fail sporadically, and have to dig through the logs
to find the seed, then hard code the seed... Granted that means that Random found a bug that
hard codeing it would not, and I realize that, so I'll leave it up to you. I would just rather
have the test with a hard coded seed, or not be random at all.  But that is just me.  

> add concrete common implementations of CombineFileInputFormat
> -------------------------------------------------------------
>                 Key: MAPREDUCE-5069
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5069
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1, mrv2
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sangjin Lee
>            Priority: Minor
>         Attachments: MAPREDUCE-5069-1.patch, MAPREDUCE-5069-2.patch, MAPREDUCE-5069-3.patch,
MAPREDUCE-5069-4.patch, MAPREDUCE-5069-5.patch, MAPREDUCE-5069.patch
> CombineFileInputFormat is abstract, and its specific equivalents to TextInputFormat,
SequenceFileInputFormat, etc. are currently not in the hadoop code base.
> These sound like very common need wherever CombineFileInputFormat is used, and different
folks would write the same code over and over to achieve the same goal. It sounds very natural
for hadoop to provide at least the text and sequence file implementations of the CombineFileInputFormat

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

View raw message