hadoop-common-issues mailing list archives

From "Harsh J (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8503) logic difference between old mapred.FileInputFormat and mapreduce.lib.input.FileInputFormat
Date Mon, 11 Jun 2012 05:08:43 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292641#comment-13292641 ]

Harsh J commented on HADOOP-8503:
---------------------------------

bq. i.e. we don't have goal size anymore

I believe this was intentional, to remove confusion around needs of the "specify the number
of maps to run" kind, which don't sit well with files as input.

Regarding the min/max size issue: do you mean that setting min-split-size has no effect on
increasing the number of input splits (i.e. mappers)? If possible, can you also attach the bug
you wish to report in code form (for example, a test case showing what's expected vs. what
actually happens)?
                
> logic difference between old mapred.FileInputFormat and mapreduce.lib.input.FileInputFormat
> -------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8503
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8503
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Yang Yang
>            Priority: Minor
>
> in the old mapred.FileInputFormat.getSplits(JobConf, int) 
>         long splitSize = computeSplitSize(goalSize, minSize, blockSize);
> so we could control splitSize with the goalSize, which is controlled by mapred.map.tasks

> in the new code, mapreduce.lib.input.FileInputFormat
>         long splitSize = computeSplitSize(blockSize, minSize, maxSize);
> i.e. we don't have goal size anymore, furthermore,
> the implementation of computeSplitSize() no longer makes sense:
>     return Math.max(minSize, Math.min(maxSize, blockSize));
> since we assume that maxSize is always bigger than minSize, the above line is equivalent to just
> return Math.min(maxSize, blockSize), so minSize is useless
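
For reference, the two formulas quoted above can be compared in a small standalone sketch.
The class and method names (SplitSizeSketch, oldSplitSize, newSplitSize) and the sample
sizes are hypothetical and not part of Hadoop. The body of the old computeSplitSize is not
quoted in the issue, so it is assumed here to be Math.max(minSize, Math.min(goalSize, blockSize)).
The sketch suggests that the new formula reduces to Math.min(maxSize, blockSize) only while
minSize does not exceed that value; once minSize is larger than the block size, the outer
Math.max returns minSize.

```java
// Sketch only: SplitSizeSketch and its method names are hypothetical, not Hadoop API.
class SplitSizeSketch {

    // Old API shape, as quoted: computeSplitSize(goalSize, minSize, blockSize).
    // Assumed body: blockSize caps goalSize, and minSize acts as a floor.
    static long oldSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // New API shape, as quoted: computeSplitSize(blockSize, minSize, maxSize).
    static long newSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024L;
        final long blockSize = 64 * mb; // illustrative block size

        // Old logic: a 1 GB input with 64 requested maps gives a 16 MB goal size.
        long goalSize = (1024 * mb) / 64;
        System.out.println("old, goal 16 MB:   " + oldSplitSize(goalSize, 1L, blockSize) / mb + " MB");

        // New logic with a tiny minSize and unbounded maxSize: block size wins.
        System.out.println("new, defaults:     " + newSplitSize(blockSize, 1L, Long.MAX_VALUE) / mb + " MB");

        // New logic with maxSize below the block size: splits shrink.
        System.out.println("new, max 32 MB:    " + newSplitSize(blockSize, 1L, 32 * mb) / mb + " MB");

        // New logic with minSize above the block size: minSize is not ignored.
        System.out.println("new, min 128 MB:   " + newSplitSize(blockSize, 128 * mb, Long.MAX_VALUE) / mb + " MB");
    }
}
```

Under these assumptions, lowering maxSize below the block size is what produces more (smaller)
splits in the new logic, while raising minSize above the block size produces fewer (larger) ones.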

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
