hama-dev mailing list archives

From "Edward J. Yoon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-647) Make the input spliter robustly
Date Mon, 24 Sep 2012 15:16:08 GMT

    [ https://issues.apache.org/jira/browse/HAMA-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461851#comment-13461851 ]

Edward J. Yoon commented on HAMA-647:
-------------------------------------

{code}
   protected long computeSplitSize(long goalSize, long minSize, long blockSize) {
-    return Math.max(minSize, Math.min(goalSize, blockSize));

+    if (goalSize > blockSize) {
+      return Math.max(minSize, Math.max(goalSize, blockSize));
+    } else {
+      return Math.max(minSize, Math.min(goalSize, blockSize));
+    }
{code}

This is a good catch.
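To make the effect of that change concrete, here is a minimal sketch that runs both formulas side by side. The class name, the 10 GB input, the 64 MB block size, and the four requested splits are illustrative assumptions, not values from the patch.

```java
class SplitSizeDemo {
  // Formula before the patch: the split size is capped at the block size.
  static long oldSplitSize(long goalSize, long minSize, long blockSize) {
    return Math.max(minSize, Math.min(goalSize, blockSize));
  }

  // Formula from the patch: when the goal exceeds the block size, the goal wins.
  static long patchedSplitSize(long goalSize, long minSize, long blockSize) {
    if (goalSize > blockSize) {
      return Math.max(minSize, Math.max(goalSize, blockSize));
    } else {
      return Math.max(minSize, Math.min(goalSize, blockSize));
    }
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024L;
    long totalSize = 10240 * mb;   // assumed 10 GB input
    long blockSize = 64 * mb;      // assumed block size
    long goalSize = totalSize / 4; // assumed four requested splits
    long minSize = 1;

    // Old formula: 64 MB per split, so the input yields far more than four splits.
    System.out.println(oldSplitSize(goalSize, minSize, blockSize) / mb);     // 64
    // Patched formula: 2560 MB per split, matching the four requested splits.
    System.out.println(patchedSplitSize(goalSize, minSize, blockSize) / mb); // 2560
  }
}
```

With these assumed numbers the old formula yields 160 splits of 64 MB each, while the patched one honours the requested count of four.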

By the way,

{code}
@@ -214,9 +215,13 @@
         }
       }
       return splits.toArray(new FileSplit[splits.size()]);
+    } else if (files.length == 1) {
+      goalSize = totalSize / (numSplits == 0 ? 1 : numSplits - 1);
{code}

If files.length == 1 and numSplits == 1, Java will throw an ArithmeticException,
because numSplits - 1 equals zero. Correct?
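The failure can be reproduced in isolation. The sketch below copies the divisor expression from the patch into a hypothetical helper and pairs it with one possible guard. The class name, the method names, and the `Math.max(1, ...)` guard are assumptions for illustration only, not the fix that was committed.

```java
class GoalSizeDemo {
  // Divisor expression exactly as written in the patch.
  static long patchGoalSize(long totalSize, int numSplits) {
    return totalSize / (numSplits == 0 ? 1 : numSplits - 1);
  }

  // Hypothetical guard: clamp the divisor so it is never below one.
  static long guardedGoalSize(long totalSize, int numSplits) {
    return totalSize / Math.max(1, numSplits - 1);
  }

  public static void main(String[] args) {
    try {
      patchGoalSize(1000L, 1); // numSplits - 1 == 0
    } catch (ArithmeticException e) {
      System.out.println("numSplits == 1 throws: " + e);
    }
    System.out.println(guardedGoalSize(1000L, 1)); // 1000
  }
}
```

Integer and long division by zero raise ArithmeticException in Java, so the guard has to be applied to the divisor before the division is evaluated.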
                
> Make the  input spliter robustly
> --------------------------------
>
>                 Key: HAMA-647
>                 URL: https://issues.apache.org/jira/browse/HAMA-647
>             Project: Hama
>          Issue Type: Improvement
>          Components: bsp core
>    Affects Versions: 0.5.0, 0.6.0
>            Reporter: Yuesheng Hu
>            Assignee: Yuesheng Hu
>            Priority: Critical
>             Fix For: 0.6.0
>
>         Attachments: HAMA-647-2.patch, HAMA-647.patch
>
>
> Currently, the splitter in FileInputFormat is based on MapReduce's splitter. But Hama
> differs from MapReduce: a Hama task cannot be held pending until a slot becomes free,
> so the current splitter is not suitable for Hama. When the input file is small this may be fine,
> but when the input is very large the number of splits becomes very large too, even if our cluster
> is powerful enough to handle the input. For more details, please see the comments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
