hama-dev mailing list archives

From "Edward J. Yoon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-970) Exception can occur if the size of splits is bigger than numBSPTasks
Date Wed, 09 Dec 2015 04:16:11 GMT

    [ https://issues.apache.org/jira/browse/HAMA-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047998#comment-15047998 ]

Edward J. Yoon commented on HAMA-970:


To launch more tasks than the number of splits, you should use an input partitioner; see the
TestPartitioning.java example: https://github.com/apache/hama/blob/master/core/src/test/java/org/apache/hama/bsp/TestPartitioning.java

For example, if you have a 10MB file and set the number of tasks to 10 with a partitioner, the
framework automatically partitions the 10MB file into 10 files and then launches your main BSP
program with 10 tasks.
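The automatic partitioning step can be pictured with the sketch below. It assumes records are assigned to tasks by hashing a key, the same idea as a hash partitioner; the class and method names are hypothetical, and this is not Hama's partitioner code. In the framework each partition ends up as a separate file read by one task; here the partitions are just in-memory lists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of hash partitioning; not Hama's implementation.
public class HashPartitionSketch {

    // Assign a record to one of numTasks partitions by hashing its key.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int partitionFor(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    // Distribute all records into numTasks buckets; bucket i stands in for
    // the input file that task i would read.
    static List<List<String>> partition(List<String> records, int numTasks) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String record : records) {
            buckets.get(partitionFor(record, numTasks)).add(record);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            records.add("record-" + i);
        }
        List<List<String>> buckets = partition(records, 10);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("task " + i + ": " + buckets.get(i).size() + " records");
        }
    }
}
```

Hashing spreads records roughly evenly across tasks without needing to know the file size in advance, which is why one input file can feed any requested number of tasks.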

Previously, I was adding 2 files to my input paths: one empty file and one 70 MB file. This
works, but Hama only opens 2 tasks: one for the empty file (which becomes the master) and
one for the 70 MB file (which becomes my only slave). Now I want to divide the 70 MB file
into 4-5 tasks, but when I try this solution, I get an exception.

You can do it like this: partition the 70MB file manually into 9 files and then launch the
BSP program with setNumOfTasks(10). Together with the empty file, that gives 10 input files, one per task.
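For the manual route, here is a minimal sketch of splitting a line-oriented text file into N contiguous parts. It assumes one record per line (so no record is cut in half) and that the file fits in memory, which is reasonable for 70MB; the class name, method names, and the part-00000 file naming are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for splitting an input file by hand; not part of Hama.
public class ManualSplitSketch {

    // Split lines into `parts` contiguous chunks whose sizes differ by at most one.
    // Splitting on line boundaries avoids cutting a record in half.
    static List<List<String>> chunk(List<String> lines, int parts) {
        if (parts <= 0) {
            throw new IllegalArgumentException("parts must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        int base = lines.size() / parts;
        int extra = lines.size() % parts; // the first `extra` chunks get one more line
        int start = 0;
        for (int i = 0; i < parts; i++) {
            int size = base + (i < extra ? 1 : 0);
            chunks.add(new ArrayList<>(lines.subList(start, start + size)));
            start += size;
        }
        return chunks;
    }

    // Read `input` and write `parts` files named part-00000, part-00001, ... into outDir.
    static List<Path> splitFile(Path input, Path outDir, int parts) throws IOException {
        List<String> lines = Files.readAllLines(input, StandardCharsets.UTF_8);
        Files.createDirectories(outDir);
        List<Path> written = new ArrayList<>();
        List<List<String>> chunks = chunk(lines, parts);
        for (int i = 0; i < chunks.size(); i++) {
            Path out = outDir.resolve(String.format("part-%05d", i));
            Files.write(out, chunks.get(i), StandardCharsets.UTF_8);
            written.add(out);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ManualSplitSketch <input-file> <output-dir> <parts>
        Path input = Paths.get(args[0]);
        Path outDir = Paths.get(args[1]);
        int parts = Integer.parseInt(args[2]);
        for (Path p : splitFile(input, outDir, parts)) {
            System.out.println(p);
        }
    }
}
```

For the case above, `java ManualSplitSketch input.txt parts 9` would write part-00000 through part-00008; those files, plus the empty file, would then go to the job's input path (for example with `hadoop fs -put` if the input lives on HDFS).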

> Exception can occur if the size of splits is bigger than numBSPTasks
> --------------------------------------------------------------------
>                 Key: HAMA-970
>                 URL: https://issues.apache.org/jira/browse/HAMA-970
>             Project: Hama
>          Issue Type: Bug
>          Components: bsp core
>    Affects Versions: 0.7.0
>            Reporter: JongYoon Lim
>            Priority: Trivial
>         Attachments: HAMA-970.patch
> In JobInProgress, it's possible to get an exception in initTasks().
> {code:java}
> this.tasks = new TaskInProgress[numBSPTasks];
> for (int i = 0; i < splits.length; i++) {
>   tasks[i] = new TaskInProgress(getJobID(), this.jobFile.toString(), splits[i], this.conf, this, i);
> }
> {code}
> I'm not sure that *numBSPTasks* is always at least as large as *splits.length*.
> So I think it's better to size the *tasks* array with the larger of the two values.
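The reporter's suggestion can be sketched in isolation. The `Task` class below is a hypothetical stand-in for `TaskInProgress`, splits are plain strings, and nothing here reproduces the attached HAMA-970.patch; it only shows why sizing the array with `Math.max(numBSPTasks, splits.length)` removes the out-of-bounds write.

```java
// Standalone illustration of the initTasks() sizing issue; not Hama code.
public class InitTasksSketch {

    // Hypothetical stand-in for TaskInProgress: just records which split a task reads.
    static final class Task {
        final int index;
        final String split;

        Task(int index, String split) {
            this.index = index;
            this.split = split;
        }
    }

    // Mirrors the loop quoted in the issue: the array is sized by numBSPTasks,
    // so it throws ArrayIndexOutOfBoundsException when splits.length > numBSPTasks.
    static Task[] initTasksOriginal(String[] splits, int numBSPTasks) {
        Task[] tasks = new Task[numBSPTasks];
        for (int i = 0; i < splits.length; i++) {
            tasks[i] = new Task(i, splits[i]);
        }
        return tasks;
    }

    // Reporter's suggestion: size the array with the larger of the two values.
    // Slots beyond splits.length stay null, exactly as in the original loop.
    static Task[] initTasksFixed(String[] splits, int numBSPTasks) {
        Task[] tasks = new Task[Math.max(numBSPTasks, splits.length)];
        for (int i = 0; i < splits.length; i++) {
            tasks[i] = new Task(i, splits[i]);
        }
        return tasks;
    }

    public static void main(String[] args) {
        String[] splits = {"split-0", "split-1", "split-2"};
        try {
            initTasksOriginal(splits, 2);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("original: " + e.getClass().getSimpleName());
        }
        System.out.println("fixed: " + initTasksFixed(splits, 2).length + " tasks");
    }
}
```

Running `main` prints `original: ArrayIndexOutOfBoundsException` followed by `fixed: 3 tasks`.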

This message was sent by Atlassian JIRA
