incubator-hama-dev mailing list archives

From "ChiaHung Lin (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (HAMA-413) Remove limitation on the number of tasks
Date Thu, 25 Aug 2011 07:02:29 GMT

    [ https://issues.apache.org/jira/browse/HAMA-413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090828#comment-13090828 ]

ChiaHung Lin edited comment on HAMA-413 at 8/25/11 7:01 AM:
------------------------------------------------------------

Below is what I observed.

GroomServer periodically checks whether the TaskRunner has stopped running (!tip.runner.isAlive());
if so, it sets the phase to cleanup and reports back to BSPMaster. However, TaskRunner's run()
may finish almost immediately if it merely launches another thread that spawns a child
process (i.e. BSPPeer). For example, in the HAMA-398 patch, TaskRunner.run() looks like this:

{code}
public void run() {
...
BspChildRunner bspPeer = new BspChildRunner(bspArgs, workDir); // spawn bsp peer child process
bspPeer.start();
...
// start() returns immediately, so run() completes right away. offerService() then
// sets taskStatus to cleanup because runner.isAlive() is false,
// even though writing data to HDFS may not have finished yet.
}
{code}

In the HAMA-398 v1 patch, adding a join() call, which in turn makes use of Future.get(), should
ideally have the same effect as the original procedure with waitFor().

{code}
public void run() {
...
BspChildRunner bspPeer = new BspChildRunner(bspArgs, workDir); // spawn bsp peer child process
bspPeer.start();
bspPeer.join(); // wait until the BSPPeer finishes its execution, including writing data to HDFS
 
...
}
{code}
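
To illustrate the difference between the two variants above, here is a minimal, self-contained sketch. It is not the HAMA-398 code: ChildRunner is a hypothetical stand-in for BspChildRunner, a latch stands in for the unfinished HDFS write, and the "child" is a task on an executor rather than a real process. It only assumes, as described above, that join() is backed by Future.get().

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

class JoinSketch {

  /** Hypothetical stand-in for BspChildRunner; the "child" is just an executor task. */
  static class ChildRunner {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();
    private final Callable<Integer> work;
    private Future<Integer> result;

    ChildRunner(Callable<Integer> work) { this.work = work; }

    /** Submits the work and returns immediately. */
    void start() { result = pool.submit(work); }

    /** Blocks on Future.get() until the child work has finished. */
    int join() throws Exception {
      try {
        return result.get();
      } finally {
        pool.shutdown();
      }
    }
  }

  /**
   * Returns true if the runner thread is already dead while the child work is
   * still unfinished, which is the state a periodic !runner.isAlive() check
   * would mistake for a completed task.
   */
  static boolean runnerDiesBeforeChild(boolean waitForChild) throws Exception {
    CountDownLatch release = new CountDownLatch(1); // simulates the pending write
    AtomicBoolean childDone = new AtomicBoolean(false);
    ChildRunner child = new ChildRunner(() -> {
      release.await();
      childDone.set(true);
      return 0;
    });

    Thread runner = new Thread(() -> {
      child.start();
      if (waitForChild) {
        try {
          child.join(); // keeps the runner alive until the child is done
        } catch (Exception e) {
          throw new IllegalStateException(e);
        }
      }
    });
    runner.start();
    if (!waitForChild) {
      runner.join(); // without join(), run() returns on its own right away
    }

    boolean deadEarly = !runner.isAlive() && !childDone.get();

    release.countDown(); // let the simulated write finish
    runner.join();
    child.join();        // make sure the child completed and the pool is shut down
    return deadEarly;
  }

  public static void main(String[] args) throws Exception {
    System.out.println("without join: " + runnerDiesBeforeChild(false));
    System.out.println("with join:    " + runnerDiesBeforeChild(true));
  }
}
```

Without join(), the runner thread is reported as not alive while the child work is still pending. With join(), the runner stays alive until the child has finished, so the isAlive() check cannot trigger the cleanup phase early.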

  
> Remove limitation on the number of tasks
> ----------------------------------------
>
>                 Key: HAMA-413
>                 URL: https://issues.apache.org/jira/browse/HAMA-413
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp
>    Affects Versions: 0.3.0
>            Reporter: Edward J. Yoon
>            Assignee: Edward J. Yoon
>             Fix For: 0.4.0
>
>         Attachments: HAMA-413_v01.patch, HAMA-413_v02.patch, HAMA-413_v03.patch, HAMA-413_v05.patch, HAMA_413_v04.patch
>
>
> With the HAMA-410 patch, the BSPPeer object is constructed in the child process, so we can now remove the limitation on the number of tasks.
> Here's the TODO list:
> 1. The number of tasks per groom should be configurable e.g., 'bsp.local.tasks.maximum'.
> 2. The 'totalTaskCapacity' should be calculated at BSPMaster.getClusterStatus().
> 3. When scheduling tasks, consider how to allocate them.
> 4. Each BSPPeer should know all peers created for its job in the Hama cluster. They can be listed based on the actions of GroomServer.
> 5. In examples, 'cluster.getGroomServers()' can be changed to 'cluster.getMaxTasks()'.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
