incubator-hama-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas Jungblut (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-498) BSPTask should periodically ping its parent.
Date Wed, 29 Feb 2012 09:59:57 GMT

    [ https://issues.apache.org/jira/browse/HAMA-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219049#comment-13219049
] 

Thomas Jungblut commented on HAMA-498:
--------------------------------------

Oh I see: TestBSPTask

Ok I have a few comments on it:
- we can move this testcase to the bsp package, we don't need the ft subpackage
- "UtilsForTesting.raiseUncaughtException()" is a good idea, but I guess "throw new RuntimeException()"
would be enough
- Instead of using a dummy LocalTestSyncClient you could use LocalBSPRunner.LocalSyncClient.class.
(even if you don't need locking)
- It would be great if you can organize the imports, I see a lot of warnings in Eclipse (use
CTRL-SHIFT-o in eclipse)
- Formatting would be nice as well (CTRL-SHIFT-f)
- Like Edward told there are some missing license headers

Otherwise I think this is a good patch :) Nice work!

For the scheduling stuff:

{noformat}
try{
      Thread.sleep(15000);
    }
    catch(Exception e){
      LOG.error("Interrupted the timer for 10 sec.", e);
    }
{noformat}

This causes race conditions, you should better use a completion service that waits until the
thread really finishes. For a single thread you can get a Future from the threadpool. 

You can express this with

{noformat}
    ScheduledFuture<?> schedule = this.testBSPTaskService.schedule(
        new TestBSPTaskThreadRunner(job), 0, TimeUnit.MILLISECONDS);
    try {
      schedule.get();
    } catch (InterruptedException e1) {
      e1.printStackTrace();
    } catch (ExecutionException e1) {
      e1.printStackTrace();
    }
{noformat}

You actually used it on the third testcase, so why do you rely on sleep in the first one?

And we have some TCP cleanup issues, once the testcase failed it causes TIME_WAIT problems

{noformat}
  thomasjungblut@ubuntu:~/workspace/hama-trunk$ sudo netstat -p | grep 40000
tcp        0      0 localhost.localdo:40000 localhost.localdo:36948 TIME_WAIT   -        
      
tcp        0      0 localhost.localdo:40000 localhost.localdo:36947 TIME_WAIT   - 
{noformat}
Seems to me like a cleanup problem..
                
> BSPTask should periodically ping its parent.
> --------------------------------------------
>
>                 Key: HAMA-498
>                 URL: https://issues.apache.org/jira/browse/HAMA-498
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp
>    Affects Versions: 0.4.0
>            Reporter: Edward J. Yoon
>            Assignee: Suraj Menon
>              Labels: newbie
>             Fix For: 0.5.0
>
>         Attachments: HAMA-498.patch
>
>
> As described in http://wiki.apache.org/hama/GroomServerFaultTolerance
> BSPTask should periodically ping its parent 'GroomServer' for their health status.
> 1. If Tasks are unable to ping their parent 'GroomServer', it should be killed themselves.
> 2. And, if GroomServer does not receive ping from the childs, GroomServer should check
whether that child is running.
> You don't need to implement recovery logic in this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message