incubator-hama-dev mailing list archives

From "Thomas Jungblut (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-387) Add task ID and superstep count informations to lock file
Date Fri, 17 Jun 2011 15:22:47 GMT

    [ https://issues.apache.org/jira/browse/HAMA-387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051131#comment-13051131 ]

Thomas Jungblut commented on HAMA-387:
--------------------------------------

Hmm crap.

Can we add a test case? This should be easily reproducible.

And what if we prevent a peer from entering the barrier while its ZooKeeper lock still exists?
For example like this:

{noformat}
  protected boolean enterBarrier() throws KeeperException, InterruptedException {
    LOG.debug("[" + getPeerName() + "] enter the enterbarrier");
    try {
      // Wait until this peer's existing lock node is gone.
      while (zk.exists(bspRoot + "/" + getPeerName(), false) != null) {
        Thread.sleep(500L);
      }
      // Create a fresh ephemeral lock node that carries the current superstep count.
      zk.create(bspRoot + "/" + getPeerName(),
          Bytes.toBytes(this.getSuperstepCount()), Ids.OPEN_ACL_UNSAFE,
          CreateMode.EPHEMERAL);
    } catch (KeeperException e) {
      LOG.error("Exception while entering barrier!", e);
    } catch (InterruptedException e) {
      LOG.error("Exception while entering barrier!", e);
    }
    // etc omitted ...
{noformat}
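
As a possible starting point for a test case, here is a minimal, self-contained sketch of the same wait-then-create idea. It does not use ZooKeeper at all: FakeLockStore is a hypothetical in-memory stand-in, and BarrierSketch, the timeout parameter, and the long-encoded superstep are assumptions made for illustration, not Hama or ZooKeeper code.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory stand-in for the lock nodes; this is NOT the ZooKeeper API. */
class FakeLockStore {
    private final ConcurrentMap<String, byte[]> nodes = new ConcurrentHashMap<>();

    boolean exists(String path) {
        return nodes.containsKey(path);
    }

    /** Returns false if the node already exists instead of throwing. */
    boolean create(String path, byte[] data) {
        return nodes.putIfAbsent(path, data) == null;
    }

    void delete(String path) {
        nodes.remove(path);
    }

    byte[] getData(String path) {
        return nodes.get(path);
    }
}

class BarrierSketch {
    /**
     * Polls until the peer's existing lock is gone, then creates a new lock
     * holding the superstep count. Returns false on timeout or interruption.
     */
    static boolean enterBarrier(FakeLockStore store, String bspRoot, String peerName,
                                long superstep, long pollMillis, long timeoutMillis) {
        String path = bspRoot + "/" + peerName;
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        try {
            while (store.exists(path)) {
                if (System.nanoTime() >= deadline) {
                    return false; // the old lock never went away; give up
                }
                Thread.sleep(pollMillis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        }
        byte[] data = ByteBuffer.allocate(Long.BYTES).putLong(superstep).array();
        return store.create(path, data);
    }

    /** Decodes the superstep count stored in a lock node. */
    static long readSuperstep(FakeLockStore store, String path) {
        return ByteBuffer.wrap(store.getData(path)).getLong();
    }
}
```

A test could pre-create the peer's lock, check that enterBarrier gives up while it is still there, then delete it and check that the peer enters and the stored superstep matches. Unlike the snippet above, the sketch adds a timeout so that such a test cannot hang forever.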

> Add task ID and superstep count informations to lock file
> ---------------------------------------------------------
>
>                 Key: HAMA-387
>                 URL: https://issues.apache.org/jira/browse/HAMA-387
>             Project: Hama
>          Issue Type: Improvement
>          Components: bsp
>    Affects Versions: 0.2.0
>            Reporter: Edward J. Yoon
>             Fix For: 0.3.0
>
>         Attachments: HAMA-387_v02.patch, sleepless.patch
>
>
> I think, the lock file must include:
>  * the job ID
>  * the task ID of the lock file owner
>  * the current superstep count
> to check ownership and validation.
> Currently they are named by hostname, but multi-tasks can be run per one groomserver in the future.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
