hama-dev mailing list archives

From "ChiaHung Lin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-387) Add task ID and superstep count informations to lock file
Date Mon, 23 May 2011 08:17:47 GMT

    [ https://issues.apache.org/jira/browse/HAMA-387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037788#comment-13037788 ]

ChiaHung Lin commented on HAMA-387:
-----------------------------------

Does cnode14 eventually enter the 98th superstep? From the log, it looks as if cnode14
is about to enter the 98th superstep but has not yet logged that. My understanding is that
barrier synchronization waits until all processes have reached the barrier and only then
lets them proceed. Therefore, if cnode14 logs `enter the 98 barrier' later on and all nodes
then leave the barrier, the result looks ok.
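
To illustrate the barrier behaviour described above, here is a minimal sketch that uses
java.util.concurrent.CyclicBarrier as a stand-in for the ZooKeeper-based barrier. It is
not Hama's implementation; the class name BarrierSketch, the task names, and the log
wording are assumptions made for illustration. One task is delayed on purpose, and no
task logs "leave" until the delayed task has logged "enter".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CyclicBarrier;

// Hypothetical illustration only: an in-process barrier, not Hama's ZooKeeper barrier.
final class BarrierSketch {

    // Runs `tasks` threads through one barrier and returns the ordered event log.
    // `slowTask` is the index of a task that reaches the barrier late.
    static List<String> runSuperstep(int tasks, int superstep, int slowTask) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        CyclicBarrier barrier = new CyclicBarrier(tasks);
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            final int id = i;
            Thread t = new Thread(() -> {
                try {
                    if (id == slowTask) {
                        Thread.sleep(200); // straggler: arrives after the others
                    }
                    log.add("task" + id + " enter the " + superstep + " barrier");
                    barrier.await();       // blocks until all tasks have arrived
                    log.add("task" + id + " leave the " + superstep + " barrier");
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return log;
    }

    public static void main(String[] args) {
        // Every "enter" line is printed before any "leave" line.
        runSuperstep(3, 98, 2).forEach(System.out::println);
    }
}
```

A log in which one node has not yet printed its "enter" line while the others are still
waiting is therefore consistent with a working barrier, as long as the "leave" lines only
appear after the last "enter".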

Also, a quick look at the patch shows that the znode is created as EPHEMERAL instead of EPHEMERAL_SEQUENTIAL;
this avoids the problem where a client process disconnects and then reconnects, which with a
sequential node leads to a name with a new monotonically increasing number appended.
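
The naming difference can be sketched with a small in-memory model. This is not the
ZooKeeper client API; the class ZnodeNamingSketch, its methods, and the path layout
(job ID, superstep, task ID) are hypothetical. It relies only on the documented behaviour
that a sequential znode gets a zero-padded, monotonically increasing 10-digit counter
appended to the requested path. Real ZooKeeper keeps that counter per parent znode; the
model uses a single counter for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of znode naming; not the ZooKeeper API.
final class ZnodeNamingSketch {
    enum Mode { EPHEMERAL, EPHEMERAL_SEQUENTIAL }

    private final Map<String, String> nodes = new HashMap<>(); // path -> owning session
    private int nextSequence = 0; // simplification: one counter for all parents

    // Creates a node and returns the actual path that was created.
    String create(String path, String session, Mode mode) {
        String actual = (mode == Mode.EPHEMERAL_SEQUENTIAL)
                ? path + String.format("%010d", nextSequence++)
                : path;
        if (nodes.containsKey(actual)) {
            throw new IllegalStateException("node exists: " + actual);
        }
        nodes.put(actual, session);
        return actual;
    }

    // Ephemeral nodes disappear when the session that owns them ends.
    void closeSession(String session) {
        nodes.values().removeIf(session::equals);
    }

    public static void main(String[] args) {
        ZnodeNamingSketch zk = new ZnodeNamingSketch();
        String path = "/bsp/job_0001/98/task_0003"; // hypothetical layout

        // EPHEMERAL: the name is stable across a disconnect and reconnect.
        System.out.println(zk.create(path, "s1", Mode.EPHEMERAL));
        zk.closeSession("s1");
        System.out.println(zk.create(path, "s2", Mode.EPHEMERAL));
        zk.closeSession("s2");

        // EPHEMERAL_SEQUENTIAL: each reconnect yields a new suffix.
        System.out.println(zk.create(path + "-", "s3", Mode.EPHEMERAL_SEQUENTIAL));
        zk.closeSession("s3");
        System.out.println(zk.create(path + "-", "s4", Mode.EPHEMERAL_SEQUENTIAL));
    }
}
```

With a stable name, the other tasks can check for a specific task's node at a given
superstep without having to strip or compare sequence suffixes.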


> Add task ID and superstep count informations to lock file
> ---------------------------------------------------------
>
>                 Key: HAMA-387
>                 URL: https://issues.apache.org/jira/browse/HAMA-387
>             Project: Hama
>          Issue Type: Improvement
>          Components: bsp
>    Affects Versions: 0.2.0
>            Reporter: Edward J. Yoon
>             Fix For: 0.3.0
>
>         Attachments: sleepless.patch
>
>
> I think, the lock file must include:
>  * the job ID
>  * the task ID of the lock file owner
>  * the current superstep count
> to check ownership and validation.
> Currently they are named by hostname, but multi-tasks can be run per one groomserver
> in the future.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
