hama-dev mailing list archives

From "Suraj Menon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-557) Implement Checkpointing service in Hama
Date Wed, 01 Aug 2012 13:31:04 GMT

    [ https://issues.apache.org/jira/browse/HAMA-557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426602#comment-13426602 ]

Suraj Menon commented on HAMA-557:
----------------------------------

Thanks,
I have to figure out why the superstep number is shown as 0, even though the recovery is from
supersteps 2, 3 and 4.
Also, what is your opinion on the following display?
12/08/01 06:29:00 INFO bsp.BSPJobClient: Current supersteps number: 0 : RUNNING
12/08/01 06:29:45 INFO bsp.BSPJobClient: Current supersteps number: 1 : RUNNING
12/08/01 06:29:48 INFO bsp.BSPJobClient: Current supersteps number: 2 : RUNNING
12/08/01 06:29:54 INFO bsp.BSPJobClient: Current supersteps number: 2 : RECOVERING
12/08/01 06:30:09 INFO bsp.BSPJobClient: Current supersteps number: 3 : RUNNING
12/08/01 06:30:15 INFO bsp.BSPJobClient: Current supersteps number: 3 : RECOVERING
12/08/01 06:30:30 INFO bsp.BSPJobClient: Current supersteps number: 4 : RUNNING
12/08/01 06:30:42 INFO bsp.BSPJobClient: Current supersteps number: 4 : RECOVERING
12/08/01 06:30:57 INFO bsp.BSPJobClient: Current supersteps number: 5 : RUNNING
                
> Implement Checkpointing service in Hama
> ---------------------------------------
>
>                 Key: HAMA-557
>                 URL: https://issues.apache.org/jira/browse/HAMA-557
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core
>    Affects Versions: 0.6.0
>            Reporter: Suraj Menon
>            Assignee: Suraj Menon
>             Fix For: 0.6.0
>
>         Attachments: HAMA-505-557-610-611-v1.patch, HAMA-505-557-610-611-v2.patch, HAMA-557-ft-framework.patch
>
>
> Implement a checkpointing service in Apache Hama. My patches for HAMA-533 and HAMA-534 are blocked on this.
> - Checkpointing should be done as messages are either sent or received. I prefer checkpointing while receiving messages, as we can achieve some parallelism with asynchronous messages. Please comment if you differ.
> - BSPMaster should hold the checkpoint status for each task. Checkpoint status includes the superstep count and the file information for which checkpointing is complete.
> - MessageManager should notify Checkpointer of a new message at BSPPeer.
> - Implement/Reuse MessageBundle class as splitClass in BSPPeerImpl for recovery in initInput.
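
The per-task checkpoint status described in the second bullet above could be sketched roughly as follows. This is only an illustration and is not taken from the attached patches: CheckpointStatus and CheckpointRegistry are hypothetical names, the task id is assumed to be a plain string, and the "file information" is assumed to be a single path.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical value type (not a Hama class): the last superstep for which
// a task finished checkpointing, and where that checkpoint was written.
final class CheckpointStatus {
    final long superstep;
    final String filePath;

    CheckpointStatus(long superstep, String filePath) {
        this.superstep = superstep;
        this.filePath = filePath;
    }
}

// Hypothetical master-side registry (not a Hama class): keeps the newest
// completed checkpoint per task, as the issue suggests BSPMaster should.
final class CheckpointRegistry {
    private final Map<String, CheckpointStatus> latestByTask =
        new ConcurrentHashMap<>();

    // Record a completed checkpoint. A report for an older superstep that
    // arrives late does not overwrite a newer one.
    void record(String taskId, long superstep, String filePath) {
        latestByTask.merge(
            taskId,
            new CheckpointStatus(superstep, filePath),
            (oldStatus, newStatus) ->
                newStatus.superstep >= oldStatus.superstep ? newStatus : oldStatus);
    }

    // Returns the newest known checkpoint for the task, or null if the task
    // has not completed any checkpoint yet.
    CheckpointStatus latest(String taskId) {
        return latestByTask.get(taskId);
    }
}
```

On recovery, a master holding such a registry could look up latest(taskId) to decide which superstep and file a restarted task should resume from. How this maps onto the real BSPMaster and BSPPeerImpl code is left open here.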

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
