incubator-hama-dev mailing list archives

From "Thomas Jungblut (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-503) Chainable computations for fault tolerance
Date Tue, 07 Feb 2012 16:26:59 GMT

    [ https://issues.apache.org/jira/browse/HAMA-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202510#comment-13202510 ]

Thomas Jungblut commented on HAMA-503:
--------------------------------------

Hey Lin, 

I have made a bit of an "interface".

For a superstep:
https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/bsp/ft/Superstep.java

For the BSP that can handle faults:

https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/bsp/ft/FaultTolerantBSP.java

The idea is that a task is initialized with a start superstep, which is an index into the
array of user-defined supersteps.
When a fault happens, we inject the index of the superstep that failed into the new task, so
at runtime it resumes computation from that point.
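
To make the restart idea concrete, here is a minimal, self-contained sketch. It does not use Hama or the linked classes: the Superstep interface, runFrom() and the simulated fault are all hypothetical stand-ins, and the real framework would call sync() and exchange messages where the comment says so.

```java
import java.util.ArrayList;
import java.util.List;

public class ChainedSupersteps {

    /** Hypothetical stand-in for the linked Superstep class; not Hama's API. */
    interface Superstep {
        void compute(List<String> log) throws Exception;
    }

    /**
     * Runs the supersteps in order, beginning at index startSuperstep.
     * Returns -1 if all of them finished, or the index of the one that failed.
     */
    static int runFrom(Superstep[] steps, int startSuperstep, List<String> log) {
        for (int i = startSuperstep; i < steps.length; i++) {
            try {
                steps[i].compute(log);
            } catch (Exception e) {
                log.add("fault in superstep " + i);
                return i;
            }
            // Assumption: the framework would call sync() and exchange messages here.
            log.add("sync after superstep " + i);
        }
        return -1;
    }

    /** Simulates one fault in superstep 1 and a restart from that index. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        boolean[] alreadyFailed = {false};
        Superstep[] steps = {
            l -> l.add("step 0"),
            l -> {
                if (!alreadyFailed[0]) {
                    alreadyFailed[0] = true;
                    throw new IllegalStateException("simulated fault");
                }
                l.add("step 1");
            },
            l -> l.add("step 2")
        };
        int failedAt = runFrom(steps, 0, log);
        while (failedAt >= 0) {
            // The "new task" is started with the index of the failed superstep.
            failedAt = runFrom(steps, failedAt, log);
        }
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In this sketch superstep 0 is not executed again after the fault; only the failed superstep and the ones after it run in the restarted task. Recovering the checkpointed messages for the failed superstep is left out.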

I have not really tried to make a real-world BSP example with it, so the Superstep class may
not be a good interface.

What do you think?
                
> Chainable computations for fault tolerance
> ------------------------------------------
>
>                 Key: HAMA-503
>                 URL: https://issues.apache.org/jira/browse/HAMA-503
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp
>    Affects Versions: 0.4.0
>            Reporter: Thomas Jungblut
>             Fix For: 0.5.0
>
>
> Refactor bsp() to allow checkpointed messages to be recovered.
> ChiaHung Lin had the idea of chaining superstep classes to make recovery
> more convenient and less error-prone, or at least possible.
> A user no longer defines a BSP; instead, they define a single superstep inside
> a computation class. These computations can be chained in a specific order. After
> each computation, the framework calls sync() and exchanges the messages.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
