hama-dev mailing list archives

From "Suraj Menon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-557) Implement Checkpointing service in Hama
Date Fri, 20 Jul 2012 18:37:34 GMT

    [ https://issues.apache.org/jira/browse/HAMA-557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419420#comment-13419420 ]

Suraj Menon commented on HAMA-557:

Thanks for the detailed review, Thomas.

I have actually kept working from where this patch left off.

> Compile Problem TestCheckpoint overrides the method replayMessages, but it does not exist,
> should it exist?
Sorry, I have fixed the compile problem. You caught me running mvn install -DskipTests=true.

> In hama-core there is a folder created called "nullzookeeper", from what testcase does
> that come from?
I think it comes from TestSyncService; I shall look into it.

> Can we remove the tildes (~) from debug output?
The logs with ~s are going to be removed completely. I only have them for a quick check on progress.

> Do you think we should stick with defining interfaces with "I" in front? I'm naming interfaces
> without them and call the concrete implementations *Impl. What do you think is the best?
I can change it, and we can continue with the naming convention that is already there.

> Now we have a lot of services, we could extract init and close to a superinterface, WDYT?
I am not sure how it would be used, or whether the services can be composed that way.
Most of our services get initialized by ReflectionUtils.newInstance. So far we don't have
a common service that validates the interaction between these services. Let's look
into this once we have such a requirement. I also feel the init methods may need different
signatures for different services.
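
To make the trade-off concrete, here is a minimal sketch of what such a superinterface could look like. It is not taken from the patch: the names Service, DemoSyncService and Services are hypothetical, plain java.lang.reflect stands in for ReflectionUtils.newInstance, and checked exceptions are left out for brevity. It also illustrates the signature concern: a shared init forces every service to take its arguments through one configuration object.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

// Hypothetical superinterface; not part of the Hama code base.
interface Service {
  // A single shared signature: service-specific arguments must travel in conf.
  void init(Map<String, String> conf);

  void close();
}

// Hypothetical service used only to exercise the interface.
class DemoSyncService implements Service {
  boolean initialized;
  String quorum;

  @Override
  public void init(Map<String, String> conf) {
    // "demo.quorum" is a made-up configuration key.
    quorum = conf.getOrDefault("demo.quorum", "localhost");
    initialized = true;
  }

  @Override
  public void close() {
    initialized = false;
  }
}

final class Services {
  private Services() {}

  // Assumption: stands in for reflective creation such as ReflectionUtils.newInstance.
  // Creates the service through its no-argument constructor, then calls init.
  static <T extends Service> T create(Class<T> type, Map<String, String> conf) {
    try {
      Constructor<T> ctor = type.getDeclaredConstructor();
      ctor.setAccessible(true);
      T service = ctor.newInstance();
      service.init(conf);
      return service;
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot create " + type.getName(), e);
    }
  }
}
```

With this shape, a caller would write Services.create(DemoSyncService.class, conf) and later call close() without knowing the concrete type. Whether that is worth it depends on having a component that actually manages several services at once.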

Thanks, your additional notes had some good catches. I have already fixed a few of those. In
my final version, I am planning to add more unit test cases along with fixes for some issues
I am encountering on my cluster. Hopefully, you were happy with the way the refactored pieces
interact with each other. I wanted these to be building blocks for future work.

> Implement Checkpointing service in Hama
> ---------------------------------------
>                 Key: HAMA-557
>                 URL: https://issues.apache.org/jira/browse/HAMA-557
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core
>    Affects Versions: 0.6.0
>            Reporter: Suraj Menon
>            Assignee: Suraj Menon
>             Fix For: 0.6.0
>         Attachments: HAMA-505-557-610-611-v1.patch, HAMA-557-ft-framework.patch
> Implement checkpointing service in Apache Hama. My patches for HAMA-533 and HAMA-534
> are blocked on this.
> - Checkpointing should be done as messages are either sent or received. I prefer while
> receiving messages, as we can achieve some parallelism with asynchronous messages. Please
> comment if you differ.
> - BSPMaster should hold the checkpoint status for each task. Checkpoint status includes
> superstep count and file information for which checkpointing is complete.
> - MessageManager should notify Checkpointer of a new message at BSPPeer.
> - Implement/Reuse MessageBundle class as splitClass in BSPPeerImpl for recovery in initInput.
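
The bookkeeping described in the issue can be sketched roughly as follows. This is an illustration under stated assumptions, not the design in the attached patches: CheckpointStatus, CheckpointRegistry, MessageListener and BufferingCheckpointer are hypothetical names, messages are plain strings, the path layout is made up, and no file is actually written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical value object: the last superstep whose checkpoint is complete,
// plus the file that holds it.
final class CheckpointStatus {
  final long superstep;
  final String path;

  CheckpointStatus(long superstep, String path) {
    this.superstep = superstep;
    this.path = path;
  }
}

// Hypothetical master-side registry, keyed by task id.
final class CheckpointRegistry {
  private final Map<String, CheckpointStatus> byTask = new ConcurrentHashMap<>();

  // Keeps the entry with the highest superstep, so late reports cannot roll it back.
  void report(String taskId, long superstep, String path) {
    byTask.merge(taskId, new CheckpointStatus(superstep, path),
        (old, cur) -> cur.superstep >= old.superstep ? cur : old);
  }

  CheckpointStatus latest(String taskId) {
    return byTask.get(taskId);
  }
}

// Hypothetical callback through which a message manager notifies the checkpointer.
interface MessageListener {
  void onMessageReceived(String message);
}

// Buffers received messages and reports a completed checkpoint per superstep.
final class BufferingCheckpointer implements MessageListener {
  private final String taskId;
  private final CheckpointRegistry registry;
  private final List<String> buffer = new ArrayList<>();

  BufferingCheckpointer(String taskId, CheckpointRegistry registry) {
    this.taskId = taskId;
    this.registry = registry;
  }

  @Override
  public void onMessageReceived(String message) {
    buffer.add(message);
  }

  // Returns the number of messages covered by this checkpoint.
  // A real implementation would persist the buffer before reporting.
  int completeSuperstep(long superstep) {
    int count = buffer.size();
    String path = "checkpoint/" + taskId + "/" + superstep; // made-up layout
    buffer.clear();
    registry.report(taskId, superstep, path);
    return count;
  }
}
```

Here the checkpointer is fed on the receiving side, matching the preference stated in the first bullet, and the registry only ever advances, so recovery could start from latest(taskId).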
