incubator-hama-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hama Wiki] Update of "BSPModel" by thomasjungblut
Date Sat, 10 Dec 2011 23:43:47 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.

The "BSPModel" page has been changed by thomasjungblut:
http://wiki.apache.org/hama/BSPModel?action=diff&rev1=3&rev2=4

  <<TableOfContents(5)>>
  
- Hama provides a user-defined function “bsp()” that can be used to write your own BSP
program. The bsp() function handles whole parallel part of the program. (It means that the
bsp() function is not a iteration part of the program.) 
- It takes one argument, which is a communication protocol interface. (Later, it'll take one
more arguments for input data and reporter, and so on.)
+ === General Information ===
+ 
+ In Apache Hama, you can implement your own BSP method by extending the {{{org.apache.hama.bsp.BSP}}}
class.
+ This class provides a user-defined function {{{bsp()}}} that you can override
to write your own BSP program.
+ 
+ The {{{bsp()}}} function handles the whole parallel part of the program. (So it gets called only
once, not repeatedly.)
+ 
+ There are also {{{setup()}}} and {{{cleanup()}}}, which are called at the beginning and at the end of
your computation, respectively.
+ 
+ {{{cleanup()}}} is '''guaranteed''' to run after the computation or in case of failure.
+ 
+ You can simply override the functions you need from the BSP class.
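
The call order described above can be sketched in plain Java. This is only an illustration: {{{SketchBSP}}} and {{{LifecycleSketch}}} are hypothetical stand-ins invented for this example, not the real {{{org.apache.hama.bsp.BSP}}} class or the Hama task runner, and the real method signatures differ (they take a BSPPeer).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.hama.bsp.BSP (NOT the real Hama class).
// The log parameter exists only so the call order can be observed.
abstract class SketchBSP {
    void setup(List<String> log) throws Exception { }      // optional hook
    abstract void bsp(List<String> log) throws Exception;  // the parallel part
    void cleanup(List<String> log) { }                     // optional hook
}

class LifecycleSketch {
    // Assumed runner behaviour: setup() once, bsp() once, cleanup() always.
    static List<String> run(SketchBSP task) {
        List<String> log = new ArrayList<>();
        try {
            task.setup(log);
            task.bsp(log);
        } catch (Exception e) {
            log.add("failed: " + e.getMessage());
        } finally {
            // Runs after a normal computation and also after a failure.
            task.cleanup(log);
        }
        return log;
    }

    public static void main(String[] args) {
        SketchBSP failing = new SketchBSP() {
            @Override void setup(List<String> log) { log.add("setup"); }
            @Override void bsp(List<String> log) throws Exception {
                log.add("bsp");
                throw new Exception("boom");
            }
            @Override void cleanup(List<String> log) { log.add("cleanup"); }
        };
        // prints [setup, bsp, failed: boom, cleanup]
        System.out.println(run(failing));
    }
}
```

The {{{finally}}} block is one simple way to get the "cleanup always runs" behaviour; how Hama itself enforces the guarantee is not shown in this page.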
  
  Basically, a BSP program consists of a sequence of supersteps. Each superstep consists of
the following three phases:
  
@@ -13, +23 @@

  
  NOTE that these phases should always be executed in sequential order.
  
+ In Apache Hama, the communication between tasks (or peers) is done within the barrier synchronization.

  
  === Communication ===
  
- Within bsp() function, you can use the powerful communication functions for many purposes
using BSPPeerProtocol. We tried to follow the standard library of BSP world as much as possible.
The following table describes all the functions you can use:
+ Within the bsp() function, you can use the powerful communication functions of BSPPeer for many
purposes. We tried to follow the standard library of the BSP world as much as possible.
+ 
+ Incoming messages are stored in a queue; no particular ordering of the messages is guaranteed.
+ 
+ The following table describes all the functions you can use:
  ||Function||Description||
  ||send(String peerName, BSPMessage msg)||Send a message to another peer.||
- ||put(BSPMessage msg)||Put a message to local queue.||
- ||getCurrentMessage()||Get a received message.||
+ ||getCurrentMessage()||Get a received message from the queue.||
- ||getNumCurrentMessages()||Get the number of received messages.||
+ ||getNumCurrentMessages()||Get the number of messages currently in the queue.||
- ||sync()||Barrier synchronization.||
+ ||sync()||Starts the barrier synchronization.||
- ||getPeerName()||Get a peer name.||
+ ||getPeerName()||Get the peer name of this task.||
- ||getPeerName(int index)||Get a nth peer name.||
+ ||getPeerName(int index)||Gets the n-th peer name.||
  ||getNumPeers()||Get the number of peers.||
- ||getAllPeerNames()||Get all peer names.||
+ ||getAllPeerNames()||Get all peer names (including "this" task). (Hint: These are always
sorted in ascending order)||
  
- The send(), put() and the other all functions are very flexible. For example, you can send
one more messages on to any other processes in bsp() function:
+ Here is an example that sends a message to all peers:
  
  {{{
-   public void bsp(BSPPeerProtocol bspPeer) throws IOException,
-         KeeperException, InterruptedException {
+   public final void bsp(
+       BSPPeer<KEYIN, VALUEIN, KEYOUT, VALUEOUT> peer)
+       throws IOException, InterruptedException, SyncException {
+ 
-     for (String otherPeer : bspPeer.getAllPeerNames()) {
+     for (String otherPeer : peer.getAllPeerNames()) {
-       String peerName = bspPeer.getPeerName();
+       String peerName = peer.getPeerName();
-       BSPMessage msg = 
+       LongMessage msg = 
-         new BSPMessage(Bytes.toBytes(peerName), Bytes.toBytes(“Hi”));
+         new LongMessage("Hello from " + peer.getPeerName());
        peer.send(otherPeer, msg);
      }
+ 
-   bspPeer.sync();
+     peer.sync();
+   
  }
  }}}
  
+ The generics in the BSPPeer are related to the [IOSystem | Input and Output System].
+ 
  === Synchronization ===
  
- When all processes have entered the barrier by sync() function, the Hama proceeds to the
next superstep. In previous example case, the BSP job will be finished by one synchronization
after sending a message “Hi” to all peers.
+ When all processes have entered the barrier via the sync() function, Hama proceeds to the
next superstep. In the previous example, the BSP job finishes after one synchronization,
once the message “Hello from ...” has been sent to all peers.
- 
- But, keep in mind that the sync() function not means the end of BSP job. As mentioned previously,
the all communication functions are very flexible. For example, the sync() function can be
also located in a for loop:
+ The sync() function is very flexible.
+ For example, it can also be called inside a for loop:
  
  {{{
- public void bsp(BSPPeerProtocol bspPeer) throws IOException,
-   KeeperException, InterruptedException {
+    public final void bsp(
+       BSPPeer<KEYIN, VALUEIN, KEYOUT, VALUEOUT> peer)
+       throws IOException, InterruptedException, SyncException {
+ 
      for (int i = 0; i < 100; i++) {
-       ….
+       // send some messages
        peer.sync();
      }
+ 
    }
  }}}
  
- The BSP job will be finished only when all processes have no more local and outgoing queues
entries and all processes done. (or killed by user.)
+ The BSP job finishes only when all processes have no more entries in their local and outgoing queues
and all processes are done, or when the job is killed by the user.
  
+ 
+ '''Note that the barrier synchronization is very costly because it is a global synchronization.
So you should synchronize as rarely as possible.'''
+ 
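
To make the superstep behaviour concrete, here is a minimal simulation in plain Java, assuming threads as peers and a {{{java.util.concurrent.CyclicBarrier}}} as the barrier. {{{SuperstepSketch}}} and its {{{send()}}}/{{{sync()}}} methods are hypothetical names chosen to mirror the table above; this is not how Hama implements its synchronization, only a sketch of the idea that messages sent in a superstep become visible after every peer has reached the barrier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CyclicBarrier;

// Hypothetical simulation: threads play the role of peers (not Hama code).
class SuperstepSketch {
    final int numPeers;
    // Messages sent during the current superstep, one queue per receiver.
    final List<ConcurrentLinkedQueue<String>> outgoing = new ArrayList<>();
    // Messages delivered at the last barrier, one queue per receiver.
    final List<ConcurrentLinkedQueue<String>> inbox = new ArrayList<>();
    final CyclicBarrier barrier;

    SuperstepSketch(int numPeers) {
        this.numPeers = numPeers;
        for (int i = 0; i < numPeers; i++) {
            outgoing.add(new ConcurrentLinkedQueue<>());
            inbox.add(new ConcurrentLinkedQueue<>());
        }
        // The barrier action runs once per superstep, after all peers have
        // arrived and before any of them is released: it delivers the messages.
        barrier = new CyclicBarrier(numPeers, () -> {
            for (int i = 0; i < numPeers; i++) {
                inbox.get(i).clear();
                inbox.get(i).addAll(outgoing.get(i));
                outgoing.get(i).clear();
            }
        });
    }

    void send(int to, String msg) { outgoing.get(to).add(msg); }

    void sync() throws InterruptedException, BrokenBarrierException { barrier.await(); }

    // Every peer sends one message to every peer (itself included) in each
    // superstep and counts what it finds in its inbox after the barrier.
    static int[] run(int numPeers, int supersteps) {
        SuperstepSketch s = new SuperstepSketch(numPeers);
        int[] received = new int[numPeers];
        Thread[] threads = new Thread[numPeers];
        for (int p = 0; p < numPeers; p++) {
            final int me = p;
            threads[p] = new Thread(() -> {
                try {
                    for (int step = 0; step < supersteps; step++) {
                        for (int other = 0; other < numPeers; other++) {
                            s.send(other, "Hello from peer " + me);
                        }
                        s.sync(); // global barrier: the costly part
                        received[me] += s.inbox.get(me).size();
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[p].start();
        }
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return received;
    }

    public static void main(String[] args) {
        // 3 peers, 2 supersteps: each peer receives 3 messages per superstep.
        System.out.println(Arrays.toString(run(3, 2))); // prints [6, 6, 6]
    }
}
```

Every call to {{{sync()}}} makes all peers wait for the slowest one, which is why the number of supersteps should be kept small.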
