hama-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hama Wiki] Update of "HamaStreaming" by thomasjungblut
Date Mon, 17 Sep 2012 17:10:13 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.

The "HamaStreaming" page has been changed by thomasjungblut:
http://wiki.apache.org/hama/HamaStreaming?action=diff&rev1=3&rev2=4

  
  In any case you should now find a "HamaStreaming" folder in your Hama home directory which
contains several scripts.
  
+ Now we have to upload these scripts to HDFS:
+ 
+ {{{
+ hadoop/bin/hadoop fs -mkdir /tmp/PyStreaming/
+ hadoop/bin/hadoop fs -copyFromLocal HamaStreaming/* /tmp/PyStreaming/
+ }}}
+ 
  Let's start by executing the usual Hello World application that already ships with streaming:
  
  {{{
- bin/hama pipes -streaming true -bspTasks 2 -interpreter python3.2 -cachefiles HamaStreaming/*.py
-output /tmp/pystream-out/ -program HamaStreaming/BSPRunner.py -programArgs HamaStreaming/HelloWorldBSP.py
+ bin/hama pipes -streaming true -bspTasks 2 -interpreter python3.2 -cachefiles /tmp/PyStreaming/*.py
-output /tmp/pystream-out/ -program /tmp/PyStreaming/BSPRunner.py -programArgs HelloWorldBSP
  }}}
  
+ This will start 2 BSP tasks in streaming mode. In streaming mode, a child process is forked
from the usual Java BSP task; here the child is started with python3.2 and gets the .py files
from HDFS via the cache files. The noteworthy part is that you pass a runner class that takes
care of all the protocol communication; your user program is passed as the first program argument.
+ This works because Python starts the runner in a working directory that contains the cache files.
They are therefore implicitly importable, which is why you don't have to provide a path for
HelloWorldBSP (the .py extension is not needed either, because of the reflective import).
+ 
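The reflective import described above can be sketched in plain Python. This is a hypothetical illustration of the mechanism, not the actual `BSPRunner.py` code; the helper name `load_bsp_class` and the convention that the module defines a class of the same name are assumptions for the sketch:

```python
import importlib
import sys

def load_bsp_class(module_name):
    """Load a user BSP class by its bare module name (no '.py', no path),
    assuming the module lives in the current working directory (where the
    cache files are unpacked) and defines a class named like the module."""
    sys.path.insert(0, ".")                         # cache files land here
    module = importlib.import_module(module_name)   # 'HelloWorldBSP', not 'HelloWorldBSP.py'
    return getattr(module, module_name)
```

A runner built this way only needs the class name as its program argument, which is why the command above passes `HelloWorldBSP` without a path or extension.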
+ You should now see something along these lines:
+ 
+ {{{
+ 12/09/17 19:06:31 INFO pipes.Submitter: Streaming enabled!
+ 12/09/17 19:06:33 INFO bsp.BSPJobClient: Running job: job_201209171906_0001
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient: Job complete: job_201209171906_0001
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient: The total number of supersteps: 15
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient: Counters: 8
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=15
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient:     COMPRESSED_BYTES_SENT=3310
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=2805
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient:     COMPRESSED_BYTES_RECEIVED=3310
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient:     TOTAL_MESSAGES_SENT=60
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient:     TOTAL_MESSAGES_RECEIVED=30
+ 12/09/17 19:06:40 INFO bsp.BSPJobClient:     TASK_OUTPUT_RECORDS=28
+ 
+ }}}
+ 
+ And now you can view the output of your job with:
+ 
+ {{{
+ hadoop/bin/hadoop fs -cat /tmp/pystream-out/part-00001
+ }}}
+ 
+ In my case the output looks like this:
+ 
+ {{{
+ Hello from localhost:61002 in superstep 0	
+ Hello from localhost:61001 in superstep 0	
+ Hello from localhost:61001 in superstep 1
+ Hello from localhost:61002 in superstep 1
+ [...]
+ Hello from localhost:61001 in superstep 14	
+ Hello from localhost:61002 in superstep 14	
+ }}}
+ 
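For orientation, a user program producing output like the above roughly follows this shape. This is a minimal sketch only; the `peer` methods (`getAllPeerNames`, `getPeerName`, `send`, `sync`) mirror the BSP peer interface conceptually, but their exact names in the HamaStreaming Python API are assumptions here:

```python
class HelloWorldBSP:
    """Sketch of a streaming BSP task: in each superstep every peer
    sends one greeting to every peer, then enters barrier sync."""

    def bsp(self, peer):
        for superstep in range(15):
            for other in peer.getAllPeerNames():
                peer.send(other, "Hello from %s in superstep %d"
                          % (peer.getPeerName(), superstep))
            peer.sync()
```

With 2 tasks and 15 supersteps this pattern matches the counters above: 15 in SUPERSTEP_SUM and 30 messages received per peer.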
