hama-dev mailing list archives

From Behroz Sikander <bsikan...@apache.org>
Subject Question regarding Hama synchronization behavior and GSOC
Date Mon, 18 Jan 2016 15:28:24 GMT
I have 2 questions regarding Hama.

Q1: Is Hama going to participate in GSoC 2016?

Q2: The image below shows an interesting behavior of Hama, but I am not
sure why Hama behaves this way.


The x-axis shows the total amount of data that I need to process. The
y-axis shows the time in minutes, aggregated over 200 iterations. Each line
in the plot represents a different number of Hama tasks (peers) used to
process the data. Overall, the plot shows the *total time that the master
task waits for the slave tasks to synchronize* (for 200 iterations in

1) Total time the master waits for the slaves in *1 iteration* =
(time of slave processing) + *(time of synchronization)*
The plot shows only the *time spent in synchronization*, aggregated over
*200 iterations*. I am using this plot to study the time taken by Hama in
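The decomposition in point 1 can be turned into a small bookkeeping routine. This is a minimal Python sketch, assuming that for every iteration you have recorded both the master's total wait and the slave processing time (for example, that of the slowest slave, since the barrier waits for all of them). The function name, the millisecond units and the sample numbers are hypothetical, not real measurements.

```python
def aggregate_sync_ms(master_wait_ms, slave_processing_ms):
    """Sum the synchronization part of the master's wait over all iterations.

    Hypothetical helper. Per iteration:
        master wait = slave processing time + synchronization time
    so the synchronization time is the difference of the two measurements.
    """
    if len(master_wait_ms) != len(slave_processing_ms):
        raise ValueError("expected one (wait, processing) pair per iteration")
    return sum(wait - proc for wait, proc in zip(master_wait_ms, slave_processing_ms))


# Illustrative numbers only (3 iterations, not real measurements).
print(aggregate_sync_ms([2000, 2500, 2100], [1000, 1500, 1200]))  # 2900
```

Summing per-iteration differences (rather than subtracting two grand totals) keeps each iteration's measurements paired, which makes it easy to spot individual slow barriers.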

2) The total data is divided equally among all the tasks. For example, if I
use 10 tasks to process 10K data items, each task gets 1,000. If I use 20
tasks to process 10K, each task gets 500.
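The even split in point 2 is simple integer arithmetic. Here is a minimal Python sketch; the helper name is hypothetical, and the rule for a non-divisible total (the first few tasks get one extra item) is an assumption, since the message only gives evenly divisible examples.

```python
def split_evenly(total_items: int, num_tasks: int) -> list[int]:
    """Return how many items each task receives when the data is split evenly.

    Hypothetical helper: when total_items is not divisible by num_tasks,
    the first (total_items % num_tasks) tasks get one extra item.
    """
    base, extra = divmod(total_items, num_tasks)
    return [base + 1 if i < extra else base for i in range(num_tasks)]


print(split_evenly(10_000, 10)[0])   # 1000
print(split_evenly(10_000, 20)[0])   # 500
print(split_evenly(50_000, 30)[:3])  # [1667, 1667, 1667]
```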

In the plot, for example, the blue line represents 10 tasks. If I process
10,000 files in 200 iterations, the master waits almost 3 minutes in total
for the slaves to synchronize.

If you look closely, you can see that as I *increase* the *number of tasks*
used to process the data, the *time* the master waits for the *slaves to
synchronize* *decreases*. For example, look at the points for 50K data:
with 30 tasks the master waits ~10 minutes, with 40 tasks only ~6 minutes,
and with 50 tasks ~4 minutes.
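Since the plot aggregates over 200 iterations, it can help to convert the approximate readings above into an average wait per iteration next to the per-task data share. A minimal Python sketch using only the numbers quoted in this message (they are rough readings off the plot):

```python
ITERATIONS = 200

# Approximate totals read off the plot:
# (total data items, number of tasks) -> minutes of sync wait over 200 iterations.
READINGS = {
    (10_000, 10): 3,
    (50_000, 30): 10,
    (50_000, 40): 6,
    (50_000, 50): 4,
}


def per_iteration_seconds(total_minutes, iterations=ITERATIONS):
    """Average synchronization wait per iteration, in seconds."""
    return total_minutes * 60 / iterations


for (total_items, tasks), minutes in READINGS.items():
    items_per_task = total_items / tasks
    print(f"{total_items} items, {tasks} tasks: "
          f"~{items_per_task:.0f} items/task, "
          f"~{per_iteration_seconds(minutes):.2f} s/iteration")
```

With these readings, the average wait at 50K data drops from about 3.0 s per iteration (30 tasks) to about 1.8 s (40 tasks) and 1.2 s (50 tasks).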

Q: How should I interpret this?
The explanation I came up with is that each task's *outgoing message queue*
is smaller when I use more tasks and bigger when I use fewer tasks. For
example, if a task has to send 1,000 messages to the master, its outgoing
queue will be bigger and will take longer to send than that of a task with
500 outgoing messages. So, is my interpretation correct, or is something
else going on here?

Any insight would be helpful.

