hama-dev mailing list archives

From Behroz Sikander <behro...@gmail.com>
Subject Re: Question regarding Hama synchronization behavior and GSOC
Date Tue, 19 Jan 2016 08:24:56 GMT
Hi,

*>> Q1: Is Hama going to participate in GSOC 2016?*
*> Sure, why not?*

--> Great. I would like to participate in this GSOC. Do we already have some
potential projects? Jira does not seem to list any.

*>> Q2: In the image below, I see an interesting behavior of Hama but I am
not sure why the behavior is like this.*
*> Can you tell us what version you used? I roughly guess master task can
receive incoming message bundles concurrently if number of tasks is large.*

--> I am using 0.7.0.
OK, but can a slave send messages to the master concurrently if its outgoing
queue is large? I ask because it seems that slaves with a larger outgoing
queue take more time to send.

Regards,
Behroz

On Tue, Jan 19, 2016 at 1:59 AM, Edward J. Yoon <edward.yoon@samsung.com>
wrote:

> > Q1: Is Hama going to participate in GSOC 2016 ?
>
> Sure, why not?
>
> > Q2: In the image below, I see an interesting behavior of Hama but I am
> > not sure why the behavior is like this.
>
> Can you tell us what version you used?
>
> My rough guess is that the master task can receive incoming message
> bundles concurrently if the number of tasks is large.
>
> --
> Best Regards, Edward J. Yoon
>
> -----Original Message-----
> From: Behroz Sikander [mailto:bsikander@apache.org]
> Sent: Tuesday, January 19, 2016 12:28 AM
> To: dev@hama.apache.org
> Subject: Question regarding Hama synchronization behavior and GSOC
>
> Hi,
> I have 2 questions regarding Hama.
>
> Q1: Is Hama going to participate in GSOC 2016 ?
>
> Q2: In the image below, I see an interesting behavior of Hama but I am not
> sure why the behavior is like this.
>
> http://imgur.com/cVsfL1x
>
> The x-axis shows the total number of data items that I need to process. The
> y-axis shows the time in minutes, aggregated over 200 iterations. Each line
> in the plot represents a different number of Hama tasks (peers) used to
> process the data. Overall, the plot shows the *total time that the master
> task waits for the slave tasks to synchronize* (in minutes, over 200
> iterations).
>
> Note:
> 1) Total time the master waits for the slaves in *1 iteration* =
> (time of slave processing) + *(time of synchronization)*
> The plot shows only the *time of synchronization*, aggregated over *200
> iterations*. I am using this plot to study the time Hama spends in
> synchronization.
>
> 2) The total data is divided equally among all the tasks. For example, if I
> use 10 tasks to process 10K data items, each task gets 1000. If I use 20
> tasks to process 10K, each gets 500.
>
> For example, the blue line in the plot represents 10 tasks. If I process
> 10,000 files in 200 iterations, the master waits almost 3 minutes for the
> slaves to synchronize.
>
> If you look closely, when I *increase* the *number of tasks* used to
> process the data, the *time* the master waits for the *slaves to
> synchronize* starts to *decrease*. For example, look at the points at
> 50K data items: with 30 tasks the master waits ~10 minutes, with 40 tasks
> it waits only ~6 minutes, and with 50 tasks it waits ~4 minutes.
>
> Q: How should I interpret this information?
> My own answer is that the *outgoing message queue* of each task is smaller
> when I use more tasks and bigger when I use fewer tasks. For example, a
> task that has to send 1000 messages to the master has a bigger outgoing
> queue and takes more time to send than a task with 500 outgoing messages.
> So, is my interpretation correct, or is something else going on here? Any
> insight would be helpful.
>
> Regards,
> Behroz
>
>
>
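
A rough check of the numbers quoted in the original question can be sketched as below. It is not part of the thread and is not Hama code. It assumes, as the question suggests, that a task's outgoing queue grows with the number of data items it holds, and it uses the approximate plot readings given in the message for 50K data items. The names `items_per_task`, `wait_per_item`, and `OBSERVED_WAIT_MINUTES` are hypothetical and exist only for this illustration.

```python
# Hypothetical helper names; the wait times are the approximate plot
# readings quoted in the message, not measurements.

TOTAL_ITEMS = 50_000

# number of tasks -> minutes the master waited in synchronization
# (aggregated over 200 iterations), as read off the plot in the message
OBSERVED_WAIT_MINUTES = {30: 10.0, 40: 6.0, 50: 4.0}


def items_per_task(total_items: int, num_tasks: int) -> int:
    """Equal split of the data among tasks, as described in note 2."""
    return total_items // num_tasks


def wait_per_item(total_items: int, observed: dict) -> dict:
    """Map each task count to minutes of waiting per item held by one task."""
    return {
        tasks: minutes / items_per_task(total_items, tasks)
        for tasks, minutes in observed.items()
    }


if __name__ == "__main__":
    ratios = wait_per_item(TOTAL_ITEMS, OBSERVED_WAIT_MINUTES)
    for tasks, ratio in ratios.items():
        share = items_per_task(TOTAL_ITEMS, tasks)
        print(f"{tasks} tasks: {share} items per task, {ratio:.4f} min per item")
```

If the waiting time were strictly proportional to the per-task queue length, the minutes-per-item ratio would be the same for every task count. With the quoted readings it falls as the number of tasks grows, which is compatible with the queue-size interpretation but suggests that some additional effect may also be involved, possibly the concurrent receipt of message bundles mentioned in the reply.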
