hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: Dynamic vertices and hama counters
Date Tue, 16 Jul 2013 22:59:29 GMT
You guys seem to have totally misunderstood what I am saying.

Would every BSP processor access ZK's counter concurrently? Do you
think it is possible to determine the current total number of vertices
at every step without barrier synchronization?

As I mentioned before, there are already additional barrier
synchronization steps for aggregating and broadcasting the global updated
vertex count. You can use these steps with *no additional barrier
synchronization*.
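
A minimal sketch of that idea, assuming each peer only records a local
delta (adds minus removes) and the existing barrier sums and broadcasts
it. The names Peer, localDelta and sync below are hypothetical and only
simulate the barrier in a single process; this is not the actual
GraphJobRunner code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-process simulation; not Hama's GraphJobRunner. */
public class VertexCountSketch {

    /** One simulated BSP peer. */
    static final class Peer {
        private long globalCount; // total visible during the current superstep
        private long localDelta;  // adds minus removes since the last barrier

        Peer(long initialGlobalCount) {
            this.globalCount = initialGlobalCount;
        }

        // Adding or removing a vertex sends no messages; it only
        // changes the local delta.
        void addVertex() { localDelta++; }
        void removeVertex() { localDelta--; }

        long getNumberOfVertices() { return globalCount; }
    }

    /**
     * Stand-in for the barrier that already exists: sum every peer's
     * delta, then broadcast the new total to all peers.
     */
    static void sync(List<Peer> peers) {
        long sum = 0;
        for (Peer p : peers) {
            sum += p.localDelta;
        }
        for (Peer p : peers) {
            p.globalCount += sum;
            p.localDelta = 0;
        }
    }

    public static void main(String[] args) {
        List<Peer> peers = new ArrayList<>();
        peers.add(new Peer(2));
        peers.add(new Peer(2));

        peers.get(0).addVertex();
        System.out.println(peers.get(1).getNumberOfVertices()); // prints 2

        sync(peers);
        System.out.println(peers.get(1).getNumberOfVertices()); // prints 3
    }
}
```

In this sketch the peer that added the vertex also keeps seeing the old
total until the barrier, so all peers agree on one value within a
superstep, and the number of messages does not grow with the number of
add/remove operations.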

On Wed, Jul 17, 2013 at 5:01 AM, andronat_asf <andronat_asf@hotmail.com> wrote:
> Thank you everyone,
>
> +1 for Tommaso, I will see what I can do about that :)
>
> I also believe that ZK is very similar to the sync() mechanism that Edward is describing, but if
> we need to sync more info we might need ZK.
>
> Thanks again,
> Anastasis
>
> On 15 Jul 2013, at 5:55 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>
>> andronat_asf,
>>
>> To aggregate and broadcast the global count of updated vertices, we
>> call sync() twice. See the doAggregationUpdates() method in
>> GraphJobRunner. You can solve your problem the same way, and there
>> will be no additional cost.
>>
>> Using ZooKeeper is not a bad idea. But IMO, it's not much different
>> from the sync() mechanism.
>>
>> On Mon, Jul 15, 2013 at 10:05 PM, Chia-Hung Lin <clin4j@googlemail.com> wrote:
>>> +1 for Tommaso's solution.
>>>
>>> If not every algorithm needs counter service, having an interface with
>>> different implementations (in-memory, zk, etc.) should reduce the side
>>> effect.
>>>
>>>
>>> On 15 July 2013 15:51, Tommaso Teofili <tommaso.teofili@gmail.com> wrote:
>>>> what about introducing a proper API for counting vertices, something like
>>>> an interface VertexCounter with 2-3 implementations like
>>>> InMemoryVertexCounter (basically the current one), a
>>>> DistributedVertexCounter to implement the scenario where we use a separate
>>>> BSP superstep to count them and a ZKVertexCounter which handles vertex
>>>> counts as per Chia-Hung's suggestion.
>>>>
>>>> Also we may introduce something like a configuration variable to define if
>>>> all the vertices are needed or just the neighbors (and/or some other
>>>> strategy).
>>>>
>>>> My 2 cents,
>>>> Tommaso
>>>>
>>>> 2013/7/14 Chia-Hung Lin <clin4j@googlemail.com>
>>>>
>>>>> Just my personal viewpoint. For a small amount of global information,
>>>>> storing the state in ZooKeeper might be a reasonable
>>>>> solution.
>>>>>
>>>>> On 13 July 2013 21:28, andronat_asf <andronat_asf@hotmail.com> wrote:
>>>>>> Hello everyone,
>>>>>>
>>>>>> I'm working on HAMA-767 and I have some concerns about counters and
>>>>>> scalability. Currently, every peer has a set of vertices and a variable
>>>>>> that keeps the total number of vertices across all peers. In my case,
>>>>>> I'm trying to add and remove vertices during the runtime of a job, which
>>>>>> means that I have to update all those variables.
>>>>>>
>>>>>> My problem is that this is not efficient, because for every operation
>>>>>> (adding or removing a vertex) I need to update all peers, so I need to send lots
>>>>>> of messages to make those updates (see the GraphJobRunner#countGlobalVertexCount
>>>>>> method), and I believe this is neither correct nor scalable. Another problem
>>>>>> is that, even if I update all those variables (at the cost of sending lots
>>>>>> of messages to every peer), they will only be updated in the next
>>>>>> superstep.
>>>>>>
>>>>>> e.g.:
>>>>>>
>>>>>> Peer 1:                            Peer 2:
>>>>>>  Vert_1                              Vert_2
>>>>>> (Total_V = 2)                  (Total_V = 2)
>>>>>> addVertex()
>>>>>> (Total_V = 3)
>>>>>>                                         getNumberOfV() => 2
>>>>>>
>>>>>> ------------------------ Sync ------------------------
>>>>>>
>>>>>>                                         getNumberOfV() => 3
>>>>>>
>>>>>>
>>>>>> Is there something like global counters or shared memory that
>>>>>> can address this issue?
>>>>>>
>>>>>> P.S. I have a small feeling that we don't need to track the total number
>>>>>> of vertices, because vertex-centric algorithms rarely need total numbers;
>>>>>> they only depend on neighbors (I might be wrong though).
>>>>>>
>>>>>> Thanks,
>>>>>> Anastasis
>>>>>
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>
>



-- 
Best Regards, Edward J. Yoon
@eddieyoon
