hama-dev mailing list archives

From andronat_asf <andronat_...@hotmail.com>
Subject Re: Dynamic vertices and hama counters
Date Tue, 16 Jul 2013 20:01:38 GMT
Thank you everyone,

+1 for Tommaso, I will see what I can do about that :)

I also believe that ZK is very similar to the sync() mechanism that Edward describes, but if
we need to sync more info we might need ZK.

Thanks again,
Anastasis
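
A minimal sketch of the VertexCounter API that Tommaso proposes in the quoted thread below. Only the names VertexCounter and InMemoryVertexCounter come from the thread; the method names (vertexAdded, vertexRemoved, getTotal) and the constructor are assumptions made for illustration and are not part of Hama.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical API following Tommaso's proposal; not an existing Hama interface. */
interface VertexCounter {
  /** Records that one vertex was added on this peer. */
  void vertexAdded();

  /** Records that one vertex was removed from this peer. */
  void vertexRemoved();

  /** Returns the total vertex count as currently known to this peer. */
  long getTotal();
}

/** Keeps the count in local memory only, like the current per-peer variable. */
class InMemoryVertexCounter implements VertexCounter {
  private final AtomicLong total;

  InMemoryVertexCounter(long initialTotal) {
    this.total = new AtomicLong(initialTotal);
  }

  @Override
  public void vertexAdded() {
    total.incrementAndGet();
  }

  @Override
  public void vertexRemoved() {
    total.decrementAndGet();
  }

  @Override
  public long getTotal() {
    return total.get();
  }
}
```

A DistributedVertexCounter or ZKVertexCounter would implement the same interface, so an algorithm that never reads the total would not pay for keeping it in sync.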

On 15 Jul 2013, at 5:55 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:

> andronat_asf,
> 
> To aggregate and broadcast the global count of updated vertices, we
> call sync() twice. See the doAggregationUpdates() method in
> GraphJobRunner. You can solve your problem the same way, and there
> will be no additional cost.
> 
> Using ZooKeeper is not a bad idea, but IMO it's not much different
> from the sync() mechanism.
> 
> On Mon, Jul 15, 2013 at 10:05 PM, Chia-Hung Lin <clin4j@googlemail.com> wrote:
>> +1 for Tommaso's solution.
>> 
>> If not every algorithm needs a counter service, having an interface with
>> different implementations (in-memory, zk, etc.) should reduce the side
>> effects.
>> 
>> 
>> On 15 July 2013 15:51, Tommaso Teofili <tommaso.teofili@gmail.com> wrote:
>>> what about introducing a proper API for counting vertices, something like
>>> an interface VertexCounter with 2-3 implementations like
>>> InMemoryVertexCounter (basically the current one), a
>>> DistributedVertexCounter to implement the scenario where we use a separate
>>> BSP superstep to count them and a ZKVertexCounter which handles vertex
>>> counts as per Chia-Hung's suggestion.
>>> 
>>> Also we may introduce something like a configuration variable to define if
>>> all the vertices are needed or just the neighbors (and/or some other
>>> strategy).
>>> 
>>> My 2 cents,
>>> Tommaso
>>> 
>>> 2013/7/14 Chia-Hung Lin <clin4j@googlemail.com>
>>> 
>>>> Just my personal viewpoint. For a small amount of global information,
>>>> storing the state in ZooKeeper might be a reasonable
>>>> solution.
>>>> 
>>>> On 13 July 2013 21:28, andronat_asf <andronat_asf@hotmail.com> wrote:
>>>>> Hello everyone,
>>>>> 
>>>>> I'm working on HAMA-767 and I have some concerns about counters and
>>>>> scalability. Currently, every peer has a set of vertices and a variable
>>>>> that keeps the total number of vertices across all peers. In my case,
>>>>> I'm trying to add and remove vertices while a job is running, which
>>>>> means that I have to update all of those variables.
>>>>> 
>>>>> My problem is that this is not efficient: for every operation (adding
>>>>> or removing a vertex) I need to update all peers, so I have to send lots of
>>>>> messages to make those updates (see the GraphJobRunner#countGlobalVertexCount
>>>>> method), and I believe this is neither correct nor scalable. Another problem is
>>>>> that, even if I update all those variables (at the cost of sending lots of
>>>>> messages to every peer), they will only be updated in the next
>>>>> superstep.
>>>>> 
>>>>> e.g.:
>>>>> 
>>>>> Peer 1:                            Peer 2:
>>>>>  Vert_1                              Vert_2
>>>>> (Total_V = 2)                  (Total_V = 2)
>>>>> addVertex()
>>>>> (Total_V = 3)
>>>>>                                         getNumberOfV() => 2
>>>>> 
>>>>> ------------------------ Sync ------------------------
>>>>> 
>>>>>                                         getNumberOfV() => 3
>>>>> 
>>>>> 
>>>>> Is there something like global counters or shared memory that can
>>>>> address this issue?
>>>>> 
>>>>> P.S. I have a feeling that we don't need to track the total number
>>>>> of vertices, because vertex-centric algorithms rarely need totals;
>>>>> they only depend on neighbors (I might be wrong though).
>>>>> 
>>>>> Thanks,
>>>>> Anastasis
>>>> 
> 
> 
> 
> -- 
> Best Regards, Edward J. Yoon
> @eddieyoon
> 
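
The barrier-based approach discussed in this thread can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: SimPeer and SimBarrier are hypothetical stand-ins, not Hama classes, and the code does not reproduce what GraphJobRunner actually does. Each peer accumulates a local delta of added and removed vertices, and the barrier sums the deltas and applies the result to every peer, so the count changes once per superstep rather than once per operation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a BSP peer; not a Hama class. */
class SimPeer {
  long totalVertices; // global count as seen by this peer
  long pendingDelta;  // local adds minus removes since the last barrier

  SimPeer(long totalVertices) {
    this.totalVertices = totalVertices;
  }

  void addVertex() {
    pendingDelta++;
  }

  void removeVertex() {
    pendingDelta--;
  }

  long getNumberOfV() {
    return totalVertices;
  }
}

/** Hypothetical barrier that plays the role of sync() in this simulation. */
class SimBarrier {
  /** Sums every peer's pending delta, then applies the sum to all peers. */
  static void sync(List<SimPeer> peers) {
    long globalDelta = 0;
    for (SimPeer p : peers) {
      globalDelta += p.pendingDelta;
      p.pendingDelta = 0;
    }
    for (SimPeer p : peers) {
      p.totalVertices += globalDelta;
    }
  }
}
```

With two peers that both start at a total of 2, a call to addVertex() on the first peer leaves getNumberOfV() on the second at 2 until SimBarrier.sync() runs, after which both report 3. Unlike the example in the original question, the adding peer also keeps the old total until the barrier, so all peers agree within a superstep.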

