hama-dev mailing list archives

From Chia-Hung Lin <cli...@googlemail.com>
Subject Re: Dynamic vertices and hama counters
Date Thu, 18 Jul 2013 15:48:42 GMT
Sorry, my bad. I focused only on the counter stuff and didn't pay
attention to the Vertex-related issue; I thought you just wanted to
share a counter value between peers. In that case, persisting the
counter value to ZK shouldn't be a problem and won't incur overhead.
But if the question is not about counters, please just ignore my
previous post.


On 17 July 2013 06:59, Edward J. Yoon <edwardyoon@apache.org> wrote:
> You seem to have totally misunderstood what I am saying.
>
> Would every BSP processor access ZK's counter concurrently? Do you
> think it is possible to determine the current total number of vertices
> in every step without barrier synchronization?
>
> As I mentioned before, there are already additional barrier
> synchronization steps for aggregating and broadcasting the global
> updated vertex count. You can reuse these steps with *no additional
> barrier synchronization*.
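The reuse Edward describes can be sketched roughly as follows. This is a minimal in-memory simulation of the aggregate-then-broadcast pattern, not Hama's actual API; the class AggregationSketch and its method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: each peer contributes its local vertex delta during
// the aggregation superstep that GraphJobRunner already runs, so the global
// count needs no extra barrier of its own.
public class AggregationSketch {

    // Superstep A: every peer sends its local delta to a master peer,
    // which sums them after the first sync().
    static long aggregate(List<Long> localDeltas) {
        long sum = 0;
        for (long d : localDeltas) {
            sum += d;  // master sums the deltas it received
        }
        return sum;
    }

    // Superstep B: the master broadcasts the new global count; every peer
    // applies it after the second sync().
    static long broadcast(long previousGlobal, long aggregatedDelta) {
        return previousGlobal + aggregatedDelta;
    }

    public static void main(String[] args) {
        List<Long> deltas = new ArrayList<>();
        deltas.add(1L);   // peer 1 added a vertex
        deltas.add(0L);   // peer 2 made no change
        long global = broadcast(2L, aggregate(deltas));
        System.out.println(global);
    }
}
```

Since each of the two phases already ends with a sync() barrier, folding a user counter into the same messages adds no extra barrier.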
>
> On Wed, Jul 17, 2013 at 5:01 AM, andronat_asf <andronat_asf@hotmail.com> wrote:
>> Thank you everyone,
>>
>> +1 for Tommaso, I will see what I can do about that :)
>>
>> I also believe that ZK is very similar to the sync() mechanism that
>> Edward is describing, but if we need to sync more info we might need ZK.
>>
>> Thanks again,
>> Anastasis
>>
>> On 15 Jul 2013, at 5:55 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>>
>>> andronat_asf,
>>>
>>> To aggregate and broadcast the global count of updated vertices, we
>>> call sync() twice. See the doAggregationUpdates() method in
>>> GraphJobRunner. You can solve your problem the same way, and there
>>> will be no additional cost.
>>>
>>> Using ZooKeeper is not a bad idea, but IMO it's not much different
>>> from the sync() mechanism.
>>>
>>> On Mon, Jul 15, 2013 at 10:05 PM, Chia-Hung Lin
>>> <clin4j@googlemail.com> wrote:
>>>> +1 for Tommaso's solution.
>>>>
>>>> If not every algorithm needs counter service, having an interface with
>>>> different implementations (in-memory, zk, etc.) should reduce the side
>>>> effect.
>>>>
>>>>
>>>> On 15 July 2013 15:51, Tommaso Teofili <tommaso.teofili@gmail.com> wrote:
>>>>> what about introducing a proper API for counting vertices, something
>>>>> like an interface VertexCounter with 2-3 implementations: an
>>>>> InMemoryVertexCounter (basically the current one), a
>>>>> DistributedVertexCounter to implement the scenario where we use a
>>>>> separate BSP superstep to count them, and a ZKVertexCounter which
>>>>> handles vertex counts as per Chia-Hung's suggestion.
>>>>>
>>>>> Also we may introduce something like a configuration variable to
>>>>> define if all the vertices are needed or just the neighbors (and/or
>>>>> some other strategy).
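Tommaso's proposed API might look roughly like this. The names VertexCounter and InMemoryVertexCounter come from his mail; the method signatures are assumptions, a sketch rather than a definitive design:

```java
// Hypothetical sketch of the suggested counting API: one interface, with
// in-memory, distributed-superstep, and ZK-backed implementations behind it.
public interface VertexCounter {
    void increment(long delta);  // record a local add (+) or remove (-)
    long getCount();             // current global count as known to this peer
}

// The in-memory variant: basically the current per-peer variable.
class InMemoryVertexCounter implements VertexCounter {
    private long count;

    InMemoryVertexCounter(long initial) {
        this.count = initial;
    }

    @Override
    public void increment(long delta) {
        // Updated locally; other peers only see the new value after the
        // next aggregation superstep.
        count += delta;
    }

    @Override
    public long getCount() {
        return count;
    }
}
```

A configuration key could then select which implementation a job instantiates, so algorithms that never read the total pay nothing for it.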
>>>>>
>>>>> My 2 cents,
>>>>> Tommaso
>>>>>
>>>>> 2013/7/14 Chia-Hung Lin <clin4j@googlemail.com>
>>>>>
>>>>>> Just my personal viewpoint: for a small amount of global
>>>>>> information, storing the state in ZooKeeper might be a reasonable
>>>>>> solution.
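A ZK-backed counter would likely lean on ZooKeeper's versioned setData() (retrying on a version conflict). Below is a minimal sketch of that optimistic compare-and-set pattern against an in-memory stand-in rather than a live ZooKeeper ensemble; VersionedCounter and its method names are hypothetical:

```java
// In-memory stand-in for a versioned znode: the real client would call
// setData(path, data, expectedVersion) and retry on a BadVersion error.
class VersionedCounter {
    private long value;
    private int version;

    // Like getData() returning the payload plus a Stat with the version.
    synchronized long[] read() {
        return new long[] { value, version };
    }

    // Like a conditional setData(): fails if someone updated in between.
    synchronized boolean compareAndSet(long newValue, int expectedVersion) {
        if (expectedVersion != version) {
            return false;  // analogous to a BadVersion exception
        }
        value = newValue;
        version++;
        return true;
    }

    // Optimistic retry loop: read, compute, conditionally write.
    long incrementAndGet(long delta) {
        while (true) {
            long[] snapshot = read();
            long next = snapshot[0] + delta;
            if (compareAndSet(next, (int) snapshot[1])) {
                return next;
            }
        }
    }
}
```

Note this gives atomicity, but as Edward points out below, every read still observes whatever the last committed write was; it does not by itself make all peers agree within a superstep.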
>>>>>>
>>>>>> On 13 July 2013 21:28, andronat_asf <andronat_asf@hotmail.com> wrote:
>>>>>>> Hello everyone,
>>>>>>>
>>>>>>> I'm working on HAMA-767 and I have some concerns on counters and
>>>>>>> scalability. Currently, every peer has a set of vertices and a
>>>>>>> variable that keeps the total number of vertices across all
>>>>>>> peers. In my case, I'm trying to add and remove vertices during
>>>>>>> the runtime of a job, which means that I have to update all those
>>>>>>> variables.
>>>>>>>
>>>>>>> My problem is that this is not efficient, because on every
>>>>>>> operation (adding or removing a vertex) I need to update all
>>>>>>> peers, so I need to send lots of messages to make those updates
>>>>>>> (see the GraphJobRunner#countGlobalVertexCount method), and I
>>>>>>> believe this is not correct or scalable. Another problem is that,
>>>>>>> even if I update all those variables (with the cost of sending
>>>>>>> lots of messages to every peer), those variables will only be
>>>>>>> updated on the next superstep.
>>>>>>>
>>>>>>> e.g.:
>>>>>>>
>>>>>>> Peer 1:                            Peer 2:
>>>>>>>  Vert_1                              Vert_2
>>>>>>> (Total_V = 2)                  (Total_V = 2)
>>>>>>> addVertex()
>>>>>>> (Total_V = 3)
>>>>>>>                                    getNumberOfV() => 2
>>>>>>>
>>>>>>> ------------------------ Sync ------------------------
>>>>>>>
>>>>>>>                                    getNumberOfV() => 3
>>>>>>>
>>>>>>>
>>>>>>> Is there something like global counters or shared memory that
>>>>>>> could address this issue?
>>>>>>>
>>>>>>> P.S. I have a small feeling that we don't need to track the total
>>>>>>> number of vertices, because vertex-centric algorithms rarely need
>>>>>>> totals; they usually only depend on neighbors (I might be wrong
>>>>>>> though).
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Anastasis
>>>>>>
>>>
>>>
>>>
>>> --
>>> Best Regards, Edward J. Yoon
>>> @eddieyoon
>>>
>>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
