hama-dev mailing list archives

From Chia-Hung Lin <cli...@googlemail.com>
Subject Re: Dynamic vertices and hama counters
Date Mon, 15 Jul 2013 13:05:12 GMT
+1 for Tommaso's solution.

If not every algorithm needs a counter service, having an interface with
different implementations (in-memory, ZooKeeper, etc.) should reduce the
side effects.
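
A minimal sketch of what such an interface could look like. Only the names
VertexCounter and InMemoryVertexCounter come from Tommaso's mail below; the
method names and signatures are assumptions made for illustration, not
existing Hama API. A ZooKeeper-backed or superstep-based implementation
would sit behind the same interface and is not shown here.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical interface: the method names are assumptions, not Hama API.
interface VertexCounter {
    // Apply a change to the count; delta is negative when vertices are removed.
    void increment(long delta);

    // Return the count as currently visible to the caller.
    long getCount();
}

// Local-only implementation backed by an AtomicLong.
final class InMemoryVertexCounter implements VertexCounter {
    private final AtomicLong count;

    InMemoryVertexCounter(long initial) {
        this.count = new AtomicLong(initial);
    }

    @Override
    public void increment(long delta) {
        count.addAndGet(delta);
    }

    @Override
    public long getCount() {
        return count.get();
    }
}

class VertexCounterDemo {
    public static void main(String[] args) {
        VertexCounter counter = new InMemoryVertexCounter(2);
        counter.increment(1);   // one vertex added
        counter.increment(-2);  // two vertices removed
        System.out.println(counter.getCount()); // prints 1
    }
}
```

Algorithms that never read the total could then be given a no-op or purely
local implementation, so they would not pay for any global coordination.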


On 15 July 2013 15:51, Tommaso Teofili <tommaso.teofili@gmail.com> wrote:
> What about introducing a proper API for counting vertices, something like
> an interface VertexCounter with 2-3 implementations: an
> InMemoryVertexCounter (basically the current one), a
> DistributedVertexCounter that uses a separate BSP superstep to count
> them, and a ZKVertexCounter that handles vertex counts as per
> Chia-Hung's suggestion?
>
> Also we may introduce something like a configuration variable to define if
> all the vertices are needed or just the neighbors (and/or some other
> strategy).
>
> My 2 cents,
> Tommaso
>
> 2013/7/14 Chia-Hung Lin <clin4j@googlemail.com>
>
>> Just my personal viewpoint: for a small amount of global information,
>> storing the state in ZooKeeper might be a reasonable
>> solution.
>>
>> On 13 July 2013 21:28, andronat_asf <andronat_asf@hotmail.com> wrote:
>> > Hello everyone,
>> >
>> > I'm working on HAMA-767 and I have some concerns about counters and
>> > scalability. Currently, every peer has a set of vertices and a variable
>> > that holds the total number of vertices across all peers. In my case,
>> > I'm trying to add and remove vertices during the runtime of a job, which
>> > means that I have to update all those variables.
>> >
>> > My problem is that this is not efficient: for every operation (adding
>> > or removing a vertex) I need to update all peers, so I have to send lots
>> > of messages to make those updates (see the
>> > GraphJobRunner#countGlobalVertexCount method), and I believe this is
>> > neither correct nor scalable. Another problem is that, even if I update
>> > all those variables (at the cost of sending lots of messages to every
>> > peer), they will only be updated on the next superstep.
>> >
>> > e.g.:
>> >
>> > Peer 1:                            Peer 2:
>> >   Vert_1                              Vert_2
>> > (Total_V = 2)                  (Total_V = 2)
>> > addVertex()
>> > (Total_V = 3)
>> >                                          getNumberOfV() => 2
>> >
>> > ------------------------ Sync ------------------------
>> >
>> >                                          getNumberOfV() => 3
>> >
>> >
>> > Is there something like global counters or shared memory that can
>> > address this issue?
>> >
>> > P.S. I have a feeling that we don't need to track the total number
>> > of vertices, because vertex-centric algorithms rarely need totals;
>> > they only depend on neighbors (I might be wrong though).
>> >
>> > Thanks,
>> > Anastasis
>>
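
For reference, the stale-read scenario in Anastasis's diagram can be
reproduced with a small plain-Java model. This is only a sketch under stated
assumptions: PeerCount and SuperstepBarrier are hypothetical names, no Hama
classes are used, and the barrier is simulated by a single method call. Each
peer buffers its own adds and removes as one delta, and the deltas are summed
and applied to every peer at the sync point.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of one peer's view of the global vertex count.
final class PeerCount {
    private long syncedTotal;   // total agreed on at the last sync
    private long pendingDelta;  // local adds/removes since the last sync

    PeerCount(long initialTotal) {
        this.syncedTotal = initialTotal;
    }

    void addVertex() { pendingDelta++; }

    void removeVertex() { pendingDelta--; }

    // A peer sees its own pending changes, but not those of other peers.
    long getNumberOfVertices() { return syncedTotal + pendingDelta; }

    // Hand the buffered delta to the barrier and reset it.
    long drainDelta() {
        long delta = pendingDelta;
        pendingDelta = 0;
        return delta;
    }

    void applyGlobalDelta(long delta) { syncedTotal += delta; }
}

// Simulated barrier: sums every peer's delta and applies the sum to all peers.
final class SuperstepBarrier {
    static void sync(List<PeerCount> peers) {
        long sum = 0;
        for (PeerCount peer : peers) {
            sum += peer.drainDelta();
        }
        for (PeerCount peer : peers) {
            peer.applyGlobalDelta(sum);
        }
    }
}

class StaleCountDemo {
    public static void main(String[] args) {
        PeerCount peer1 = new PeerCount(2);
        PeerCount peer2 = new PeerCount(2);

        peer1.addVertex();
        System.out.println(peer1.getNumberOfVertices()); // prints 3
        System.out.println(peer2.getNumberOfVertices()); // prints 2 (stale)

        SuperstepBarrier.sync(Arrays.asList(peer1, peer2));
        System.out.println(peer2.getNumberOfVertices()); // prints 3
    }
}
```

In this model each peer contributes one delta per superstep rather than one
update per add or remove, but other peers still cannot see the change before
the sync, which is the staleness described above.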
