From: Tommaso Teofili
Date: Mon, 15 Jul 2013 09:51:59 +0200
Subject: Re: Dynamic vertices and hama counters
To: dev@hama.apache.org

What about introducing a proper API for counting vertices? Something like
a VertexCounter interface with 2-3 implementations: an
InMemoryVertexCounter (basically the current one), a
DistributedVertexCounter for the scenario where we use a separate BSP
superstep to count them, and a ZKVertexCounter which handles vertex counts
as per Chia-Hung's suggestion.
We could also introduce something like a configuration variable to define
whether all the vertices are needed or just the neighbors (and/or some
other strategy).
My 2 cents,
Tommaso

2013/7/14 Chia-Hung Lin

> Just my personal viewpoint. For a small amount of global information,
> storing the state in ZooKeeper might be a reasonable solution.
>
> On 13 July 2013 21:28, andronat_asf wrote:
> > Hello everyone,
> >
> > I'm working on HAMA-767 and I have some concerns about counters and
> > scalability. Currently, every peer has a set of vertices and a
> > variable that keeps the total number of vertices across all peers. In
> > my case, I'm trying to add and remove vertices during the runtime of
> > a job, which means that I have to update all those variables.
> >
> > My problem is that this is not efficient: for every operation (adding
> > or removing a vertex) I need to update all peers, so I need to send
> > lots of messages to make those updates (see the
> > GraphJobRunner#countGlobalVertexCount method), and I believe this is
> > neither correct nor scalable. Another problem is that, even if I
> > update all those variables (at the cost of sending lots of messages
> > to every peer), they will only be updated on the next superstep.
> >
> > e.g.:
> >
> > Peer 1:          Peer 2:
> > Vert_1           Vert_2
> > (Total_V = 2)    (Total_V = 2)
> > addVertex()
> > (Total_V = 3)
> > getNumberOfV() => 2
> >
> > ------------------------ Sync ------------------------
> >
> > getNumberOfV() => 3
> >
> >
> > Is there something like global counters or shared memory that can
> > address this issue?
> >
> > P.S. I have a small feeling that we don't need to track the total
> > number of vertices, because vertex-centered algorithms rarely need
> > total numbers; they only depend on neighbors (I might be wrong
> > though).
> >
> > Thanks,
> > Anastasis
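
For illustration only, the VertexCounter idea from Tommaso's reply could be sketched roughly as below. Only the names VertexCounter and InMemoryVertexCounter come from the thread; every method name (vertexAdded, vertexRemoved, getCount, sync) is a hypothetical choice and none of this is existing Hama API. The sketch is single-process and simply buffers changes until sync() is called, which reproduces the behaviour in Anastasis' example: the count stays at 2 after addVertex() and only becomes 3 after the barrier.

```java
/**
 * Hypothetical interface, assumed for illustration; not part of Hama.
 * Implementations decide where the global count lives (memory, an extra
 * superstep, ZooKeeper, ...).
 */
interface VertexCounter {
    /** Record that one vertex was added during the current superstep. */
    void vertexAdded();

    /** Record that one vertex was removed during the current superstep. */
    void vertexRemoved();

    /** Count as visible in the current superstep. */
    long getCount();

    /** Apply pending changes; assumed to be called at the sync barrier. */
    void sync();
}

/**
 * Simplified single-process stand-in: it buffers the delta and makes it
 * visible only after sync(). It does not exchange messages between peers.
 */
final class InMemoryVertexCounter implements VertexCounter {
    private long visibleCount;
    private long pendingDelta;

    InMemoryVertexCounter(long initialCount) {
        this.visibleCount = initialCount;
    }

    @Override
    public void vertexAdded() {
        pendingDelta++;
    }

    @Override
    public void vertexRemoved() {
        pendingDelta--;
    }

    @Override
    public long getCount() {
        return visibleCount;
    }

    @Override
    public void sync() {
        visibleCount += pendingDelta;
        pendingDelta = 0;
    }
}

class VertexCounterDemo {
    public static void main(String[] args) {
        VertexCounter counter = new InMemoryVertexCounter(2);
        counter.vertexAdded();
        System.out.println(counter.getCount()); // still 2 before the barrier
        counter.sync();
        System.out.println(counter.getCount()); // 3 after the barrier
    }
}
```

A DistributedVertexCounter or ZKVertexCounter would keep the same four methods and change only what sync() does, so callers such as the graph runner would not need to know which strategy is configured.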