Subject: Re: Multiple jobs on same graph, aggregator use and LocalRunner issue
From: Benjamin Heitmann
Date: Wed, 6 Jun 2012 03:10:03 +0100
To: user@giraph.apache.org
Message-Id: <763BF0F6-2E00-47D3-9F80-14BC3EDF3801@deri.org>
In-Reply-To: <1338931310.2792.20.camel@clivelt2>

Hi Clive,

On 5 Jun 2012, at 22:21, Clive Cox wrote:
>
> I recently started playing with Giraph and I have a few questions.
>
> 1. I'm writing a simple spreading activation algorithm

I am also working on a spreading activation algorithm.
My original data is in the form of an RDF graph, which has typed edges and vertices,
and is therefore quite far from the PageRank-style algorithms for which
Google Pregel, and thus Apache Giraph, is optimised.
So I understand your questions very well.

> which would be
> run many times over the same graph with different initial vertices
> activated. Doing this as separate jobs in which a potentially large
> graph is loaded each time will be slow. Is there a way to run multiple
> BSP runs over the same loaded graph?

Sadly, this is not currently possible, as far as I know. The Hadoop
paradigm is focused on jobs with a transient graph.
But if enough people speak up to point out how inefficient it is to just
throw away the graph between jobs, maybe some sort of mechanism can be
added for running the same algorithm with different "configurations" on
the same graph.

I need to run the same algorithm on the same graph for different user
profiles ("different configurations"), and it was a big challenge to run
all of those configurations in parallel in just one job. In my case,
building the graph takes between a quarter and a third of the total
processing time.

> 2. I might want to normalise the vertex values at the end of a
> superstep. I assume I can use an aggregator to get the sum of the values
> but I'm not sure where can I update all vertex values before the next
> superstep?
The best place right now to add coordinating logic based on knowledge of
the whole graph is the WorkerContext, specifically its pre-superstep method.

In the compute method of a vertex, you can add a value to a Sum/LongSum
aggregator. Then, in the pre-superstep method of the WorkerContext, you can
check the value of that aggregator. You can then either re-set that same
aggregator, or set another aggregator. In the next superstep, the vertices
will need to check that aggregator and retrieve the new normalised value.

Somebody started to work on a patch for a centralised master which will be
able to control/coordinate the whole graph, but nothing has been finished
for that. The Jira issue is here:
https://issues.apache.org/jira/browse/GIRAPH-127

> 3. On a smaller trivial point: Running within a LocalRunner for
> debugging I need to delete the local zookeeper state created in _bsp*
> folders otherwise the next run does nothing as its assumes its the same
> state and just finishes straight away.

I never had that issue, so I can't comment on that.
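To make the aggregator pattern for question 2 concrete, here is a
stand-alone Java sketch. It does not use the real Giraph API; the class
and method names are my own invention, and the loops only emulate what
the compute method and WorkerContext would do across two supersteps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of aggregator-based normalisation:
// superstep N: every vertex adds its value to a "sum" aggregator;
// pre-superstep of N+1: the WorkerContext reads the aggregated sum;
// superstep N+1: every vertex divides its value by that sum.
public class NormalizeSketch {

    public static Map<String, Double> normalize(Map<String, Double> vertexValues) {
        // Superstep N: each vertex "aggregates" its value into the sum.
        double sum = 0.0;
        for (double v : vertexValues.values()) {
            sum += v;
        }
        // Pre-superstep of N+1: the worker checks the aggregated value
        // (and could re-set the aggregator here, e.g. to 1/sum).
        double norm = (sum == 0.0) ? 1.0 : sum;
        // Superstep N+1: each vertex retrieves the aggregated value
        // and updates itself with the normalised result.
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : vertexValues.entrySet()) {
            out.put(e.getKey(), e.getValue() / norm);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Double> graph = new LinkedHashMap<>();
        graph.put("a", 2.0);
        graph.put("b", 6.0);
        // After normalisation the values sum to 1.0.
        System.out.println(normalize(graph));
    }
}
```

In real Giraph code, the summing would happen via the registered
Sum/LongSum aggregator in each vertex's compute method, and the division
would happen in compute during the following superstep.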
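Regarding question 3: if deleting those folders by hand becomes tedious, a
small helper run before each debug session could do it. This is only a
guess at a workaround (as said, I have not hit the issue myself), and the
`_bsp*` directory pattern is taken from your description:

```java
import java.io.File;

// Hypothetical helper: delete stale local _bsp* state before a
// LocalRunner debug run. Adjust the base directory for your setup.
public class CleanLocalBspState {

    // Removes every _bsp* directory directly under baseDir;
    // returns how many were deleted.
    public static int clean(File baseDir) {
        File[] dirs = baseDir.listFiles((dir, name) -> name.startsWith("_bsp"));
        if (dirs == null) {
            return 0;
        }
        int removed = 0;
        for (File dir : dirs) {
            if (deleteRecursively(dir)) {
                removed++;
            }
        }
        return removed;
    }

    private static boolean deleteRecursively(File f) {
        File[] children = f.listFiles();
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        return f.delete();
    }

    public static void main(String[] args) {
        System.out.println("removed " + clean(new File(".")) + " _bsp* directories");
    }
}
```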