From: Thomas Jungblut
To: hama-user@incubator.apache.org
Date: Fri, 27 Apr 2012 17:24:13 +0200
Subject: Re: How to check whether the target node of an edge exists?

Oh, sorry, I hadn't read the other half of your mail. I have written a MapReduce preprocessing step for that (yes, MapReduce is the right answer here), which can be found here:
https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/crawl/processing/WebGraphProcessingJob.java

It traverses the graph, and the reducer writes the correct output, which can then be processed by the job or by the TextToSeq utility as preprocessing for PageRank. Just have a look at the example package.

> It appears that I am not supposed to access the vertices field of the
> GraphJobRunner class in some way from within the PageRank.PageRankVertex
> class?

Yes, in this new Pregel-like API this falls under information hiding. If you have a look at the 4.0 release, there is the hardcore version of PageRank written in plain BSP; there you can access and modify everything you want (but it is more complicated).

Hope it helps. If you run into further problems, don't hesitate to ask.

On 27 April 2012 17:10, Thomas Jungblut <thomas.jungblut@googlemail.com> wrote:

> You have to "address" dangling nodes in your adjacency list.
>
> So your input must look like:
>
> 0 1 2
> 1 1 2
> 2 1 2 3
> 3      <-- this one was missing, causing the NullPointerException.
> 5
>
> See http://wiki.apache.org/hama/PageRank under "Submit your own Webgraph".
>
>> This piece of text makes Site1 adjacent to Site2 and Site3, Site2 adjacent
>> to Site3, and Site3 a dangling node. As you can see, a site is always on
>> the leftmost side (we call it the key-site), and its outlinks follow as
>> the subsequent elements, separated by tabs (\t).
>> Make sure that every site's outlink can be found somewhere in the file as
>> a key-site. Otherwise it will result in weird NullPointerExceptions.
>
> Good luck.
>
> On 27 April 2012 16:56, SWP wrote:
>
>> I am dealing with the PageRank example
>> from hama-dist-0.5.0-incubating-source.tar.gz RC2,
>> which I downloaded from http://people.apache.org/~edwardyoon/dist/
>> a few days ago.
>>
>> My input graph has some "dangling edges", that is, edges pointing to
>> non-existing nodes.
>> Here are the adjacencies of a small example. The format is:
>> source target1 target2 target3 ...
>>
>> 0 1 2
>> 1 1 2
>> 2 1 2 3
>> 5
>>
>> You see that 2 has an edge directed to 3, but there is no adjacency list
>> given for 3.
>>
>> Now, when I run this example through pagerank-text2seq and then the
>> PageRank example, I get a NullPointerException:
>>
>> 12/04/27 16:15:17 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
>> java.lang.NullPointerException
>>         at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:96)
>>         at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
>>         at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>>         at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:1)
>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:662)
>>
>> The problem appears to be that when GraphJobRunner's bsp() method looks
>> up the vertex to which the message is addressed, that vertex is not found
>> in the vertices map.
>> (By the way, if you replace 5 with 3 in the example, it works, because
>> then the target vertex can be looked up.)
>>
>> See the vertices.get(e.getKey()) statement in the code snippet below.
>> Of course, one can avoid the exception by adding a check in
>> GraphJobRunner.java (at about line 95) like this:
>>
>>   if (vertices.containsKey(e.getKey())) {
>>     vertices.get(e.getKey()).compute(msgs.iterator());
>>   } else {
>>     System.out.println("Ignoring message(s) '" + msgs + "' sent to vertex '"
>>         + e.getKey() + "'");
>>   }
>>
>> However, what I really want is to check, within PageRank.PageRankVertex's
>> compute() method, whether the target vertex exists before sending out a
>> message to it.
>>
>> That is, in PageRank.java (line 60), instead of
>>
>>   sendMessageToNeighbors(new DoubleWritable(this.getValue().get() / numEdges));
>>
>> I would like to send messages only to "existing" vertices, that is,
>> those which have an adjacency list in the input.
>>
>> Any hints how this can be achieved?
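Because a vertex cannot see the runner's vertex map, one way to approach this is to validate the input before the job runs. Below is a minimal plain-Java sketch under stated assumptions: the class `DanglingTargetCheck` and its method are hypothetical and not part of Hama, and the input is assumed to be the "source target1 target2 ..." text format shown above, separated by tabs or spaces. It lists every target that never appears as the first token of a line, which are exactly the vertices that lead to the NullPointerException.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of the Hama API: validates adjacency-list text
// ("source target1 target2 ..." per line, tab- or space-separated) before the job runs.
final class DanglingTargetCheck {

    // Returns every vertex ID that occurs as an edge target but never as the
    // first token ("key-site") of any line, sorted as strings.
    static SortedSet<String> findDanglingTargets(String adjacencyText) {
        Set<String> keys = new HashSet<>();
        Set<String> targets = new HashSet<>();
        for (String line : adjacencyText.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            String[] tokens = trimmed.split("\\s+");
            keys.add(tokens[0]); // the first token is the source vertex
            for (int i = 1; i < tokens.length; i++) {
                targets.add(tokens[i]);
            }
        }
        SortedSet<String> dangling = new TreeSet<>(targets);
        dangling.removeAll(keys); // targets without their own adjacency line
        return dangling;
    }

    public static void main(String[] args) {
        String input = "0\t1\t2\n1\t1\t2\n2\t1\t2\t3\n5\n";
        System.out.println(findDanglingTargets(input)); // prints [3]
    }
}
```

If the returned set is non-empty, the input should be repaired (or rejected) before it is converted with pagerank-text2seq.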
>> It appears that I am not supposed to access the vertices field of the
>> GraphJobRunner class in some way from within the PageRank.PageRankVertex
>> class?
>>
>> I concede that my example graph may qualify as invalid input... but on
>> the other hand: how could I add those missing vertices after a first pass
>> through the adjacency-list input?
>>
>> Clemens Gröpl
>>
>> --
>>
>> Semantic Web Project, IT
>>
>> Unister GmbH
>> Barfußgäßchen 11 | 04109 Leipzig
>>
>> Phone: +49 (0)341 49288 4496
>> contact-semweb@unister-gmbh.de
>> www.unister.de
>>
>> Managing Director authorized to represent the company: Thomas Wagner
>> Register court: Amtsgericht Leipzig, HRB: 19056
>
>
> --
> Thomas Jungblut
> Berlin

--
Thomas Jungblut
Berlin
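The missing vertices can be added in a single pass in the spirit of the MapReduce preprocessing suggested above: for every line, emit the line's own adjacency list plus an empty record for each of its targets, then group by vertex and merge. The following is a minimal sketch that simulates both phases in memory in plain Java. It assumes the same tab- or space-separated input format, the class and method names are hypothetical, and it is an illustration of the idea, not the code of WebGraphProcessingJob.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of a map/reduce pass that adds every missing
// (dangling) vertex to an adjacency list; not the actual WebGraphProcessingJob code.
final class AddMissingVertices {

    private static final List<String> NO_OUTLINKS = List.of();

    // "Map" phase: emit (source, outlinks) for each line and (target, no outlinks)
    // for each target, so that every target reaches the reducer under its own key.
    static List<Map.Entry<String, List<String>>> map(String adjacencyText) {
        List<Map.Entry<String, List<String>>> records = new ArrayList<>();
        for (String line : adjacencyText.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            String[] tokens = trimmed.split("\\s+");
            List<String> outlinks = new ArrayList<>();
            for (int i = 1; i < tokens.length; i++) {
                outlinks.add(tokens[i]);
                records.add(Map.entry(tokens[i], NO_OUTLINKS));
            }
            records.add(Map.entry(tokens[0], outlinks));
        }
        return records;
    }

    // "Reduce" phase: group records by vertex (the TreeMap stands in for the
    // shuffle and sorts IDs as strings) and concatenate their outlinks.
    static String reduce(List<Map.Entry<String, List<String>>> records) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, List<String>> record : records) {
            grouped.computeIfAbsent(record.getKey(), k -> new ArrayList<>())
                   .addAll(record.getValue());
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            out.append(entry.getKey());
            for (String target : entry.getValue()) {
                out.append('\t').append(target);
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "0\t1\t2\n1\t1\t2\n2\t1\t2\t3\n5\n";
        System.out.print(reduce(map(input)));
    }
}
```

For the example graph this prints the corrected input from the reply above, with 3 added as a line of its own (a dangling vertex without outlinks). On a cluster, the same logic would sit in a Hadoop mapper and reducer, with the shuffle doing the grouping instead of the TreeMap.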