From: Paolo Castagna
Date: Mon, 28 May 2012 21:54:07 +0100
To: user@giraph.apache.org
Subject: Re: SimplePageRankVertex implementation, dangling nodes and sending messages to all nodes...

Sebastian Schelter wrote:
> However, the problem with this input is that the dangling vertices that
> don't have a line of their own (such as 11) cannot contribute their
> accumulated rank, as no vertex for them will be instantiated. So
> counting them doesn't help either.

No, the 'implicit' dangling nodes (such as 6, 7, 9 and 11 below) are
instantiated when you send a message to them. If you run the example,
you'll see that after the first superstep there are 11 vertices which are
sending and receiving messages (as it should be with correct input).

> I think that we should rely on users supplying valid input (a line for
> each vertex) and not try to correct for that in the vertex class.

Well, I don't disagree in principle. But in practice this won't stop users
from making mistakes and providing your software with bad data as input. :-)

One superstep for cleaning/validating the input data isn't that bad after
all.

> Creating a line for each vertex from such a file is an easy task that is
> doable with a single MapReduce pass over the data beforehand.

Sure. (Why is this better than a superstep with Giraph?) ;-)

Paolo
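
For illustration only, here is a minimal sketch of the clean-up pass discussed in this thread: it adds an empty line for every vertex that appears only as an edge target. It is not code from Giraph or from either poster. It assumes a whitespace-separated adjacency format (source id followed by target ids) and integer vertex ids; the function name `add_missing_vertex_lines` and the sample graph are hypothetical.

```python
def add_missing_vertex_lines(lines):
    """Return adjacency lines with exactly one line per vertex.

    Assumed (hypothetical) input format: "<source> <target> <target> ...".
    Vertices that only ever appear as targets get a line with no targets,
    so a loader that creates one vertex per line will instantiate them.
    """
    adjacency = {}
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        source, targets = tokens[0], tokens[1:]
        # Merge repeated lines for the same source vertex.
        adjacency.setdefault(source, []).extend(targets)
        # Make sure every target also has an entry of its own.
        for target in targets:
            adjacency.setdefault(target, [])
    # Assumption: vertex ids are integers, so sort numerically.
    return [" ".join([v] + adjacency[v]) for v in sorted(adjacency, key=int)]


if __name__ == "__main__":
    # Hypothetical input: vertex 4 is a dangling node with no line of its own.
    sample = ["1 2 3", "2 3", "3 1 4"]
    for out_line in add_missing_vertex_lines(sample):
        print(out_line)
```

On a real cluster the same logic would be distributed (as a MapReduce pass or as an extra superstep, the two options debated above); the sketch only shows the transformation on a small in-memory input.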