Return-Path: X-Original-To: apmail-giraph-user-archive@www.apache.org Delivered-To: apmail-giraph-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 284141088C for ; Thu, 27 Feb 2014 10:44:58 +0000 (UTC) Received: (qmail 91835 invoked by uid 500); 27 Feb 2014 10:44:53 -0000 Delivered-To: apmail-giraph-user-archive@giraph.apache.org Received: (qmail 91735 invoked by uid 500); 27 Feb 2014 10:44:50 -0000 Mailing-List: contact user-help@giraph.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@giraph.apache.org Delivered-To: mailing list user@giraph.apache.org Received: (qmail 91707 invoked by uid 99); 27 Feb 2014 10:44:48 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 27 Feb 2014 10:44:48 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of mneumann@spotify.com designates 209.85.220.174 as permitted sender) Received: from [209.85.220.174] (HELO mail-vc0-f174.google.com) (209.85.220.174) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 27 Feb 2014 10:44:42 +0000 Received: by mail-vc0-f174.google.com with SMTP id im17so2264862vcb.5 for ; Thu, 27 Feb 2014 02:44:21 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:content-type; bh=MXwaX4V3+sjTUA9Qtc9pvUyD3s/5hMJplc1sNiXlfG0=; b=CHm3y7k43JRnxqsbTA+UqxQTdhLvL0Pa/cWyFnUtPFGnUqbTewjsTkXieKQj0i981n prSRZdlf9cFaCylugwD7K01kcmyfAG0VK2U38a6udU+JiLKY/VKVBynGK/zzkt249fce IeLHpGCAxzZCCBv7vhqOztYa8WABrJMKbLBJlBkH4GBQKE91Sr2ukgnS5DW//o8mrJCw CIEXbe7YuKvyos5TrJXjR6w+hchXGJBSvk+qXNEaSFNBTLFjbG7TKUi/l10ivMaiK819 EtY6w6GYhZijQe8Yx0osYLnnsZoaCEWJBP06spxMbbFwGlw2xbA4kfuNtgETgoLIIxQv ZhyA== X-Gm-Message-State: ALoCoQkKeLl594uwmP/QRwUsIFnYLAynbIwZPfuvms3JQAyOK7VQQUwVQLSCXtowUfuDwSPNpBz/ MIME-Version: 1.0 X-Received: by 10.220.109.1 with SMTP id h1mr10124590vcp.20.1393497861710; Thu, 27 Feb 2014 02:44:21 -0800 (PST) Received: by 10.58.216.72 with HTTP; Thu, 27 Feb 2014 02:44:21 -0800 (PST) In-Reply-To: <530F0C01.6050009@apache.org> References: <530F0C01.6050009@apache.org> Date: Thu, 27 Feb 2014 11:44:21 +0100 Message-ID: Subject: Re: Different num supersteps From: Martin Neumann To: user@giraph.apache.org, ssc@apache.org Content-Type: multipart/alternative; boundary=001a11c308d60afb1804f36101f0 X-Virus-Checked: Checked by ClamAV on apache.org --001a11c308d60afb1804f36101f0 Content-Type: text/plain; charset=ISO-8859-1 The data I have as input is not in a Graph-Format so I use an EdgeInputFormat to create a Graph. Its also deterministic so the same Graph should be build with the same input. Each line in the input is a set of connected vertices. I create edges in a way that they form a star around the vertex with the lowest ID. Every vertex has an edge to that vertex and it has an edge to all other vertices. See the code further down. I will see if I can sort the output and then do a diff to see if they have the same results. Is there a simple way to make sure that the same graph was build? (The full dataset has around a billion vertices so I can't visualize it) private class ConCompInReader extends TextEdgeReader { private Text[] nodeArray = null; private int otherNodeId = -1; NullWritable edgeValue = NullWritable.get(); // flag used to create edges in both directions private boolean flipped = true; private boolean done = true; @Override public final Text getCurrentSourceId() throws IOException, InterruptedException { if (flipped) return nodeArray[otherNodeId]; return nodeArray[0]; } @Override public final Edge getCurrentEdge() throws IOException, InterruptedException { if (flipped) return EdgeFactory.create(nodeArray[0], edgeValue); return EdgeFactory.create(nodeArray[otherNodeId], edgeValue); } @Override public final boolean nextEdge() throws IOException, InterruptedException { // flipp direction of the edge if(!flipped){ flipped = true; } // line not yet done and already created the flipped version else if(!done){ otherNodeId++; flipped = false; if (otherNodeId >= nodeArray.length){ done = true; } } // TODO this version ignores singels switch with version below if needed // if done with this line read the next one if there is one // if current line is consumed it checks if there is a next one if (done) { boolean hasNext = getRecordReader().nextKeyValue(); while (done && hasNext) { String[] stringNodeArray = getRecordReader().getCurrentValue().toString().split( "\\s+"); // found line with more than one URL if (stringNodeArray.length >= 2) { otherNodeId = 1; done = false; flipped = false; Arrays.sort(stringNodeArray); nodeArray = new Text[stringNodeArray.length]; for (int i = 0; i < stringNodeArray.length; i++) { nodeArray[i] = new Text(stringNodeArray[i]); } } // if not ignore line and read a new one else { hasNext = getRecordReader().nextKeyValue(); } } } // TODO version that does not ignore singels /*if (done && getRecordReader().nextKeyValue()) { arrayNode = getRecordReader().getCurrentValue(); String[] stringNodeArray = arrayNode.toString().split("\\s+"); otherNodeId = 1; done = false; flipped = false; Arrays.sort(stringNodeArray); nodeArray = new Text[stringNodeArray.length]; for (int i = 0; i < stringNodeArray.length; i++) { nodeArray[i] = new Text(stringNodeArray[i]); } }*/ return !done; } } On Thu, Feb 27, 2014 at 10:57 AM, Sebastian Schelter wrote: > Hi Martin, > > You are right, this should not happen, your code looks correct. One way to > check the output would be to simply count the number vertices per component > and see if that number stays stable. > > Do you supply all vertices in your input data or are some vertices created > during the computation? Maybe there's a bug/race condition with the vertex > creation (just wildly guessing) > > Best, > Sebastian > > > > On 02/27/2014 10:48 AM, Martin Neumann wrote: > >> Hej, >> >> I have modified the connected component example to fit my input data. I >> expect it to be deterministic. >> >> But when I run it multiple times it takes a different number of Super >> steps. This only happens on the complete dataset and not on my small test >> dataset. >> (So I cannot check the output for correctness in a simple way) >> >> Has anyone an Idea how this could happen? >> >> cheers Martin >> >> >> >> >> In case its useful here the computation class: >> >> @Override >> public void compute(Vertex vertex, >> Iterable inmessage) throws IOException { >> boolean changed = false; >> >> // first superstep && setup >> if (getSuperstep() == 0) { >> //initialize value >> vertex.setValue(vertex.getId()); >> //cheating by checking the neighbors ID's (cuts down 1 >> iteration) >> for (Edge e : vertex.getEdges()) { >> Text candidate = e.getTargetVertexId(); >> if (candidate.compareTo(vertex.getValue()) < 0) { >> changed = true; >> vertex.setValue(candidate); >> } >> } >> } >> >> // other superstep >> else { >> // read all messages and compare with own state >> for (Text message : inmessage) { >> if (message.compareTo(vertex.getValue()) < 0) { >> changed = true; >> vertex.setValue(message); >> } >> } >> } >> >> // if state has changed send a message to all neighbors >> if (changed && getSuperstep() < limiter) { >> sendMessageToAllEdges(vertex, vertex.getValue()); >> } >> >> vertex.voteToHalt(); >> } >> >> > --001a11c308d60afb1804f36101f0 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
The data I have as input is not in a Graph-Format so = I use an EdgeInputFormat to create a Graph. Its also deterministic so the s= ame Graph should be build with the same input.

Each line = in the input is a set of connected vertices.
I create edges in a way that they form a star around the vertex with the lo= west ID. Every vertex has an edge to that vertex and it has an edge to all = other vertices. See the code further down.

I will see if = I can sort the output and then do a diff to see if they have the same resul= ts. Is there a simple way to make sure that the same graph was build? (The = full dataset has around a billion vertices so I can't visualize it)



private class ConCompInReader extends TextEdge= Reader {
=A0=A0=A0 =A0=A0=A0 private Text[] nodeArray =3D null;
=A0= =A0=A0 =A0=A0=A0 private int otherNodeId =3D -1;
=A0=A0=A0 =A0=A0=A0 Nul= lWritable edgeValue =3D NullWritable.get();
=A0=A0=A0 =A0=A0=A0 // flag used to create edges in both directions
=A0= =A0=A0 =A0=A0=A0 private boolean flipped =3D true;
=A0=A0=A0 =A0=A0=A0 p= rivate boolean done =3D true;

=A0=A0=A0 =A0=A0=A0 @Override
=A0= =A0=A0 =A0=A0=A0 public final Text getCurrentSourceId() throws IOException,=
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 InterruptedException {
=A0=A0=A0= =A0=A0=A0 =A0=A0=A0 if (flipped)
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0= =A0 return nodeArray[otherNodeId];
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 return = nodeArray[0];
=A0=A0=A0 =A0=A0=A0 }

=A0=A0=A0 =A0=A0=A0 @Override=
=A0=A0=A0 =A0=A0=A0 public final Edge<Text, NullWritable> getCurr= entEdge()
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 throws IOException, InterruptedExce= ption {
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 if (flipped)
=A0=A0=A0 =A0=A0= =A0 =A0=A0=A0 =A0=A0=A0 return EdgeFactory.create(nodeArray[0], edgeValue);=
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 return EdgeFactory.create(nodeArray[other= NodeId], edgeValue);
=A0=A0=A0 =A0=A0=A0 }

=A0=A0=A0 =A0=A0=A0 @Override
=A0=A0=A0 =A0= =A0=A0 public final boolean nextEdge() throws IOException,
=A0=A0=A0 =A0= =A0=A0 =A0=A0=A0 =A0=A0=A0 InterruptedException {
=A0=A0=A0 =A0=A0=A0 = =A0=A0=A0
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 // flipp direction of the edge<= br>=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 if(!flipped){
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 flipped =3D true;
=A0=A0=A0 =A0= =A0=A0 =A0=A0=A0 }
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0
=A0=A0=A0 =A0=A0= =A0 =A0=A0=A0 // line not yet done and already created the flipped version<= br>=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 else if(!done){
=A0=A0=A0 =A0=A0=A0 =A0= =A0=A0 =A0=A0=A0 otherNodeId++;
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 = flipped =3D false;
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 if (otherNodeId >=3D nodeArray.l= ength){
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 done =3D true;=
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 }
=A0=A0=A0 =A0=A0=A0 =A0= =A0=A0 }

=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 // TODO this version ignores = singels switch with version below if needed
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 // if done with this line read the next one i= f there is one
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 // if current line is consu= med it checks if there is a next one
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 if (d= one) {
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 boolean hasNext =3D getRe= cordReader().nextKeyValue();
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 while (done && hasNext) {=A0=A0=A0
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 String[] = stringNodeArray =3D getRecordReader().getCurrentValue().toString().split(=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 &qu= ot;\\s+");

=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 //= found line with more than one URL
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 if (stringNodeArray.lengt= h >=3D 2) {
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0= =A0 otherNodeId =3D 1;
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0= =A0=A0=A0 done =3D false;
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0= =A0=A0 =A0=A0=A0 flipped =3D false;
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0= =A0 =A0=A0=A0 =A0=A0=A0 Arrays.sort(stringNodeArray);
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 nodeArray =3D n= ew Text[stringNodeArray.length];
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0= =A0=A0=A0 =A0=A0=A0 for (int i =3D 0; i < stringNodeArray.length; i++) = {
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 = nodeArray[i] =3D new Text(stringNodeArray[i]);
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 }
=A0=A0=A0 = =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 }
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 = =A0=A0=A0 =A0=A0=A0 // if not ignore line and read a new one
=A0=A0=A0 = =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 else {
=A0=A0=A0 =A0=A0=A0 =A0= =A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 hasNext =3D getRecordReader().nextKeyV= alue();
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 }
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 }
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 = }

=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 // TODO version that does not ignore= singels
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 /*if (done && getRecordRe= ader().nextKeyValue()) {
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 arrayNo= de =3D getRecordReader().getCurrentValue();
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 String[] stringNodeArray =3D arrayN= ode.toString().split("\\s+");

=A0=A0=A0 =A0=A0=A0 =A0=A0= =A0 =A0=A0=A0 otherNodeId =3D 1;
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0= done =3D false;
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 flipped =3D fal= se;
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 Arrays.sort(stringNodeArray)= ;
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 nodeArray =3D new Text[stringNodeAr= ray.length];
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 for (int i =3D 0; i= < stringNodeArray.length; i++) {
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0= =A0=A0 =A0=A0=A0 nodeArray[i] =3D new Text(stringNodeArray[i]);
=A0=A0= =A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 }

=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 }*/

=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 = return !done;
=A0=A0=A0 =A0=A0=A0 }
=A0=A0=A0 }


On Thu, Feb 27, 2014 at= 10:57 AM, Sebastian Schelter <ssc@apache.org> wrote:
Hi Martin,

You are right, this should not happen, your code looks correct. One way to = check the output would be to simply count the number vertices per component= and see if that number stays stable.

Do you supply all vertices in your input data or are some vertices created = during the computation? Maybe there's a bug/race condition with the ver= tex creation (just wildly guessing)

Best,
Sebastian



On 02/27/2014 10:48 AM, Martin Neumann wrote:
Hej,

I have modified the connected component example to fit my input data. I
expect it to be deterministic.

But when I run it multiple times it takes a different number of Super
steps. This only happens on the complete dataset and not on my small test dataset.
=A0 (So I cannot check the output for correctness in a simple way)

Has anyone an Idea how this could happen?

cheers Martin




In case its useful here the computation class:

@Override
=A0 =A0 =A0public void compute(Vertex<Text, Text, NullWritable> verte= x,
=A0 =A0 =A0 =A0 =A0 =A0 =A0Iterable<Text> inmessage) throws IOExcepti= on {
=A0 =A0 =A0 =A0 =A0boolean changed =3D false;

// first superstep && setup
=A0 =A0 =A0 =A0 =A0if (getSuperstep() =3D=3D 0) {
=A0 =A0 =A0 =A0 =A0 =A0 =A0//initialize value
=A0 =A0 =A0 =A0 =A0 =A0 =A0vertex.setValue(vertex.getId());
=A0 =A0 =A0 =A0 =A0 =A0 =A0//cheating by checking the neighbors ID's (c= uts down 1
iteration)
=A0 =A0 =A0 =A0 =A0 =A0 =A0for (Edge<Text, NullWritable> e : vertex.g= etEdges()) {
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0Text candidate =3D e.getTargetVertexId()= ;
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (candidate.compareTo(vertex.ge= tValue()) < 0) {
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0changed =3D true;
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0vertex.setValue(candidate);
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0}
=A0 =A0 =A0 =A0 =A0 =A0 =A0}
=A0 =A0 =A0 =A0 =A0}

=A0 =A0 =A0 =A0 =A0// other superstep
=A0 =A0 =A0 =A0 =A0else {
=A0 =A0 =A0 =A0 =A0 =A0 =A0// read all messages and compare with own state<= br> =A0 =A0 =A0 =A0 =A0 =A0 =A0for (Text message : inmessage) {
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (message.compareTo(vertex.getV= alue()) < 0) {
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0changed =3D true;
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0vertex.setValue(message);
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0}
=A0 =A0 =A0 =A0 =A0 =A0 =A0}
=A0 =A0 =A0 =A0 =A0}

=A0 =A0 =A0 =A0 =A0// if state has changed send a message to all neighbors<= br> =A0 =A0 =A0 =A0 =A0if (changed && getSuperstep() < limiter) { =A0 =A0 =A0 =A0 =A0 =A0 =A0sendMessageToAllEdges(vertex, vertex.getValue())= ;
=A0 =A0 =A0 =A0 =A0}

=A0 =A0 =A0 =A0 =A0vertex.voteToHalt();
=A0 =A0 =A0}



--001a11c308d60afb1804f36101f0--