giraph-user mailing list archives

From Rob Vesse <rve...@dotnetrdf.org>
Subject Re: duplicate edges created with TextVertexInputFormat
Date Wed, 29 Jan 2014 17:44:13 GMT
The logs appear to show that you get two identical input splits:

14/01/28 11:02:41 INFO worker.InputSplitsCallable: getInputSplit: Reserved
/_hadoopBsp/giraph_yarn_application_1390861968364_0029/_vertexInputSplitDir/0
from ZooKeeper and got input split
'hdfs://arcus1.silverdale.dev/tmp/louvain-giraph-example/1390935731/input/small:0+172'

14/01/28 11:02:42 INFO worker.InputSplitsCallable: getInputSplit: Reserved
/_hadoopBsp/giraph_yarn_application_1390861968364_0029/_vertexInputSplitDir/1
from ZooKeeper and got input split
'hdfs://arcus1.silverdale.dev/tmp/louvain-giraph-example/1390935731/input/small:0+172'

Have you by any chance accidentally passed in the input file twice?
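
If it helps, one quick way to check is to collect the split descriptions from the "got input split" log lines (or the list of input paths you hand to the job) and look for repeats. The following is only a minimal sketch: the class and method names are made up for illustration and are not part of Giraph, and it assumes you have already extracted the split strings yourself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Giraph class: reports split descriptions
// (path:offset+length strings) that appear more than once.
public class DuplicateSplitCheck {

    /** Returns every description seen more than once, in first-seen order. */
    public static List<String> findDuplicates(List<String> splitDescriptions) {
        // Count occurrences while preserving the order in which splits were seen.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String description : splitDescriptions) {
            counts.merge(description, 1, Integer::sum);
        }
        List<String> duplicates = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                duplicates.add(entry.getKey());
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // The two split strings reported in the logs above.
        List<String> splits = Arrays.asList(
            "hdfs://arcus1.silverdale.dev/tmp/louvain-giraph-example/1390935731/input/small:0+172",
            "hdfs://arcus1.silverdale.dev/tmp/louvain-giraph-example/1390935731/input/small:0+172");
        System.out.println("Duplicate splits: " + findDuplicates(splits));
    }
}
```

Two splits over the same file with different offsets would be normal; the same path with the same offset and length twice, as in your logs, suggests the same input was listed more than once.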

Rob

From:  Eric Kimbrel <lekimbrel@gmail.com>
Reply-To:  <user@giraph.apache.org>
Date:  Wednesday, 29 January 2014 09:08
To:  <user@giraph.apache.org>
Subject:  duplicate edges created with TextVertexInputFormat

> I am reading in an adjacency list using an input format which extends
> TextVertexInputFormat.  My code doesn't do anything to address input splits,
> but leaves that to the underlying giraph implementation.  However it appears
> that as the data is being read 2 identical input splits are created and read
> in, resulting in edges for each vertex being created twice.
> 
> My input format is a simple adjacency list, where each node is represented by
> a single line of text which lists the node id, and all of its neighbors.
> I read the edges into an edge list and then create the vertex via:
>> Vertex<Text, LouvainNodeState, LongWritable> vertex =
>> this.getConf().createVertex();
>> vertex.initialize(id, state, edgesList);
> 
> 
> Logs below show the edges being read in twice (as part of two different input
> splits in the input stage) and then being represented twice per node in the
> computation phase.
> This example is using 1 compute thread and 1 worker.
> 
> If I am creating the vertex incorrectly or doing something else wrong please
> let me know.  Thanks.
> 
> 
> 
> Log snippet of vertex input process.
> 
> 14/01/28 11:02:41 INFO worker.BspServiceWorker: loadInputSplits: Using 1
> thread(s), originally 1 threads(s) for 2 total splits.
> 14/01/28 11:02:41 INFO worker.InputSplitsHandler: reserveInputSplit: Reserved
> input split path 
> /_hadoopBsp/giraph_yarn_application_1390861968364_0029/_vertexInputSplitDir/0,
> overall roughly 0.0% input splits reserved
> 14/01/28 11:02:41 INFO worker.InputSplitsCallable: getInputSplit: Reserved
> /_hadoopBsp/giraph_yarn_application_1390861968364_0029/_vertexInputSplitDir/0
> from ZooKeeper and got input split
> 'hdfs://arcus1.silverdale.dev/tmp/louvain-giraph-example/1390935731/input/small:0+172'
> 14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 2:1
> 14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 3:1
> 14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 4:1
> 14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 5:1
> 14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 6:1
> 
> ... other nodes processed
> 
> 14/01/28 11:02:42 INFO worker.InputSplitsCallable: loadFromInputSplit:
> Finished loading 
> /_hadoopBsp/giraph_yarn_application_1390861968364_0029/_vertexInputSplitDir/0
> (v=9, e=34)
> 14/01/28 11:02:42 INFO worker.InputSplitsHandler: reserveInputSplit: Reserved
> input split path 
> /_hadoopBsp/giraph_yarn_application_1390861968364_0029/_vertexInputSplitDir/1,
> overall roughly 50.0% input splits reserved
> 14/01/28 11:02:42 INFO worker.InputSplitsCallable: getInputSplit: Reserved
> /_hadoopBsp/giraph_yarn_application_1390861968364_0029/_vertexInputSplitDir/1
> from ZooKeeper and got input split
> 'hdfs://arcus1.silverdale.dev/tmp/louvain-giraph-example/1390935731/input/small:0+172'
> 14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 2:1
> 14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 3:1
> 14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 4:1
> 14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 5:1
> 14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 6:1
> 
> ... other nodes processed again
> 
> 
> Logs from the compute phase show that edges really are added twice (the format
> below is edge #:target:weight).
> Each node should have only one edge to each neighbor, but instead it has two.
> 
> 14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: NODE:  1
> 14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: EDGE 1: 2:1
> 14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: EDGE 2: 3:1
> 14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: EDGE 3: 4:1
> 14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: EDGE 4: 5:1
> 14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: EDGE 5: 6:1
> 14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: EDGE 6: 2:1
> 14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: EDGE 7: 3:1
> 14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: EDGE 8: 4:1
> 14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: EDGE 9: 5:1
> 14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: EDGE 10: 6:1
> 
> 


