giraph-user mailing list archives

From Eric Kimbrel <lekimb...@gmail.com>
Subject duplicate edges created with TextVertexInputFormat
Date Wed, 29 Jan 2014 17:08:46 GMT
I am reading in an adjacency list using an input format that extends TextVertexInputFormat.
My code doesn't do anything to handle input splits; it leaves that to the underlying
Giraph implementation. However, it appears that two identical input splits are created and
read as the data is loaded, so the edges for each vertex are created twice.

My input format is a simple adjacency list, where each node is represented by a single line
of text that lists the node id and all of its neighbors.
I read the edges into an edge list and then create the vertex via:
Vertex<Text, LouvainNodeState, LongWritable> vertex = this.getConf().createVertex();
vertex.initialize(id, state, edgesList);
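
For illustration, a minimal standalone sketch of the parsing step described above. This is
not the actual LouvainVertexInputFormat code: it assumes whitespace-separated ids on each
line and a default edge weight of 1 (matching the "target:weight" pairs in the logs), and
the class and method names (AdjacencyLineParser, ParsedEdge, parseId, parseEdges) are
hypothetical. It uses plain Java types rather than Giraph's Text/LongWritable so that it
runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Giraph: turns one adjacency-list line
// into a node id plus a list of outgoing edges.
public class AdjacencyLineParser {

    // Hypothetical holder for one outgoing edge: target id and weight.
    public static final class ParsedEdge {
        public final String target;
        public final long weight;

        ParsedEdge(String target, long weight) {
            this.target = target;
            this.weight = weight;
        }
    }

    // Assumption: the first whitespace-separated token is the node id.
    public static String parseId(String line) {
        return line.trim().split("\\s+")[0];
    }

    // Assumption: every remaining token is a neighbor id, with weight 1.
    public static List<ParsedEdge> parseEdges(String line) {
        String[] tokens = line.trim().split("\\s+");
        List<ParsedEdge> edges = new ArrayList<>();
        for (int i = 1; i < tokens.length; i++) {
            edges.add(new ParsedEdge(tokens[i], 1L));
        }
        return edges;
    }
}
```

With a hypothetical input line "1 2 3 4 5 6", parseId gives "1" and parseEdges gives five
edges (to 2, 3, 4, 5 and 6), each exactly once. A parser of this shape produces one edge per
neighbor per line, so duplicates would only arise if the same line is read more than once.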


Logs below show the edges being read in twice (as part of two different input splits in the
input stage) and then being represented twice per node in the computation phase.
This example is using 1 compute thread and 1 worker.

If I am creating the vertex incorrectly or doing something else wrong, please let me know.
Thanks.



Log snippet from the vertex input process:

14/01/28 11:02:41 INFO worker.BspServiceWorker: loadInputSplits: Using 1 thread(s), originally
1 threads(s) for 2 total splits.
14/01/28 11:02:41 INFO worker.InputSplitsHandler: reserveInputSplit: Reserved input split
path /_hadoopBsp/giraph_yarn_application_1390861968364_0029/_vertexInputSplitDir/0, overall
roughly 0.0% input splits reserved
14/01/28 11:02:41 INFO worker.InputSplitsCallable: getInputSplit: Reserved /_hadoopBsp/giraph_yarn_application_1390861968364_0029/_vertexInputSplitDir/0
from ZooKeeper and got input split 'hdfs://arcus1.silverdale.dev/tmp/louvain-giraph-example/1390935731/input/small:0+172'
14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 2:1
14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 3:1
14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 4:1
14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 5:1
14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 6:1

… other nodes processed

14/01/28 11:02:42 INFO worker.InputSplitsCallable: loadFromInputSplit: Finished loading /_hadoopBsp/giraph_yarn_application_1390861968364_0029/_vertexInputSplitDir/0
(v=9, e=34)
14/01/28 11:02:42 INFO worker.InputSplitsHandler: reserveInputSplit: Reserved input split
path /_hadoopBsp/giraph_yarn_application_1390861968364_0029/_vertexInputSplitDir/1, overall
roughly 50.0% input splits reserved
14/01/28 11:02:42 INFO worker.InputSplitsCallable: getInputSplit: Reserved /_hadoopBsp/giraph_yarn_application_1390861968364_0029/_vertexInputSplitDir/1
from ZooKeeper and got input split 'hdfs://arcus1.silverdale.dev/tmp/louvain-giraph-example/1390935731/input/small:0+172'
14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 2:1
14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 3:1
14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 4:1
14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 5:1
14/01/28 11:02:42 INFO giraph.LouvainVertexInputFormat: Node 1 added edge 6:1

… other nodes processed again


Logs from the compute phase show that the edges really are added twice (the format below is
edge #: target:weight).
Each node should have only one edge to each neighbor, but it has two instead.

14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: NODE:  1
14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: 
EDGE 1: 2:1
14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: 
EDGE 2: 3:1
14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: 
EDGE 3: 4:1
14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: 
EDGE 4: 5:1
14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: 
EDGE 5: 6:1
14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: 
EDGE 6: 2:1
14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: 
EDGE 7: 3:1
14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: 
EDGE 8: 4:1
14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: 
EDGE 9: 5:1
14/01/28 11:02:42 INFO giraph.LouvainVertexComputation: 
EDGE 10: 6:1
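
A small diagnostic sketch for confirming the duplication shown above: given the target ids
of one vertex's edges, it reports every target that appears more than once. The class and
method names (DuplicateEdgeCheck, duplicateTargets) are hypothetical, and the ids are plain
strings rather than Giraph types so the check can run outside a job.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic helper, not part of Giraph.
public class DuplicateEdgeCheck {

    // Returns target id -> occurrence count, keeping only targets seen 2+ times.
    public static Map<String, Integer> duplicateTargets(List<String> targets) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String target : targets) {
            counts.merge(target, 1, Integer::sum);
        }
        // Drop every target that occurs only once.
        counts.values().removeIf(count -> count < 2);
        return counts;
    }
}
```

Fed the ten targets logged for node 1 (2, 3, 4, 5, 6, 2, 3, 4, 5, 6), this would report each
of the five neighbors with a count of 2; an empty result would mean the edge list is clean.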


