giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alessandro Presta" <alessan...@fb.com>
Subject Re: Review Request: Multigraph support.
Date Wed, 19 Dec 2012 05:35:49 GMT


> On Dec. 19, 2012, 12:26 a.m., Nitay Joffe wrote:
> > /trunk/giraph/src/main/java/org/apache/giraph/graph/EdgeListVertexBase.java, line
55
> > <https://reviews.apache.org/r/8560/diff/2/?file=237912#file237912line55>
> >
> >     Why is this not just a default implementation of addEdge()?

I thought it would be more sensible to keep things modular: the base class doesn't define
a behavior, just provides controlled access to the internals (edge list/byte array). The point
is that I don't want to make the underlying structure protected, because user vertex classes
would have access to it.


> On Dec. 19, 2012, 12:26 a.m., Nitay Joffe wrote:
> > /trunk/giraph/src/main/java/org/apache/giraph/graph/LongDoubleNullDoubleVertex.java,
line 59
> > <https://reviews.apache.org/r/8560/diff/2/?file=237918#file237918line59>
> >
> >     This seems wrong, shouldn't you call setEdges(edges) here?

Yeah thanks, good catch. I noticed that this class lacked a setEdges implementation, but I
screwed up initialize.


> On Dec. 19, 2012, 12:26 a.m., Nitay Joffe wrote:
> > /trunk/giraph/src/main/java/org/apache/giraph/graph/LongDoubleNullDoubleVertex.java,
lines 51-52
> > <https://reviews.apache.org/r/8560/diff/2/?file=237918#file237918line51>
> >
> >     Same as the base class, isn't it? Why do you need this?

Because this vertex class keeps id and value as primitives. I personally don't think it's
a good idea, but I didn't want to change it.


> On Dec. 19, 2012, 12:26 a.m., Nitay Joffe wrote:
> > /trunk/giraph/src/main/java/org/apache/giraph/graph/MultiGraphEdgeListVertex.java,
line 50
> > <https://reviews.apache.org/r/8560/diff/2/?file=237919#file237919line50>
> >
> >     Would we want this to return Collection<E> now that it can remove multiple
edges?

This makes perfect sense, I wonder why I didn't do it this way. I guess I should differentiate
the interface, with MutableVertex having removeEdges() and StrictMutableVertex having removeEdge().


> On Dec. 19, 2012, 12:26 a.m., Nitay Joffe wrote:
> > /trunk/giraph/src/main/java/org/apache/giraph/graph/RepresentativeVertexBase.java,
line 259
> > <https://reviews.apache.org/r/8560/diff/2/?file=237923#file237923line259>
> >
> >     readHaltBoolean()

Nice, didn't know this.


> On Dec. 19, 2012, 12:26 a.m., Nitay Joffe wrote:
> > /trunk/giraph/src/main/java/org/apache/giraph/graph/RepresentativeVertexBase.java,
line 123
> > <https://reviews.apache.org/r/8560/diff/2/?file=237923#file237923line123>
> >
> >     Seems like this could be more efficient by keeping all the offsets to be removed
and doing bunch of smaller array copies at the end?
> >     Also I think we can clean this up code-wise a bit by making a helper function
removeEdgeFromIter(Iterator) that gets called from both this and removeFirstEdge above?

Good idea to remove the sub-arrays in the end. We can't really do the second thing, since
the logic of the System.arraycopy() call will be different in the two algorithms (in one case
you move all the remainder of the array that comes after the removed edge, in the other you
remove an internal sub-array, up to where the next removed edge starts).


> On Dec. 19, 2012, 12:26 a.m., Nitay Joffe wrote:
> > /trunk/giraph/src/main/java/org/apache/giraph/graph/SimpleVertex.java, line 57
> > <https://reviews.apache.org/r/8560/diff/2/?file=237925#file237925line57>
> >
> >     since you have it in multiple places why not make some EdgeFunctions static
class with this function as a single static member (say EdgeIdExtractor) type thing and reference
that in both places? I think it would be useful to start pulling out common edge/vertex logic
to helper classes as it will get us closer to a slim edge/vertex interface rather than the
convoluted hierarchy we have now.

Ok, I'm thinking of actually providing methods for converting between iterables.


- Alessandro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8560/#review14698
-----------------------------------------------------------


On Dec. 13, 2012, 2:57 a.m., Alessandro Presta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8560/
> -----------------------------------------------------------
> 
> (Updated Dec. 13, 2012, 2:57 a.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> This patch introduces support for multigraphs in Giraph, by:
> - changing Vertex#initialize() and Vertex#setEdges() to take a collection of edges instead
of a map
> - adding two vertex implementations (MultiGraphEdgeListVertex and MultiGraphRepresentativeVertex)
where addEdge() only appends the edge, instead of checking for duplicates
> 
> This is not only useful for representing multigraphs, but also for implementing addEdge()
efficiently (critical for edge-based input when the out-degree can be really large).
> 
> Other minor changes:
> - changed MutableVertex#addEdge() to get an Edge instead of an id and value, mainly for
consistency with other code
> - added PseudoRandomEdgeInputFormat and extended PageRankBenchmark to accept edge-based
input
> - various fixes on the way
> 
> I ran several benchmarks to compare this patch with the current version, and edge-based
input with vertex-based input:
> 
> Graph: 10M vertices, 1B edges, 10 workers
> 
> EdgeListVertex + VertexInputFormat, current version:
> Total (milliseconds) 118,377
> Input superstep (milliseconds) 44,041
> Superstep 0 (milliseconds) 20,970
> Superstep 1 (milliseconds) 23,731
> Superstep 2 (milliseconds) 24,269
> 
> EdgeListVertex + VertexInputFormat, patched version:
> Total (milliseconds) 116,298
> Input superstep (milliseconds) 40,441
> Superstep 0 (milliseconds) 22,444
> Superstep 1 (milliseconds) 24,164
> Superstep 2 (milliseconds) 25,036
> 
> MultiGraphEdgeListVertex + EdgeInputFormat, patched version:
> Total (milliseconds) 148,905
> Input superstep (milliseconds) 72,425
> Superstep 0 (milliseconds) 23,684
> Superstep 1 (milliseconds) 25,580
> Superstep 2 (milliseconds) 22,456
> 
> RepresentativeVertex + VertexInputFormat, current version:
> Total (milliseconds) 111,450
> Input superstep (milliseconds) 39,301
> Superstep 0 (milliseconds) 21,615
> Superstep 1 (milliseconds) 22,213
> Superstep 2 (milliseconds) 21,840
> 
> RepresentativeVertex + VertexInputFormat, patched version:
> Total (milliseconds) 106,512
> Input superstep (milliseconds) 38,142
> Superstep 0 (milliseconds) 20,812
> Superstep 1 (milliseconds) 22,906
> Superstep 2 (milliseconds) 21,250
> 
> MultiGraphRepresentativeVertex + EdgeInputFormat, patched version:
> Total (milliseconds) 143,831
> Input superstep (milliseconds) 75,030
> Superstep 0 (milliseconds) 20,661
> Superstep 1 (milliseconds) 21,406
> Superstep 2 (milliseconds) 22,456
> 
> 
> This addresses bug GIRAPH-141.
>     https://issues.apache.org/jira/browse/GIRAPH-141
> 
> 
> Diffs
> -----
> 
>   /trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/HCatalogVertexInputFormat.java
1419623 
>   /trunk/giraph-formats-contrib/src/test/java/org/apache/giraph/io/accumulo/edgemarker/AccumuloEdgeInputFormat.java
1419623 
>   /trunk/giraph-formats-contrib/src/test/java/org/apache/giraph/io/hbase/edgemarker/TableEdgeInputFormat.java
1419623 
>   /trunk/giraph/src/main/java/org/apache/giraph/benchmark/MultiGraphEdgeListVertexPageRankBenchmark.java
PRE-CREATION 
>   /trunk/giraph/src/main/java/org/apache/giraph/benchmark/MultiGraphRepresentativeVertexPageRankBenchmark.java
PRE-CREATION 
>   /trunk/giraph/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 1419623

>   /trunk/giraph/src/main/java/org/apache/giraph/examples/LongDoubleFloatDoubleTextInputFormat.java
1419623 
>   /trunk/giraph/src/main/java/org/apache/giraph/examples/NormalizingLongDoubleFloatDoubleTextInputFormat.java
1419623 
>   /trunk/giraph/src/main/java/org/apache/giraph/examples/SimpleCheckpointVertex.java
1419623 
>   /trunk/giraph/src/main/java/org/apache/giraph/examples/SimplePageRankVertex.java 1419623

>   /trunk/giraph/src/main/java/org/apache/giraph/examples/SimpleSuperstepVertex.java 1419623

>   /trunk/giraph/src/main/java/org/apache/giraph/examples/VerifyMessage.java 1419623 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/DefaultVertexResolver.java 1419623

>   /trunk/giraph/src/main/java/org/apache/giraph/graph/EdgeListVertex.java 1419623 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/EdgeListVertexBase.java PRE-CREATION

>   /trunk/giraph/src/main/java/org/apache/giraph/graph/HashMapVertex.java 1419623 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/IntIntNullIntVertex.java 1419623

>   /trunk/giraph/src/main/java/org/apache/giraph/graph/IntNullNullNullVertex.java 1419623

>   /trunk/giraph/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleEdgeListVertex.java
1419623 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java
1419623 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/LongDoubleNullDoubleVertex.java
1419623 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/MultiGraphEdgeListVertex.java PRE-CREATION

>   /trunk/giraph/src/main/java/org/apache/giraph/graph/MultiGraphRepresentativeVertex.java
PRE-CREATION 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/MutableVertex.java 1419623 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/RepresentativeVertex.java 1419623

>   /trunk/giraph/src/main/java/org/apache/giraph/graph/RepresentativeVertexBase.java PRE-CREATION

>   /trunk/giraph/src/main/java/org/apache/giraph/graph/SimpleMutableVertex.java 1419623

>   /trunk/giraph/src/main/java/org/apache/giraph/graph/SimpleVertex.java 1419623 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/Vertex.java 1419623 
>   /trunk/giraph/src/main/java/org/apache/giraph/io/AdjacencyListTextVertexInputFormat.java
1419623 
>   /trunk/giraph/src/main/java/org/apache/giraph/io/IntIntNullIntTextInputFormat.java
1419623 
>   /trunk/giraph/src/main/java/org/apache/giraph/io/IntNullNullNullTextInputFormat.java
1419623 
>   /trunk/giraph/src/main/java/org/apache/giraph/io/JsonBase64VertexInputFormat.java 1419623

>   /trunk/giraph/src/main/java/org/apache/giraph/io/JsonLongDoubleFloatDoubleVertexInputFormat.java
1419623 
>   /trunk/giraph/src/main/java/org/apache/giraph/io/LongDoubleDoubleAdjacencyListVertexInputFormat.java
1419623 
>   /trunk/giraph/src/main/java/org/apache/giraph/io/PseudoRandomEdgeInputFormat.java PRE-CREATION

>   /trunk/giraph/src/main/java/org/apache/giraph/io/PseudoRandomVertexInputFormat.java
1419623 
>   /trunk/giraph/src/main/java/org/apache/giraph/io/TextDoubleDoubleAdjacencyListVertexInputFormat.java
1419623 
>   /trunk/giraph/src/main/java/org/apache/giraph/io/TextVertexInputFormat.java 1419623

>   /trunk/giraph/src/test/java/org/apache/giraph/examples/SimpleShortestPathsVertexTest.java
1419623 
>   /trunk/giraph/src/test/java/org/apache/giraph/examples/SimpleTriangleClosingVertexTest.java
1419623 
>   /trunk/giraph/src/test/java/org/apache/giraph/graph/TestIntIntNullIntVertex.java 1419623

>   /trunk/giraph/src/test/java/org/apache/giraph/graph/TestMutableVertex.java 1419623

>   /trunk/giraph/src/test/java/org/apache/giraph/graph/partition/TestGiraphTransferRegulator.java
1419623 
>   /trunk/giraph/src/test/java/org/apache/giraph/io/TestTextDoubleDoubleAdjacencyListVertexInputFormat.java
1419623 
> 
> Diff: https://reviews.apache.org/r/8560/diff/
> 
> 
> Testing
> -------
> 
> mvn verify and the benchmarks described
> 
> 
> Thanks,
> 
> Alessandro Presta
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message