giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alessandro Presta" <alessan...@fb.com>
Subject Re: Review Request: GIRAPH-515: More efficient and flexible edge-based input
Date Fri, 15 Feb 2013 18:14:37 GMT


> On Feb. 15, 2013, 4:46 p.m., Nitay Joffe wrote:
> > Cool looks great to me. Maja should take a look but I'm +1 on it. Does it make any
sense to look into making SendMutationsCache and SendPartitionCache also use SendCache, or
are those too different?

I'm not so sure about SendPartitionCache, but SendMutationsCache probably, yes. It's another
case of sent data indexed by vertex id.
However mutations probably need a general revamp, so maybe open another JIRA for this?


> On Feb. 15, 2013, 4:46 p.m., Nitay Joffe wrote:
> > giraph-core/src/main/java/org/apache/giraph/comm/SendCache.java, line 50
> > <https://reviews.apache.org/r/9449/diff/2/?file=258890#file258890line50>
> >
> >     Can't we reuse this across all our various SendCaches?

See above.


> On Feb. 15, 2013, 4:46 p.m., Nitay Joffe wrote:
> > giraph-core/src/main/java/org/apache/giraph/comm/SendCache.java, line 116
> > <https://reviews.apache.org/r/9449/diff/2/?file=258890#file258890line116>
> >
> >     nit: how come final here?

A lot of this is just a refactor of existing code, so I didn't change things like that. I'll
remove the final for consistency.


> On Feb. 15, 2013, 4:46 p.m., Nitay Joffe wrote:
> > giraph-core/src/main/java/org/apache/giraph/comm/SendMessageCache.java, line 45
> > <https://reviews.apache.org/r/9449/diff/2/?file=258892#file258892line45>
> >
> >     what is this suppress warnings thing anyways?

I'm casting a Generic<A> to a Generic<B>, where B is a subclass of A, because
I know that the former really is an instance of the latter.
The only way I could do this was by casting to Generic without type arguments. This is an
unchecked cast that works fine because of type-erasure, but IDEs highlight it unless you suppress
the warning.


> On Feb. 15, 2013, 4:46 p.m., Nitay Joffe wrote:
> > giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayEdges.java, line 72
> > <https://reviews.apache.org/r/9449/diff/2/?file=258909#file258909line72>
> >
> >     Seems kinda weird to create an ExtendedDataOutput only to grab the byte[] and
pos (which is just 0?) out of it. Why not create them directly?

This was refactored out of RepresentativeVertex (now ByteArrayVertex).
I think it goes through ExtendedDataOutput for generality, because it might allocate byte-arrays
with a default initial size.


> On Feb. 15, 2013, 4:46 p.m., Nitay Joffe wrote:
> > giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayEdges.java, line 98
> > <https://reviews.apache.org/r/9449/diff/2/?file=258909#file258909line98>
> >
> >     I'm a little confused why we create a new extended data output each time instead
of storing it in constructor and reusing the object? I know we do this in RepresentativeVertex
too but I forget why?

We should ask Avery. Maybe you're right and we could store the data output instead of the
byte-array.
See for example ByteArrayVertexIdData.


> On Feb. 15, 2013, 4:46 p.m., Nitay Joffe wrote:
> > giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayEdges.java, line 101
> > <https://reviews.apache.org/r/9449/diff/2/?file=258909#file258909line101>
> >
> >     not sure if you really need this. If the edge value is NullWritable then write()
is a no-op. Do you actually save by having this check?

Oh, cool, for some reason I assumed it wrote some representation of null.
I guess it doesn't make sense, because when reading you're also reading to a NullWritable,
so you have control.
Will remove those checks.


> On Feb. 15, 2013, 4:46 p.m., Nitay Joffe wrote:
> > giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdEdges.java, line
58
> > <https://reviews.apache.org/r/9449/diff/2/?file=258911#file258911line58>
> >
> >     Also you do this enough times if it is worthwhile we might as well make a helper
functions somewhere like writeEdge(Edge<>) and readEdge(Edge<>)

True, even though Edge is not Writable we can have methods with those signatures that assume
the edges's fields are already created.
I'm adding these to WritableUtils.


- Alessandro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9449/#review16626
-----------------------------------------------------------


On Feb. 15, 2013, 1:24 a.m., Alessandro Presta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9449/
> -----------------------------------------------------------
> 
> (Updated Feb. 15, 2013, 1:24 a.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> This patch adds the following classes:
> - SendWorkerEdgesRequest: a request used to send edges during input superstep, similar
to the corresponding one for messages
> - SendEdgeCache: similar to SendMessageCache
> - ByteArrayVertexIdEdges: serialized representation for lists of edges (for different
source vertices), similar to the corresponding one for messages
> - EdgeStore: a server-side structure that stores transient edges from incoming requests,
and later moves them to the owning vertices.
> - ByteArrayEdges: an edge list (for the same source vertex) stored as a byte-array. The
standard way of iterating is by reusing Edge objects, but an alternative iterator that instantiates
new objects is provided. Depending on the vertex implementation, we use one of the other.
> This is a refactor of the byte-array code in RepresentativeVertex, which now contains
an instance of ByteArrayEdges.
> When calling setEdges(), RepresentativeVertex is smart to realize that the passed Iterable
is actually an instance of ByteArrayEdges, and simply takes ownership of it (without iterating).
> If using something like EdgeListVertex (which keeps references to the passed edges),
we will use the alternative iterable (this is of course less memory-efficient).
> 
> I've also renamed RepresentativeVertex to ByteArrayVertex because it was misleading (it
doesn't need to be used with ByteArrayPartition, it's perfectly fine to have multiple Vertex
objects, each storing its edges in a byte-array).
> 
> Future work:
> 
> EdgeStore could become an interface in the future, allowing for different implementations
(e.g. out-of-core) and handling permanent edge storage in place of Vertex. That way, we would
have only one Vertex class, and pluggable storage implementations (which makes it easier to
switch without changing user code).
> 
> 
> This addresses bug GIRAPH-515.
>     https://issues.apache.org/jira/browse/GIRAPH-515
> 
> 
> Diffs
> -----
> 
>   giraph-core/src/main/java/org/apache/giraph/benchmark/ByteArrayVertexPageRankBenchmark.java
PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/benchmark/MultiGraphByteArrayVertexPageRankBenchmark.java
PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/benchmark/MultiGraphRepresentativeVertexPageRankBenchmark.java
96288323e6028e779113d2520ea9edad497bb0e1 
>   giraph-core/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 19b08bdb19df21b1dc56dad2cebb499222f9b19e

>   giraph-core/src/main/java/org/apache/giraph/benchmark/RepresentativeVertexPageRankBenchmark.java
331ae41a2c0df6b124cbf33944b05f080b49ce94 
>   giraph-core/src/main/java/org/apache/giraph/comm/SendCache.java PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/comm/SendEdgeCache.java PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/comm/SendMessageCache.java 3cbf0eb4775fa3ff0b0351f247df87783bf05995

>   giraph-core/src/main/java/org/apache/giraph/comm/ServerData.java 3655d79d8f249338da30ae2bb38b9cfd6b7b1f56

>   giraph-core/src/main/java/org/apache/giraph/comm/WorkerClientRequestProcessor.java
0c043e29ae3160bbfc389c435427cf57010a91e1 
>   giraph-core/src/main/java/org/apache/giraph/comm/WorkerServer.java e60db5529b7fef0b16441ef88df7053d6856ffc5

>   giraph-core/src/main/java/org/apache/giraph/comm/messages/ByteArrayMessagesPerVertexStore.java
65caa5d2777b90fa8e14bee7c8d69316d512c651 
>   giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClientRequestProcessor.java
d4e919ed1aa1f977a2e487531f57b3a2fc0fad47 
>   giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java 1b7cc5410aa4d7e1b9ae4580dd5ed484e09ff7ed

>   giraph-core/src/main/java/org/apache/giraph/comm/requests/RequestType.java aac00289f915f61e61334cdcd92c93c1ef3b5419

>   giraph-core/src/main/java/org/apache/giraph/comm/requests/SendWorkerDataRequest.java
PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/comm/requests/SendWorkerEdgesRequest.java
PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/comm/requests/SendWorkerMessagesRequest.java
641c795521006c460138d6b3b6d9ceb3c3e7eccf 
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 9e129efebe39c42bab9d59b3246055b79cdbdfa3

>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 8797c0e80824558bf544650f7c896bddd3f873fb

>   giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java
3e158afdc480656b3937508f5d86ec294bfa3b99 
>   giraph-core/src/main/java/org/apache/giraph/graph/EdgeStore.java PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/partition/ByteArrayPartition.java 12989180a4aabed19c3aefa52ef38ad6d7aa6851

>   giraph-core/src/main/java/org/apache/giraph/partition/DiskBackedPartitionStore.java
844a229096005059e9cd05b5bf213d2afa1d41dd 
>   giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayEdges.java PRE-CREATION

>   giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdData.java PRE-CREATION

>   giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdEdges.java PRE-CREATION

>   giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdMessages.java dea4229f10224edb30f59626d5987ea840e8a271

>   giraph-core/src/main/java/org/apache/giraph/utils/VertexIdIterator.java PRE-CREATION

>   giraph-core/src/main/java/org/apache/giraph/vertex/ByteArrayVertex.java PRE-CREATION

>   giraph-core/src/main/java/org/apache/giraph/vertex/ByteArrayVertexBase.java PRE-CREATION

>   giraph-core/src/main/java/org/apache/giraph/vertex/EdgeListVertex.java 9ae692fc00432e28f0b87f11ed5981e600c95019

>   giraph-core/src/main/java/org/apache/giraph/vertex/MultiGraphByteArrayVertex.java PRE-CREATION

>   giraph-core/src/main/java/org/apache/giraph/vertex/MultiGraphRepresentativeVertex.java
4733e2a6011ec8e1cc4eef1d2eb61abe777ec310 
>   giraph-core/src/main/java/org/apache/giraph/vertex/RepresentativeVertex.java f805007b8bb8f89e9388cf89c2e81f92328b2b1c

>   giraph-core/src/main/java/org/apache/giraph/vertex/RepresentativeVertexBase.java 4de6ed85b499e74b04e93c3780324a6b9e9f2b83

>   giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java fa3ab49f11d61352a5f6f69699375abd2bf1e527

>   giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java bdf9f5705811340748172a70dc952493d5ececfc

>   giraph-core/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 2845c90cbfd38f2f35e70e3b79767e1386d54a7e

>   giraph-core/src/test/java/org/apache/giraph/comm/RequestTest.java d779fe46377eaa8fa2debf0836f975a30ec6e21f

>   giraph-core/src/test/java/org/apache/giraph/utils/MockUtils.java 82dc2839d83f80ebcf52bad252886d50310eacc5

>   giraph-core/src/test/java/org/apache/giraph/vertex/TestMultiGraphVertex.java a5a3545de7dc9e30ab0f30926122049fdbe1173b

>   giraph-core/src/test/java/org/apache/giraph/vertex/TestMutableVertex.java ca4ba1a336f68b584c4fdbaf74be60dbe41644d5

> 
> Diff: https://reviews.apache.org/r/9449/diff/
> 
> 
> Testing
> -------
> 
> mvn verify
> 
> Tested on both benchmarks and real-world applications.
> This typically brings requirements down a lot: in an application using a few hundred
billion edges, which previously only ran with 300 workers, we're now able to run with 100
workers, with a lot of memory to spare and even faster than before (from around 600s to 400s).
> 
> 
> Thanks,
> 
> Alessandro Presta
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message