giraph-dev mailing list archives

From "Alessandro Presta" <alessan...@fb.com>
Subject Re: Review Request: GIRAPH-515: More efficient and flexible edge-based input
Date Fri, 15 Feb 2013 01:24:31 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9449/
-----------------------------------------------------------

(Updated Feb. 15, 2013, 1:24 a.m.)


Review request for giraph.


Changes
-------

Refactored with common abstractions for sending edges/messages, as per Maja's advice.

ByteArrayVertexIdData extended by ByteArrayVertexId(Messages/Edges)

SendCache extended by Send(Message/Edge)Cache

SendWorkerDataRequest extended by SendWorker(Messages/Edges)Request
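To illustrate the kind of shared abstraction meant here, below is a minimal sketch, not the code in the patch. All class names (SketchSendCache and its two subclasses), the per-worker batching policy, and the requestName() method are assumptions made for the example; the real SendCache may be structured differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical base class (not the SendCache from the patch): buffers items
// per destination worker and hands back a batch once it is full.
abstract class SketchSendCache<T> {
  private final Map<Integer, List<T>> buffers = new HashMap<>();
  private final int maxBatchSize;

  SketchSendCache(int maxBatchSize) {
    this.maxBatchSize = maxBatchSize;
  }

  // Returns the batch to send if the buffer for this worker reached the
  // threshold, otherwise null.
  List<T> add(int workerId, T item) {
    List<T> buffer = buffers.computeIfAbsent(workerId, id -> new ArrayList<>());
    buffer.add(item);
    if (buffer.size() >= maxBatchSize) {
      return buffers.remove(workerId);
    }
    return null;
  }

  // Subclasses only say which kind of request carries their batches.
  abstract String requestName();
}

// Hypothetical message flavour: messages are plain strings here.
final class SketchSendMessageCache extends SketchSendCache<String> {
  SketchSendMessageCache(int maxBatchSize) {
    super(maxBatchSize);
  }

  @Override
  String requestName() {
    return "messages";
  }
}

// Hypothetical edge flavour: an edge is a {sourceId, targetId} pair here.
final class SketchSendEdgeCache extends SketchSendCache<long[]> {
  SketchSendEdgeCache(int maxBatchSize) {
    super(maxBatchSize);
  }

  @Override
  String requestName() {
    return "edges";
  }
}
```

The point of the sketch is only that the buffering logic lives in one place, and the message and edge variants differ in the payload type and the request they produce.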


Description
-------

This patch adds the following classes:
- SendWorkerEdgesRequest: a request used to send edges during the input superstep, similar to
the corresponding one for messages
- SendEdgeCache: similar to SendMessageCache
- ByteArrayVertexIdEdges: serialized representation for lists of edges (for different source
vertices), similar to the corresponding one for messages
- EdgeStore: a server-side structure that stores transient edges from incoming requests, and
later moves them to the owning vertices.
- ByteArrayEdges: an edge list (for the same source vertex) stored as a byte-array. The standard
way of iterating is by reusing Edge objects, but an alternative iterator that instantiates
new objects is provided. Depending on the vertex implementation, we use one or the other.
This is a refactor of the byte-array code in RepresentativeVertex, which now contains an instance
of ByteArrayEdges.
When calling setEdges(), RepresentativeVertex is smart enough to realize that the passed Iterable
is actually an instance of ByteArrayEdges, and simply takes ownership of it (without iterating).
If using something like EdgeListVertex (which keeps references to the passed edges), we will
use the alternative iterable (this is of course less memory-efficient).

I've also renamed RepresentativeVertex to ByteArrayVertex because the old name was misleading: it
doesn't need to be used with ByteArrayPartition, and it's perfectly fine to have multiple Vertex
objects, each storing its edges in a byte-array.
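The difference between the two iteration modes of a byte-array edge list can be sketched as follows. This is an illustration under assumptions, not the ByteArrayEdges code from the patch: the names SketchEdge, SketchByteArrayEdges and copyingIterable() are made up, edges are fixed to a long target id plus a double weight, and the iterator copies the backing array for simplicity, which a real implementation would want to avoid.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical mutable edge: target vertex id plus a weight.
final class SketchEdge {
  long targetId;
  double weight;
}

// Hypothetical edge list for one source vertex, stored as serialized bytes.
final class SketchByteArrayEdges implements Iterable<SketchEdge> {
  private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
  private final DataOutputStream out = new DataOutputStream(bytes);
  private int edgeCount;

  void add(long targetId, double weight) {
    try {
      out.writeLong(targetId);
      out.writeDouble(weight);
      edgeCount++;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  int size() {
    return edgeCount;
  }

  // Standard iteration: the same SketchEdge object is refilled on each step,
  // so callers must not keep references to it.
  @Override
  public Iterator<SketchEdge> iterator() {
    return new EdgeIterator(true);
  }

  // Alternative iteration: a new SketchEdge per step, safe to keep around.
  Iterable<SketchEdge> copyingIterable() {
    return () -> new EdgeIterator(false);
  }

  private final class EdgeIterator implements Iterator<SketchEdge> {
    // toByteArray() copies the buffer; done here only to keep the sketch short.
    private final DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
    private final SketchEdge shared = new SketchEdge();
    private final boolean reuse;
    private int remaining = edgeCount;

    EdgeIterator(boolean reuse) {
      this.reuse = reuse;
    }

    @Override
    public boolean hasNext() {
      return remaining > 0;
    }

    @Override
    public SketchEdge next() {
      if (remaining == 0) {
        throw new NoSuchElementException();
      }
      SketchEdge edge = reuse ? shared : new SketchEdge();
      try {
        edge.targetId = in.readLong();
        edge.weight = in.readDouble();
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
      remaining--;
      return edge;
    }
  }
}
```

A vertex that keeps the byte array itself can use the reusing iterator, while one that stores references to the edges it is handed (as described for EdgeListVertex) would need the copying variant, at the cost of one object per edge.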

Future work:

EdgeStore could become an interface, allowing for different implementations
(e.g. out-of-core) and handling permanent edge storage in place of Vertex. That way, we would
have only one Vertex class, and pluggable storage implementations (which makes it easier to
switch without changing user code).
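As a rough idea of what such an interface could look like, here is a sketch with one in-memory implementation. Everything in it is hypothetical: the names SketchEdgeStore, InMemorySketchEdgeStore, addEdge() and drain() are invented for the example, vertex ids are plain longs, and edges carry no values. It is not the EdgeStore class added by this patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pluggable edge storage (not the EdgeStore from the patch).
interface SketchEdgeStore {
  // Buffers one incoming edge under its source vertex.
  void addEdge(long sourceId, long targetId);

  // Hands over all buffered edges, keyed by source vertex, and empties the
  // store so that the edges can be moved to their owning vertices.
  Map<Long, List<Long>> drain();
}

// Simple in-memory implementation; an out-of-core one could implement the
// same interface without callers changing.
final class InMemorySketchEdgeStore implements SketchEdgeStore {
  private Map<Long, List<Long>> pending = new HashMap<>();

  @Override
  public synchronized void addEdge(long sourceId, long targetId) {
    pending.computeIfAbsent(sourceId, id -> new ArrayList<>()).add(targetId);
  }

  @Override
  public synchronized Map<Long, List<Long>> drain() {
    Map<Long, List<Long>> result = pending;
    pending = new HashMap<>();
    return result;
  }
}
```

Code that receives edge requests would depend only on the interface, so swapping the storage strategy would be a configuration choice rather than a change to user code.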


This addresses bug GIRAPH-515.
    https://issues.apache.org/jira/browse/GIRAPH-515


Diffs (updated)
-----

  giraph-core/src/main/java/org/apache/giraph/benchmark/ByteArrayVertexPageRankBenchmark.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/benchmark/MultiGraphByteArrayVertexPageRankBenchmark.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/benchmark/MultiGraphRepresentativeVertexPageRankBenchmark.java 96288323e6028e779113d2520ea9edad497bb0e1
  giraph-core/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 19b08bdb19df21b1dc56dad2cebb499222f9b19e
  giraph-core/src/main/java/org/apache/giraph/benchmark/RepresentativeVertexPageRankBenchmark.java 331ae41a2c0df6b124cbf33944b05f080b49ce94
  giraph-core/src/main/java/org/apache/giraph/comm/SendCache.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/comm/SendEdgeCache.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/comm/SendMessageCache.java 3cbf0eb4775fa3ff0b0351f247df87783bf05995
  giraph-core/src/main/java/org/apache/giraph/comm/ServerData.java 3655d79d8f249338da30ae2bb38b9cfd6b7b1f56
  giraph-core/src/main/java/org/apache/giraph/comm/WorkerClientRequestProcessor.java 0c043e29ae3160bbfc389c435427cf57010a91e1
  giraph-core/src/main/java/org/apache/giraph/comm/WorkerServer.java e60db5529b7fef0b16441ef88df7053d6856ffc5
  giraph-core/src/main/java/org/apache/giraph/comm/messages/ByteArrayMessagesPerVertexStore.java 65caa5d2777b90fa8e14bee7c8d69316d512c651
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClientRequestProcessor.java d4e919ed1aa1f977a2e487531f57b3a2fc0fad47
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java 1b7cc5410aa4d7e1b9ae4580dd5ed484e09ff7ed
  giraph-core/src/main/java/org/apache/giraph/comm/requests/RequestType.java aac00289f915f61e61334cdcd92c93c1ef3b5419
  giraph-core/src/main/java/org/apache/giraph/comm/requests/SendWorkerDataRequest.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/comm/requests/SendWorkerEdgesRequest.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/comm/requests/SendWorkerMessagesRequest.java 641c795521006c460138d6b3b6d9ceb3c3e7eccf
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 9e129efebe39c42bab9d59b3246055b79cdbdfa3
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 8797c0e80824558bf544650f7c896bddd3f873fb
  giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java 3e158afdc480656b3937508f5d86ec294bfa3b99
  giraph-core/src/main/java/org/apache/giraph/graph/EdgeStore.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/partition/ByteArrayPartition.java 12989180a4aabed19c3aefa52ef38ad6d7aa6851
  giraph-core/src/main/java/org/apache/giraph/partition/DiskBackedPartitionStore.java 844a229096005059e9cd05b5bf213d2afa1d41dd
  giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayEdges.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdData.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdEdges.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdMessages.java dea4229f10224edb30f59626d5987ea840e8a271
  giraph-core/src/main/java/org/apache/giraph/utils/VertexIdIterator.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/vertex/ByteArrayVertex.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/vertex/ByteArrayVertexBase.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/vertex/EdgeListVertex.java 9ae692fc00432e28f0b87f11ed5981e600c95019
  giraph-core/src/main/java/org/apache/giraph/vertex/MultiGraphByteArrayVertex.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/vertex/MultiGraphRepresentativeVertex.java 4733e2a6011ec8e1cc4eef1d2eb61abe777ec310
  giraph-core/src/main/java/org/apache/giraph/vertex/RepresentativeVertex.java f805007b8bb8f89e9388cf89c2e81f92328b2b1c
  giraph-core/src/main/java/org/apache/giraph/vertex/RepresentativeVertexBase.java 4de6ed85b499e74b04e93c3780324a6b9e9f2b83
  giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java fa3ab49f11d61352a5f6f69699375abd2bf1e527
  giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java bdf9f5705811340748172a70dc952493d5ececfc
  giraph-core/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 2845c90cbfd38f2f35e70e3b79767e1386d54a7e
  giraph-core/src/test/java/org/apache/giraph/comm/RequestTest.java d779fe46377eaa8fa2debf0836f975a30ec6e21f
  giraph-core/src/test/java/org/apache/giraph/utils/MockUtils.java 82dc2839d83f80ebcf52bad252886d50310eacc5
  giraph-core/src/test/java/org/apache/giraph/vertex/TestMultiGraphVertex.java a5a3545de7dc9e30ab0f30926122049fdbe1173b
  giraph-core/src/test/java/org/apache/giraph/vertex/TestMutableVertex.java ca4ba1a336f68b584c4fdbaf74be60dbe41644d5


Diff: https://reviews.apache.org/r/9449/diff/


Testing
-------

mvn verify

Tested on both benchmarks and real-world applications.
This typically brings resource requirements down substantially: in an application using a few hundred billion
edges, which previously only ran with 300 workers, we're now able to run with 100 workers,
with a lot of memory to spare and even faster than before (from around 600s to 400s).


Thanks,

Alessandro Presta

