giraph-dev mailing list archives

From "Maja Kabiljo" <majakabi...@fb.com>
Subject Review Request 12252: GIRAPH-704: Specialized message stores
Date Wed, 03 Jul 2013 16:02:23 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12252/
-----------------------------------------------------------

Review request for giraph.


Bugs: GIRAPH-704
    https://issues.apache.org/jira/browse/GIRAPH-704


Repository: giraph-git


Description
-------

I was investigating where the time/CPU goes in some applications, and processing messages
turned out to be one of the most expensive things we do. We should provide better
implementations using primitive maps whenever possible.
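To illustrate what "using primitive maps" buys, here is a minimal sketch, assuming a sum combiner as in page rank. It is NOT the IntFloatMessageStore from this patch; the class name IntFloatSumStore and its methods are hypothetical. It keeps int vertex ids and float messages in parallel primitive arrays (open addressing, linear probing) and combines each incoming message in place, so no Integer/Float objects are created per message.

```java
import java.util.Arrays;

// Hypothetical sketch, not the IntFloatMessageStore from this patch: a
// primitive open-addressing map from int vertex id to float message that
// combines messages for the same vertex by summing them (as a page rank sum
// combiner would), with no boxing of ids or messages.
class IntFloatSumStore {
  // Marks an empty slot; assumed never to be used as a real vertex id.
  private static final int FREE = Integer.MIN_VALUE;
  private int[] keys;
  private float[] values;
  private int size;

  IntFloatSumStore(int expectedSize) {
    int capacity = 16;
    while (capacity < expectedSize * 2) {
      capacity <<= 1; // power of two, so (hash & mask) picks a slot
    }
    keys = new int[capacity];
    Arrays.fill(keys, FREE);
    values = new float[capacity];
  }

  // Linear probing: returns the slot holding key, or the free slot where it belongs.
  private int slot(int key) {
    int mask = keys.length - 1;
    int h = key * 0x9E3779B9; // multiplicative hash to spread sequential ids
    int i = (h ^ (h >>> 16)) & mask;
    while (keys[i] != FREE && keys[i] != key) {
      i = (i + 1) & mask;
    }
    return i;
  }

  void addMessage(int vertexId, float message) {
    if (vertexId == FREE) {
      throw new IllegalArgumentException("reserved vertex id");
    }
    int i = slot(vertexId);
    if (keys[i] == FREE) {
      keys[i] = vertexId;
      values[i] = message;
      size++;
      if (size * 2 > keys.length) {
        grow(); // keep the table at most half full
      }
    } else {
      values[i] += message; // combine in place: sum
    }
  }

  boolean hasMessage(int vertexId) {
    return vertexId != FREE && keys[slot(vertexId)] == vertexId;
  }

  // Returns the combined message, or 0 if the vertex received none.
  float getMessage(int vertexId) {
    int i = slot(vertexId);
    return keys[i] == vertexId && vertexId != FREE ? values[i] : 0f;
  }

  int size() {
    return size;
  }

  // Doubles the table and re-inserts every occupied slot.
  private void grow() {
    int[] oldKeys = keys;
    float[] oldValues = values;
    keys = new int[oldKeys.length * 2];
    Arrays.fill(keys, FREE);
    values = new float[keys.length];
    for (int j = 0; j < oldKeys.length; j++) {
      if (oldKeys[j] != FREE) {
        int i = slot(oldKeys[j]);
        keys[i] = oldKeys[j];
        values[i] = oldValues[j];
      }
    }
  }

  public static void main(String[] args) {
    IntFloatSumStore store = new IntFloatSumStore(4);
    store.addMessage(7, 0.25f);
    store.addMessage(7, 0.5f);
    store.addMessage(42, 1.0f);
    for (int id = 1000; id < 1100; id++) {
      store.addMessage(id, 1.0f); // forces the table to grow several times
    }
    System.out.println(store.getMessage(7)); // 0.75
    System.out.println(store.size());        // 102
  }
}
```

A real store would also need to be partitioned per worker partition and be safe for concurrent netty threads; the sketch leaves both out.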

Here are some results from the page rank benchmark, using 40 workers, 100m vertices and 1k edges
per vertex (2.5b edges per worker):
* Current code, with combiner: superstep 75s, 265m cpu ms
* IntFloatMessageStore: superstep 55s, 185m cpu ms
* Current code, without combiner: superstep 120s, 415m cpu ms
* IntByteArrayMessageStore: superstep 108s, 355m cpu ms
(I was running for 3 supersteps; a run with 0 supersteps takes 26m cpu ms, so this should
be subtracted from all the cpu numbers to get a fair comparison.)
So IntFloatMessageStore saves about 35% cpu and 25% elapsed time, and IntByteArrayMessageStore
about 15% cpu and 10% time. On a real huge graph, the speedup with LongDoubleMessageStore was
similar, and with LongByteArrayMessageStore even a bit better.
Also note that using the combiner in the current code is much worse: we do have additional
serialization/deserialization there, but I am not sure that's enough to justify this huge
difference. I tried sizing all the buffers properly; it didn't help. I will investigate this
further later.
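The savings percentages above can be rechecked from the raw numbers. This small sketch assumes the 26m cpu ms baseline (a run with 0 supersteps) is subtracted from the cpu figures only, while superstep times are compared as-is; the class and method names are illustrative.

```java
import java.util.Locale;

// Recomputes the quoted savings from the benchmark numbers above.
// Assumption: the 26m cpu ms baseline is subtracted from cpu figures only;
// superstep times are compared directly.
class BenchmarkSavings {
  static final double BASELINE_CPU_MILLIONS_MS = 26;

  // 100m vertices * 1k edges per vertex spread over 40 workers.
  static long edgesPerWorker(long vertices, long edgesPerVertex, int workers) {
    return vertices * edgesPerVertex / workers;
  }

  static double cpuSavingsPercent(double before, double after) {
    return 100.0 * (1.0 - (after - BASELINE_CPU_MILLIONS_MS)
        / (before - BASELINE_CPU_MILLIONS_MS));
  }

  static double timeSavingsPercent(double before, double after) {
    return 100.0 * (1.0 - after / before);
  }

  public static void main(String[] args) {
    System.out.println(edgesPerWorker(100_000_000L, 1_000L, 40)); // 2500000000
    System.out.println(String.format(Locale.ROOT, "IntFloat: %.1f%% cpu, %.1f%% time",
        cpuSavingsPercent(265, 185), timeSavingsPercent(75, 55)));
    // IntFloat: 33.5% cpu, 26.7% time
    System.out.println(String.format(Locale.ROOT, "IntByteArray: %.1f%% cpu, %.1f%% time",
        cpuSavingsPercent(415, 355), timeSavingsPercent(120, 108)));
    // IntByteArray: 15.4% cpu, 10.0% time
  }
}
```

Under that reading, the results are consistent with the rounded "about 35%/25%" and "15%/10%" figures.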

I implemented this so that the infrastructure chooses the appropriate message store based on
the vertex id type, the message type and the combiner. We could also make it an option, but this
becomes trickier with switchable computations and combiners: we'd have to add a function to
switch the message store too. I'd rather wait until we come up with a better general solution
for switching things in the configuration, without adding specific methods for each.
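The automatic choice described above can be pictured as a simple dispatch. This is a hypothetical sketch: the selection rules are an assumption rather than the patch's actual InMemoryMessageStoreFactory logic, the enum names are illustrative, and Integer/Long/Float/Double stand in for the Writable types Giraph really uses.

```java
// Hypothetical sketch of choosing a message store from the vertex id type,
// the message type and whether a combiner is set. The selection rules are an
// assumption; Integer/Long/Float/Double stand in for Giraph's Writable types.
class MessageStoreChooser {
  enum StoreKind {
    INT_FLOAT,          // IntFloatMessageStore
    INT_BYTE_ARRAY,     // IntByteArrayMessageStore
    LONG_DOUBLE,        // LongDoubleMessageStore
    LONG_BYTE_ARRAY,    // LongByteArrayMessageStore
    GENERIC_COMBINING,  // existing object-based store used with a combiner (illustrative label)
    GENERIC_BYTE_ARRAY  // ByteArrayMessagesPerVertexStore
  }

  static StoreKind chooseStore(Class<?> vertexIdClass, Class<?> messageClass,
      boolean hasCombiner) {
    if (vertexIdClass == Integer.class) {
      if (!hasCombiner) {
        return StoreKind.INT_BYTE_ARRAY; // any message type, kept serialized per vertex
      }
      if (messageClass == Float.class) {
        return StoreKind.INT_FLOAT;      // combined value kept as a primitive float
      }
    } else if (vertexIdClass == Long.class) {
      if (!hasCombiner) {
        return StoreKind.LONG_BYTE_ARRAY;
      }
      if (messageClass == Double.class) {
        return StoreKind.LONG_DOUBLE;
      }
    }
    // No specialized store applies: fall back to the generic stores.
    return hasCombiner ? StoreKind.GENERIC_COMBINING : StoreKind.GENERIC_BYTE_ARRAY;
  }

  public static void main(String[] args) {
    System.out.println(chooseStore(Integer.class, Float.class, true));  // INT_FLOAT
    System.out.println(chooseStore(Long.class, Double.class, false));   // LONG_BYTE_ARRAY
    System.out.println(chooseStore(String.class, Double.class, true));  // GENERIC_COMBINING
  }
}
```

Because the result depends on the message type and the combiner, a computation that switches either of them between supersteps would need this choice made again, which is why exposing it as a plain option gets tricky.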


Diffs
-----

  giraph-core/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 4b0f985
  giraph-core/src/main/java/org/apache/giraph/comm/ServerData.java affc260
  giraph-core/src/main/java/org/apache/giraph/comm/messages/ByteArrayMessagesPerVertexStore.java fecd7ee
  giraph-core/src/main/java/org/apache/giraph/comm/messages/InMemoryMessageStoreFactory.java ba8a005
  giraph-core/src/main/java/org/apache/giraph/comm/messages/out_of_core/PartitionDiskBackedMessageStore.java 4ae805a
  giraph-core/src/main/java/org/apache/giraph/comm/messages/primitives/IntByteArrayMessageStore.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/comm/messages/primitives/IntFloatMessageStore.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/comm/messages/primitives/LongByteArrayMessageStore.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/comm/messages/primitives/LongDoubleMessageStore.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/comm/messages/primitives/package-info.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdData.java 9b3f165
  giraph-core/src/main/java/org/apache/giraph/utils/WritableUtils.java c78d717
  giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java 89b6f9e
  giraph-core/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 35e6362
  giraph-core/src/test/java/org/apache/giraph/comm/RequestTest.java c8c09df
  giraph-core/src/test/java/org/apache/giraph/comm/messages/TestIntFloatPrimitiveMessageStores.java PRE-CREATION
  giraph-core/src/test/java/org/apache/giraph/comm/messages/TestLongDoublePrimitiveMessageStores.java PRE-CREATION

Diff: https://reviews.apache.org/r/12252/diff/


Testing
-------

Passes mvn clean verify; added tests for the new stores.
Tested on a real large graph with many compute and netty threads, and verified that the results
are the same (for both LongDoubleMessageStore and LongByteArrayMessageStore).


Thanks,

Maja Kabiljo

