giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eli Reisman" <initialcont...@gmail.com>
Subject Re: Review Request: GIRAPH-435: Serialize server messages for memory and less GC
Date Tue, 20 Nov 2012 17:30:46 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8140/#review13629
-----------------------------------------------------------


Fantastic. +1

- Eli Reisman


On Nov. 20, 2012, 10:18 a.m., Avery Ching wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8140/
> -----------------------------------------------------------
> 
> (Updated Nov. 20, 2012, 10:18 a.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> This patch serializes messages on the server into byte[] and also reduces the amount
of objects being created by copying a ByteArrayVertexIdMessages object to a MessageStore and
using a representative message when doing computation of the vertices.
> 
> - Added new helper classes  ByteArrayIterable, ByteArrayIterator, RepresentativeByteArrayIterable,
and RepresentativeByteArrayIterator to help create objects only when necessary
> - Removed all collection methods from SimpleMessageStore and its children
> - ByteArrayMessagesPerVertexStore stores messages on the server in serialized byte[]
through ExtendedDataOutput (replaces CollectionOfMessagesPerVertexStore, which used objects)
> - Iterators from ByteArrayVertexIdMessages are specialized to reduce the number of objects
created (only when necessary)
> - Allow the choice of message serialization with encoded message size (better for complex
message objects) or just the message serialized (better for simple messages)
> 
> Results: 10 workers, 2.5 B edges, 10 m vertices, page rank
> 
> When not using the combiner, compute time on superstep 0 dropped from 12.05 -> 10.11,
other supersteps improved accordingly as well.  Free memory after superstep 2 improved from
26727.61 M -> 53102.54 M.  GC time went down from 1,213.502 -> 518.597 (more than half).
 Overall application time dropped from 353.085 -> 244.434 seconds.
> 
> When using the combiner, there are still some modest gains due to the reduction in objects
created (35 -> 29 seconds for superstep 0).
> 
> Trunk (no combiner):
> 
> INFO    2012-11-16 17:57:35,036 [compute-23] org.apache.giraph.graph.ComputeCallable
 - call: Computation took 12.054391 secs for 1 partitions on superstep 0.  Flushing started
> INFO    2012-11-16 17:57:35,081 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep 0 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 61302.61231994629M
> INFO    2012-11-16 17:59:04,322 [compute-13] org.apache.giraph.graph.ComputeCallable
 - call: Computation took 12.514889 secs for 1 partitions on superstep 1.  Flushing started
> INFO    2012-11-16 17:59:04,369 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep 1 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 34318.67655181885M
> INFO    2012-11-16 18:00:29,559 [compute-10] org.apache.giraph.graph.ComputeCallable
 - call: Computation took 11.869164 secs for 1 partitions on superstep 2.  Flushing started
> INFO    2012-11-16 18:00:29,602 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep 2 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 26727.610847473145M
> INFO    2012-11-16 18:01:45,788 [compute-5] org.apache.giraph.graph.ComputeCallable 
- call: Computation took 1.5657606 secs for 1 partitions on superstep 3.  Flushing started
> INFO    2012-11-16 18:01:45,789 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep 3 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 38338.96612548828M
> 
> Giraph Timers   Total (milliseconds)    353,085         0       353,085
> Superstep 3 (milliseconds)              4,694           0       4,694
> Setup (milliseconds)                    2,354           0       2,354
> Vertex input superstep (milliseconds)   71,891          0       71,891
> Shutdown (milliseconds)                 15,104          0       15,104
> Superstep 0 (milliseconds)              85,516          0       85,516
> Superstep 2 (milliseconds)              87,720          0       87,720
> Superstep 1 (milliseconds)              85,803          0       85,803
> Max heap size after GC in bytes         405,574,586,784         0       405,574,586,784
> Total number of GC  1,823               0                       1,823
> Total time of GC in milliseconds        1,213,502               0       1,213,502
> Heap size after last GC in bytes        398,867,750,728         0       398,867,750,728
> 
> GIRAPH-435 (no combiner)
> 
> INFO    2012-11-19 01:31:02,712 [compute-0] org.apache.giraph.graph.ComputeCallable 
- call: Computation took 10.109899 secs for 1 partitions on superstep 0.  Flushing started
> INFO    2012-11-19 01:31:02,754 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep 0 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 63432.56399536133M
> INFO    2012-11-19 01:31:55,448 [compute-7] org.apache.giraph.graph.ComputeCallable 
- call: Computation took 9.036797 secs for 1 partitions on superstep 1.  Flushing started
> INFO    2012-11-19 01:31:55,491 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep 1 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 59648.34732055664M
> INFO    2012-11-19 01:32:52,595 [compute-9] org.apache.giraph.graph.ComputeCallable 
- call: Computation took 8.786798 secs for 1 partitions on superstep 2.  Flushing started
> INFO    2012-11-19 01:32:52,782 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep 2 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 53102.54949951172M
> INFO    2012-11-19 01:33:40,104 [compute-15] org.apache.giraph.graph.ComputeCallable
 - call: Computation took 1.481184 secs for 1 partitions on superstep 3.  Flushing started
> INFO    2012-11-19 01:33:40,105 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep 3 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 65556.08256530762M
> Total (milliseconds)            244,434                                          0  
               244,434
> Superstep 3 (milliseconds)      4,865                                            0  
               4,865
> oSetup (milliseconds)           2,210                                            0  
               2,210
> Vertex input superstep (milliseconds)                                            73,466
            0   73,466
> Shutdown (milliseconds)                                                          60 
               0   60
> Superstep 0 (milliseconds)                                                       51,097
            0   51,097
> Superstep 2 (milliseconds)                                                       55,029
            0   55,029
> Superstep 1 (milliseconds)                                                       57,704
            0   57,704
> Max heap size after GC in bytes                                                  241,391,753,216
   0   241,391,753,216
> Total number of GC  1,614                                                        0  
               1,614
> Total time of GC in milliseconds                                                 518,597
           0   518,597
> Heap size after last GC in bytes                                                 211,137,611,752
   0   211,137,611,752
> 
> Trunk (with combiner):
> 
> Total (milliseconds)    190,247  0       190,247
> Superstep 3 (milliseconds)       4,441   0      4,441
> Setup (milliseconds)             2,761   0      2,761
> Vertex input superstep (milliseconds)    73,147         0       73,147
> Shutdown (milliseconds)                  359            0       359
> Superstep 0 (milliseconds)               35,376         0       35,376
> Superstep 2 (milliseconds)               35,896         0       35,896
> Superstep 1 (milliseconds)               38,262         0       38,262
> Max heap size after GC in bytes          182,641,373,552        0       182,641,373,552
> Total number of GC  1,722                0                      1,722
> Total time of GC in milliseconds         418,019                0       418,019
> Heap size after last GC in bytes         167,224,437,896        0       167,224,437,896
> 
> GIRAPH-435 (with combiner):
> 
> Total (milliseconds)    175,022         0       175,022
> Superstep 3 (milliseconds)              4,716   0       4,716
> Setup (milliseconds)                    3,070   0       3,070
> Vertex input superstep (milliseconds)   72,903  0       72,903
> Shutdown (milliseconds)                 208     0       208
> Superstep 0 (milliseconds)              29,271  0       29,271
> Superstep 2 (milliseconds)              31,667  0       31,667
> Superstep 1 (milliseconds)              33,183  0       33,183
> Max heap size after GC in bytes         176,287,403,976         0       176,287,403,976
> Total number of GC  1,493               0                       1,493
> Total time of GC in milliseconds        354,427                 0       354,427
> Heap size after last GC in bytes        154,542,562,320         0       154,542,562,320
> 
> 
> This addresses bug GIRAPH-435.
>     https://issues.apache.org/jira/browse/GIRAPH-435
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphConfiguration.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/SendMessageCache.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/BasicMessageStore.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/ByteArrayMessagesPerVertexStore.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/CollectionOfMessagesPerVertexStore.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStore.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStoreByPartition.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/MessageStore.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/MessageStoreByPartition.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/OneMessagePerVertexStore.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/SequentialFileMessageStore.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/SimpleMessageStore.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyClient.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClientRequestProcessor.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/requests/SendPartitionCurrentMessagesRequest.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/requests/SendWorkerMessagesRequest.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/SimpleMsgVertex.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/ComputeCallable.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/Vertex.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ByteArrayIterable.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ByteArrayIterator.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdMessageCollection.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdMessages.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/CollectionUtils.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/EmptyIterable.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ExtendedByteArrayDataOutput.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ExtendedDataOutput.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/RepresentativeByteArrayIterable.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/RepresentativeByteArrayIterator.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/UnsafeByteArrayOutputStream.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestFailureTest.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestTest.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/TestMessageStores.java
1411554 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/utils/MockUtils.java
1411554 
> 
> Diff: https://reviews.apache.org/r/8140/diff/
> 
> 
> Testing
> -------
> 
> Passed unittests, page rank benchmark on a real cluster, internal applications showed
nice improvement as well.
> 
> 
> Thanks,
> 
> Avery Ching
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message