giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Avery Ching" <avery.ch...@gmail.com>
Subject Re: Review Request: GIRAPH-435: Serialize server messages for memory and less GC
Date Tue, 27 Nov 2012 07:52:32 GMT


> On Nov. 21, 2012, 11:34 p.m., Maja Kabiljo wrote:
> > Thanks for fixing it!
> > 
> > This is for some future patch, but just to put it out there. Having messages in
serialized format can help us better tune the moment when we want to go out-of-core - now
we actually know how much bytes are the messages in memory taking. So we don't need to use
the number of messages as the limit anymore. Maybe Alessandro can tell more, but I guess the
similar thing can be applied to partitions, when we use the serialized format.

I agree to change the buffers and out-of-core triggers to be bytes rather than message counts.
 Thanks for the thorough review.  This uncovered a bunch of subtle, ugly bugs.


> On Nov. 21, 2012, 11:34 p.m., Maja Kabiljo wrote:
> > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/ByteArrayMessagesPerVertexStore.java,
lines 240-251
> > <https://reviews.apache.org/r/8140/diff/3/?file=222723#file222723line240>
> >
> >     Can we use getExtendedDataOutput here also?

No, the iterator is different here unfortunately.


> On Nov. 21, 2012, 11:34 p.m., Maja Kabiljo wrote:
> > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStore.java,
line 216
> > <https://reviews.apache.org/r/8140/diff/3/?file=222725#file222725line216>
> >
> >     Here we are keeping the message store locked while we do disk i/o. Can we do
better?

Yeah, I changed this to mimic the same behavior as before by storing it into a temporary message
store.


> On Nov. 21, 2012, 11:34 p.m., Maja Kabiljo wrote:
> > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/CollectionUtils.java,
line 67
> > <https://reviews.apache.org/r/8140/diff/3/?file=222744#file222744line67>
> >
> >     Probably not important, but this won't work correctly if we have duplicates
in the iterable. Maybe just add a comment.

Fixed this.  It actually uncovered a huge bug, thanks for noticing!


- Avery


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8140/#review13679
-----------------------------------------------------------


On Nov. 27, 2012, 7:52 a.m., Avery Ching wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8140/
> -----------------------------------------------------------
> 
> (Updated Nov. 27, 2012, 7:52 a.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> This patch serializes messages on the server into byte[] and also reduces the amount
of objects being created by copying a ByteArrayVertexIdMessages object to a MessageStore and
using a representative message when doing computation of the vertices.
> 
> - Added new helper classes  ByteArrayIterable, ByteArrayIterator, RepresentativeByteArrayIterable,
and RepresentativeByteArrayIterator to help create objects only when necessary
> - Removed all collection methods from SimpleMessageStore and its children
> - ByteArrayMessagesPerVertexStore stores messages on the server in serialized byte[]
through ExtendedDataOutput (replaces CollectionOfMessagesPerVertexStore, which used objects)
> - Iterators from ByteArrayVertexIdMessages are specialized to reduce the number of objects
created (only when necessary)
> - Allow the choice of message serialization with encoded message size (better for complex
message objects) or just the message serialized (better for simple messages)
> 
> Results: 10 workers, 2.5 B edges, 10 m vertices, page rank
> 
> When not using the combiner, compute time on superstep 0 dropped from 12.05 -> 10.11,
other supersteps improved accordingly as well.  Free memory after superstep 2 improved from
26727.61 M -> 53102.54 M.  GC time went down from 1,213.502 -> 518.597 (more than half).
 Overall application time dropped from 353.085 -> 244.434 seconds.
> 
> When using the combiner, there are still some modest gains due to the reduction in objects
created (35 -> 29 seconds for superstep 0).
> 
> Trunk (no combiner):
> 
> INFO    2012-11-16 17:57:35,036 [compute-23] org.apache.giraph.graph.ComputeCallable
 - call: Computation took 12.054391 secs for 1 partitions on superstep 0.  Flushing started
> INFO    2012-11-16 17:57:35,081 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep 0 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 61302.61231994629M
> INFO    2012-11-16 17:59:04,322 [compute-13] org.apache.giraph.graph.ComputeCallable
 - call: Computation took 12.514889 secs for 1 partitions on superstep 1.  Flushing started
> INFO    2012-11-16 17:59:04,369 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep 1 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 34318.67655181885M
> INFO    2012-11-16 18:00:29,559 [compute-10] org.apache.giraph.graph.ComputeCallable
 - call: Computation took 11.869164 secs for 1 partitions on superstep 2.  Flushing started
> INFO    2012-11-16 18:00:29,602 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep 2 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 26727.610847473145M
> INFO    2012-11-16 18:01:45,788 [compute-5] org.apache.giraph.graph.ComputeCallable 
- call: Computation took 1.5657606 secs for 1 partitions on superstep 3.  Flushing started
> INFO    2012-11-16 18:01:45,789 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep 3 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 38338.96612548828M
> 
> Giraph Timers   Total (milliseconds)    353,085         0       353,085
> Superstep 3 (milliseconds)              4,694           0       4,694
> Setup (milliseconds)                    2,354           0       2,354
> Vertex input superstep (milliseconds)   71,891          0       71,891
> Shutdown (milliseconds)                 15,104          0       15,104
> Superstep 0 (milliseconds)              85,516          0       85,516
> Superstep 2 (milliseconds)              87,720          0       87,720
> Superstep 1 (milliseconds)              85,803          0       85,803
> Max heap size after GC in bytes         405,574,586,784         0       405,574,586,784
> Total number of GC  1,823               0                       1,823
> Total time of GC in milliseconds        1,213,502               0       1,213,502
> Heap size after last GC in bytes        398,867,750,728         0       398,867,750,728
> 
> GIRAPH-435 (no combiner)
> 
> INFO    2012-11-19 01:31:02,712 [compute-0] org.apache.giraph.graph.ComputeCallable 
- call: Computation took 10.109899 secs for 1 partitions on superstep 0.  Flushing started
> INFO    2012-11-19 01:31:02,754 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep 0 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 63432.56399536133M
> INFO    2012-11-19 01:31:55,448 [compute-7] org.apache.giraph.graph.ComputeCallable 
- call: Computation took 9.036797 secs for 1 partitions on superstep 1.  Flushing started
> INFO    2012-11-19 01:31:55,491 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep 1 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 59648.34732055664M
> INFO    2012-11-19 01:32:52,595 [compute-9] org.apache.giraph.graph.ComputeCallable 
- call: Computation took 8.786798 secs for 1 partitions on superstep 2.  Flushing started
> INFO    2012-11-19 01:32:52,782 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep 2 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 53102.54949951172M
> INFO    2012-11-19 01:33:40,104 [compute-15] org.apache.giraph.graph.ComputeCallable
 - call: Computation took 1.481184 secs for 1 partitions on superstep 3.  Flushing started
> INFO    2012-11-19 01:33:40,105 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep 3 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 65556.08256530762M
> Total (milliseconds)            244,434                                          0  
               244,434
> Superstep 3 (milliseconds)      4,865                                            0  
               4,865
> oSetup (milliseconds)           2,210                                            0  
               2,210
> Vertex input superstep (milliseconds)                                            73,466
            0   73,466
> Shutdown (milliseconds)                                                          60 
               0   60
> Superstep 0 (milliseconds)                                                       51,097
            0   51,097
> Superstep 2 (milliseconds)                                                       55,029
            0   55,029
> Superstep 1 (milliseconds)                                                       57,704
            0   57,704
> Max heap size after GC in bytes                                                  241,391,753,216
   0   241,391,753,216
> Total number of GC  1,614                                                        0  
               1,614
> Total time of GC in milliseconds                                                 518,597
           0   518,597
> Heap size after last GC in bytes                                                 211,137,611,752
   0   211,137,611,752
> 
> Trunk (with combiner):
> 
> Total (milliseconds)    190,247  0       190,247
> Superstep 3 (milliseconds)       4,441   0      4,441
> Setup (milliseconds)             2,761   0      2,761
> Vertex input superstep (milliseconds)    73,147         0       73,147
> Shutdown (milliseconds)                  359            0       359
> Superstep 0 (milliseconds)               35,376         0       35,376
> Superstep 2 (milliseconds)               35,896         0       35,896
> Superstep 1 (milliseconds)               38,262         0       38,262
> Max heap size after GC in bytes          182,641,373,552        0       182,641,373,552
> Total number of GC  1,722                0                      1,722
> Total time of GC in milliseconds         418,019                0       418,019
> Heap size after last GC in bytes         167,224,437,896        0       167,224,437,896
> 
> GIRAPH-435 (with combiner):
> 
> Total (milliseconds)    175,022         0       175,022
> Superstep 3 (milliseconds)              4,716   0       4,716
> Setup (milliseconds)                    3,070   0       3,070
> Vertex input superstep (milliseconds)   72,903  0       72,903
> Shutdown (milliseconds)                 208     0       208
> Superstep 0 (milliseconds)              29,271  0       29,271
> Superstep 2 (milliseconds)              31,667  0       31,667
> Superstep 1 (milliseconds)              33,183  0       33,183
> Max heap size after GC in bytes         176,287,403,976         0       176,287,403,976
> Total number of GC  1,493               0                       1,493
> Total time of GC in milliseconds        354,427                 0       354,427
> Heap size after last GC in bytes        154,542,562,320         0       154,542,562,320
> 
> 
> This addresses bug GIRAPH-435.
>     https://issues.apache.org/jira/browse/GIRAPH-435
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphConfiguration.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/SendMessageCache.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/BasicMessageStore.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/ByteArrayMessagesPerVertexStore.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/CollectionOfMessagesPerVertexStore.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStore.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStoreByPartition.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/MessageStore.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/MessageStoreByPartition.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/OneMessagePerVertexStore.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/SequentialFileMessageStore.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/SimpleMessageStore.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyClient.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClientRequestProcessor.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/requests/SendPartitionCurrentMessagesRequest.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/requests/SendWorkerMessagesRequest.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/SimpleMsgVertex.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/ComputeCallable.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/Vertex.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ByteArrayIterable.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ByteArrayIterator.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdMessageCollection.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdMessages.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/CollectionUtils.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/EmptyIterable.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ExtendedByteArrayDataOutput.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ExtendedDataOutput.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/RepresentativeByteArrayIterable.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/RepresentativeByteArrayIterator.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/UnsafeByteArrayOutputStream.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestFailureTest.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestTest.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/TestMessageStores.java
1411975 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/utils/MockUtils.java
1411975 
> 
> Diff: https://reviews.apache.org/r/8140/diff/
> 
> 
> Testing
> -------
> 
> Passed unittests, page rank benchmark on a real cluster, internal applications showed
nice improvement as well.
> 
> 
> Thanks,
> 
> Avery Ching
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message