giraph-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-417) Serialize the graph/message cache into byte[] for improving memory usage and compute speed
Date Thu, 15 Nov 2012 20:21:12 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498306#comment-13498306
] 

Hudson commented on GIRAPH-417:
-------------------------------

Integrated in Giraph-trunk-Commit #283 (See [https://builds.apache.org/job/Giraph-trunk-Commit/283/])
    GIRAPH-417: Serialize the graph/message cache into byte[] for
improving memory usage and compute speed. (aching) (Revision 1409973)

     Result = FAILURE
aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1409973
Files : 
* /giraph/trunk/CHANGELOG
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphConfiguration.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/benchmark/RepresentativeVertexPageRankBenchmark.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/SendMessageCache.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/SendMutationsCache.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/SendPartitionCache.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessageCollection.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/aggregators/CountingOutputStream.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStoreByPartition.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/MessageStoreByPartition.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/SimpleMessageStore.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClientRequestProcessor.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/requests/SendVertexRequest.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/requests/SendWorkerMessagesRequest.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/ComputeCallable.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/EdgeListVertex.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/RepresentativeVertex.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/partition/ByteArrayPartition.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/partition/Partition.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/partition/SimplePartition.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdMessageCollection.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/DynamicChannelBufferInputStream.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/DynamicChannelBufferOutputStream.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ExtendedByteArrayDataInput.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ExtendedByteArrayDataOutput.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ExtendedDataInput.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ExtendedDataOutput.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/UnsafeByteArrayInputStream.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/UnsafeByteArrayOutputStream.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/WritableUtils.java
* /giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestFailureTest.java
* /giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestTest.java
* /giraph/trunk/giraph/src/test/java/org/apache/giraph/graph/TestEdgeListVertex.java
* /giraph/trunk/giraph/src/test/java/org/apache/giraph/graph/TestMutableVertex.java
* /giraph/trunk/giraph/src/test/java/org/apache/giraph/graph/partition/TestGiraphTransferRegulator.java
* /giraph/trunk/giraph/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java

                
> Serialize the graph/message cache into byte[] for improving memory usage and compute
speed
> ------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-417
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-417
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Avery Ching
>            Assignee: Avery Ching
>         Attachments: GIRAPH-417.2.patch, GIRAPH-417.patch
>
>
> Our entire graph is currently stored as Java objects in memory.  I added an option to
keep only a representative vertex that serializes/deserializes on the fly and should be used
with the new ByteArrayPartition.  In conjunction with a serialized client-side message cache,
memory usage when loading shrinks to almost 1/10 of trunk, and the input splits load almost
3x faster (see input superstep times below).  I added a serializer based on Sun's unsafe methods
that enables these memory savings with a very small performance hit (maybe 1-5% slower).
When serializing the messages with our faster serializer, compute time also improves
significantly against trunk (16.7 -> 12.31 secs for 2.5B edges, 2.97 -> 1.61 secs for 250M
edges).  There are still further improvements to be made on the server side, where we still
store our messages in memory.  I (or someone else) can do that in a later patch.
This also significantly reduces GC time, as there are fewer objects to GC.
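
To illustrate the representative-vertex idea described above, here is a minimal sketch in plain Java. It is not the Giraph code: ByteArrayPartitionSketch, SketchVertex, and their fields are hypothetical names, and standard java.io streams stand in for the Writable and unsafe serializers the patch actually uses. The partition keeps only byte[] per vertex and refills one shared object on every read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; names and layout do not match Giraph's classes. */
class ByteArrayPartitionSketch {
  /** A reusable vertex: id, value, and the target ids of its edges. */
  static final class SketchVertex {
    long id;
    double value;
    long[] targets = new long[0];

    void write(DataOutputStream out) throws IOException {
      out.writeLong(id);
      out.writeDouble(value);
      out.writeInt(targets.length);
      for (long target : targets) {
        out.writeLong(target);
      }
    }

    void readFields(DataInputStream in) throws IOException {
      id = in.readLong();
      value = in.readDouble();
      // Replace the edge array outright so edges from a previous
      // use of this object never leak into the next vertex.
      targets = new long[in.readInt()];
      for (int i = 0; i < targets.length; i++) {
        targets[i] = in.readLong();
      }
    }
  }

  /** Only serialized bytes are retained, one array per vertex id. */
  private final Map<Long, byte[]> serialized = new HashMap<>();
  /** The single object handed out by getVertex(). */
  private final SketchVertex representative = new SketchVertex();

  /** Serializes the vertex and stores (or overwrites) its bytes. */
  void saveVertex(SketchVertex vertex) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      vertex.write(out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    serialized.put(vertex.id, bytes.toByteArray());
  }

  /** Refills the shared representative vertex; returns null if the id is unknown. */
  SketchVertex getVertex(long id) {
    byte[] data = serialized.get(id);
    if (data == null) {
      return null;
    }
    try {
      representative.readFields(new DataInputStream(new ByteArrayInputStream(data)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return representative;
  }
}
```

Because every getVertex() call returns the same object, a caller in this sketch has to call saveVertex() before asking for the next vertex, otherwise its changes are lost.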
> - Improves byte[] serialization significantly
> -- Added ExtendedDataInput/ExtendedDataOutput interfaces to allow for some additional
methods needed for byte[] serialization/deserialization
> -- Added ExtendedByteArrayDataInput/ExtendedByteArrayDataOutput to serialize/deserialize
Writables to a byte[]
> -- Added DynamicChannelBufferOutputStream/DynamicChannelBufferInputStream to serialize/deserialize
Writables to a DynamicChannelBuffer
> - Gives you the choice of partition implementation (SimplePartition (default) or ByteArrayPartition
(serialized vertices))
> -- Added a new method to Partition called saveVertex(), which does the serialization
back into the ByteArrayPartition or does nothing when using SimplePartition
> - Gives you the choice of unsafe serialization (using Sun's unsafe class - default) or
regular serialization
> - Serializes the messages on the client cache into byte[] (saves memory and also serializes
faster)
> -- Created new ByteArrayVertexIdMessageCollection to support the serialized messages
> -- SendVertexRequest now sends Partition objects rather than collections
> - Adds 2 more options in PageRankBenchmark to try out RepresentativeVertex or RepresentativeVertex
with unsafe serialization
> - Fixed a bug in LongDoubleFloatDoubleVertex's readFields when edges aren't cleared before
deserializing
> - Added new unittests
> -- Replaced TestEdgeListVertex with TestMutableVertex to test all our generic MutableVertex
implementations
> --- Added more tests of the different serialization options
> -- TestPartitionStores has more testing of unsafe serialization/deserialization
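
The serialized client-side message cache from the list above can be pictured with the following sketch. It is an illustration only, assuming long vertex ids and double messages; SerializedMessageCacheSketch and PairConsumer are hypothetical names and not the ByteArrayVertexIdMessageCollection API. Pairs are appended to one growing byte[] instead of being held as individual objects, and are decoded only when iterated.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch of a cache that keeps (vertex id, message) pairs as bytes. */
class SerializedMessageCacheSketch {
  /** Callback invoked once per decoded pair. */
  interface PairConsumer {
    void accept(long vertexId, double message);
  }

  private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
  private final DataOutputStream out = new DataOutputStream(bytes);

  /** Appends one pair: 8 bytes of id followed by 8 bytes of message. */
  void add(long vertexId, double message) {
    try {
      out.writeLong(vertexId);
      out.writeDouble(message);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Bytes currently held; a sender could flush once this passes a threshold. */
  int sizeInBytes() {
    return bytes.size();
  }

  /** Copy of the serialized pairs, ready to be put on the wire. */
  byte[] toByteArray() {
    return bytes.toByteArray();
  }

  /** Decodes every pair in the buffer and returns how many were read. */
  static int forEach(byte[] data, PairConsumer consumer) {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    int pairs = 0;
    try {
      while (in.available() > 0) {
        consumer.accept(in.readLong(), in.readDouble());
        pairs++;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return pairs;
  }
}
```

In this layout each pair costs a fixed 16 bytes with no per-message object header, which is the kind of saving the description attributes to serializing the cache.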
> Testing:
> All unittests pass
> Distributed unittests pass - (except two that also fail in trunk)
> Lots of PageRankBenchmark runs on a cluster
> Benchmark results:
> 25 edges / vertex, 10M vertices, 10 workers
> Trunk
> INFO    2012-11-08 14:43:55,855 [load-0] org.apache.giraph.graph.InputSplitsCallable
 - call: Loaded 1 input splits in 22.475897 secs, (v=1000000, e=25000000) 44492.105 vertices/sec,
1112302.6 edges/sec
> INFO    2012-11-08 14:44:00,411 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep -1 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 76580.54187774658M
> INFO    2012-11-08 14:44:05,254 [compute-7] org.apache.giraph.graph.ComputeCallable 
- call: Computation took 2.9732208 secs for 1 partitions on superstep 0.  Flushing started
> INFO    2012-11-08 14:44:11,180 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Superstep 0, messages = 25000000 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem = 74781.9575881958M
> Total (milliseconds)    62,413  0       62,413
> Superstep 3 (milliseconds)      2,417   0       2,417
> Setup (milliseconds)            2,731   0       2,731
> Shutdown (milliseconds)         50      0       50
> Superstep 0 (milliseconds)      10,654  0       10,654
> Input superstep (milliseconds)  27,484  0       27,484
> Superstep 2 (milliseconds)      9,475   0       9,475
> Superstep 1 (milliseconds)      9,599   0       9,599
> Total time of GC in milliseconds        225,052         0       225,052
> 25 edges / vertex, 10M vertices, 10 workers
> SimplePartition + EdgeListVertex (after rebase)
> INFO    2012-11-08 14:33:15,907 [load-0] org.apache.giraph.graph.InputSplitsCallable
 - call: Loaded 1 input splits in 25.431986 secs, (v=1000000, e=25000000) 39320.562 vertices/sec,
983014.06 edges/sec
> INFO    2012-11-08 14:33:17,501 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep -1 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 76290.28507995605M
> INFO    2012-11-08 14:33:20,175 [compute-2] org.apache.giraph.graph.ComputeCallable 
- call: Computation took 2.0086238 secs for 1 partitions on superstep 0.  Flushing started
> INFO    2012-11-08 14:33:26,667 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Superstep 0, messages = 25000000 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem = 73716.20901489258M
> Trunk (after rebase)
> Total (milliseconds)    68,113  0       68,113
> Superstep 3 (milliseconds)      2,057   0       2,057
> Setup (milliseconds)            9,765   0       9,765
> Shutdown (milliseconds)         59      0       59
> Superstep 0 (milliseconds)      9,180   0       9,180
> Input superstep (milliseconds)  27,525  0       27,525
> Superstep 2 (milliseconds)      9,600   0       9,600
> Superstep 1 (milliseconds)      9,924   0       9,924
> Total time of GC in milliseconds        216,345         0       216,345
> 25 edges / vertex, 10M vertices, 10 workers
> ByteArrayPartition + UnsafeRepresentativeVertex + reuse vertexdata buffer + unsafe serialization
(after rebase)
> INFO    2012-11-08 14:33:09,822 [load-0] org.apache.giraph.graph.InputSplitsCallable
 - call: Loaded 1 input splits in 9.3217535 secs, (v=1000000, e=25000000) 107275.95 vertices/sec,
2681898.8 edges/sec
> INFO    2012-11-08 14:33:10,900 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep -1 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 79974.63636779785M
> INFO    2012-11-08 14:33:13,213 [compute-7] org.apache.giraph.graph.ComputeCallable 
- call: Computation took 1.6110481 secs for 1 partitions on superstep 0.  Flushing started
> INFO    2012-11-08 14:33:13,972 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep 0 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 78228.54064941406M
> Total (milliseconds)                    47,061          0       47,061
> Superstep 3 (milliseconds)              2,175           0       2,175
> Setup (milliseconds)                    3,018           0       3,018
> Shutdown (milliseconds)                 1,050           0       1,050
> Superstep 0 (milliseconds)              8,780           0       8,780
> Input superstep (milliseconds)          10,952          0       10,952
> Superstep 2 (milliseconds)              10,450          0       10,450
> Superstep 1 (milliseconds)              10,633          0       10,633
> 250 edges / vertex, 10M vertices, 10 workers
> Trunk
> INFO    2012-11-08 14:46:25,304 [load-0] org.apache.giraph.graph.InputSplitsCallable
 - call: Loaded 1 input splits in 167.02779 secs, (v=1000000, e=250000000) 5987.028 vertices/sec,
1496757.0 edges/sec
> INFO    2012-11-08 14:46:35,558 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep -1 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 38447.11888885498M
> INFO    2012-11-08 14:46:52,963 [compute-14] org.apache.giraph.graph.ComputeCallable
 - call: Computation took 16.770031 secs for 1 partitions on superstep 0.  Flushing started
> INFO    2012-11-08 14:46:53,074 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep 0 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 24629.869369506836M
> Total (milliseconds)            568,094         0       568,094
> Superstep 3 (milliseconds)      2,344           0       2,344
> Setup (milliseconds)            2,748           0       2,748
> Shutdown (milliseconds)         47              0       47
> Superstep 0 (milliseconds)      67,853          0       67,853
> Input superstep (milliseconds)  177,722         0       177,722
> Superstep 2 (milliseconds)      247,518         0       247,518
> Superstep 1 (milliseconds)      69,856          0       69,856
> Total time of GC in milliseconds        2,741,892       0       2,741,892
> 250 edges / vertex, 10M vertices, 10 workers
> SimplePartition + EdgeListVertex (after rebase)
> INFO    2012-11-08 14:19:57,774 [load-0] org.apache.giraph.graph.InputSplitsCallable
 - call: Loaded 1 input splits in 172.17258 secs, (v=1000000, e=250000000) 5808.126 vertices/sec,
1452031.5 edges/sec
> INFO    2012-11-08 14:20:04,864 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep -1 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 37025.9013671875M
> INFO    2012-11-08 14:20:17,453 [compute-6] org.apache.giraph.graph.ComputeCallable 
- call: Computation took 11.959192 secs for 1 partitions on superstep 0.  Flushing started
> INFO    2012-11-08 14:20:17,606 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep 0 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 21953.103630065918M
> Total (milliseconds)            470,845         0       470,845
> Superstep 3 (milliseconds)      2,595           0       2,595
> Setup (milliseconds)            1,774           0       1,774
> Shutdown (milliseconds)         54              0       54
> Superstep 0 (milliseconds)      59,609          0       59,609
> Input superstep (milliseconds)  179,665         0       179,665
> Superstep 2 (milliseconds)      165,848         0       165,848
> Superstep 1 (milliseconds)      61,296          0       61,296
> Total time of GC in milliseconds        2,480,260       0       2,480,260
> 250 edges / vertex, 10M vertices, 10 workers
> ByteArrayPartition + UnsafeRepresentativeVertex + reuse vertexdata buffer + unsafe serialization
(after rebase)
> INFO    2012-11-08 13:26:50,334 [load-0] org.apache.giraph.graph.InputSplitsCallable
 - call: Loaded 1 input splits in 69.22095 secs, (v=1000000, e=250000000) 14446.494 vertices/sec,
3611623.5 edges/sec
> INFO    2012-11-08 13:26:52,511 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep -1 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 75393.74648284912M
> INFO    2012-11-08 13:27:06,441 [compute-5] org.apache.giraph.graph.ComputeCallable 
- call: Computation took 12.318953 secs for 1 partitions on superstep 0.  Flushing started
> INFO    2012-11-08 13:27:06,483 [main] org.apache.giraph.graph.BspServiceWorker  - finishSuperstep:
Waiting on all requests, superstep 0 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem
= 62303.2106552124M
> Total (milliseconds)            301,720         0       301,720
> Superstep 3 (milliseconds)      4,759           0       4,759
> Setup (milliseconds)            2,887           0       2,887
> Shutdown (milliseconds)         50              0       50
> Superstep 0 (milliseconds)      72,625          0       72,625
> Input superstep (milliseconds)  75,797          0       75,797
> Superstep 2 (milliseconds)      72,245          0       72,245
> Superstep 1 (milliseconds)      73,353          0       73,353
> Total time of GC in milliseconds        716,930         0       716,930
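
The comparisons between runs can be reproduced from the quoted numbers with simple arithmetic. The sketch below is only a reading aid: BenchmarkArithmetic is a hypothetical helper, and every constant is copied from the 250 edges / vertex logs and counters above (trunk versus ByteArrayPartition). Heap in use is taken as totalMem minus freeMem.

```java
/** Hypothetical helper; every constant is copied from the benchmark output above. */
class BenchmarkArithmetic {
  /** totalMem reported in the worker log lines, in MB. */
  static final double TOTAL_MEM_MB = 81728.6875;

  /** Heap in use = totalMem - freeMem. */
  static double usedMb(double freeMb) {
    return TOTAL_MEM_MB - freeMb;
  }

  /** How many times smaller or faster the candidate is than the baseline. */
  static double ratio(double baseline, double candidate) {
    return baseline / candidate;
  }

  public static void main(String[] args) {
    // freeMem logged after the input superstep (superstep -1).
    double trunkUsed = usedMb(38447.11888885498);
    double byteArrayUsed = usedMb(75393.74648284912);
    System.out.printf("heap after load: trunk %.0fM, byte[] %.0fM, ratio %.2f%n",
        trunkUsed, byteArrayUsed, ratio(trunkUsed, byteArrayUsed));
    // "Input superstep (milliseconds)" counters.
    System.out.printf("input superstep speedup: %.2f%n", ratio(177722, 75797));
    // "Total time of GC in milliseconds" counters.
    System.out.printf("GC time reduction: %.2f%n", ratio(2741892, 716930));
  }
}
```

On these figures the heap in use after loading is roughly 6.8x smaller, the input superstep roughly 2.3x faster, and total GC time roughly 3.8x lower. freeMem also counts garbage that has not been collected yet, so the memory ratio is a rough indication from one worker's log rather than an exact measurement.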

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
