giraph-dev mailing list archives

From "Claudio Martella" <claudio.marte...@gmail.com>
Subject Re: Review Request: GIRAPH-461: Convert static assignment of in-memory partitions with LRU cache
Date Sun, 03 Feb 2013 14:54:42 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9278/
-----------------------------------------------------------

(Updated Feb. 3, 2013, 2:54 p.m.)


Review request for giraph.


Description
-------

Currently, out-of-core partitions are statically assigned either to memory or to disk. Using
an LRU cache should help keep in memory only the partitions that are actively accessed,
provided that the job does not access the whole graph at every superstep (e.g. traversals)
and that the data is well partitioned (i.e. not randomly).
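A minimal sketch of the idea above, assuming partitions are identified by an int id and that the least recently used partition is handed to a spill callback once the in-memory limit is exceeded. The class name LruPartitionCache and the spillToDisk callback are hypothetical; this is not the code in the patch (see DiskBackedPartitionStore in the diff). It relies on java.util.LinkedHashMap in access order, whose removeEldestEntry hook is consulted after every insertion.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/**
 * Hypothetical LRU cache of in-memory partitions (illustration only, not
 * the GIRAPH-461 implementation). P is the partition type.
 */
class LruPartitionCache<P> {
  private final LinkedHashMap<Integer, P> inMemory;

  LruPartitionCache(int maxInMemory, BiConsumer<Integer, P> spillToDisk) {
    // accessOrder = true: get() and put() move an entry to the
    // most-recently-used end, so the eldest entry is the LRU partition.
    this.inMemory = new LinkedHashMap<Integer, P>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<Integer, P> eldest) {
        if (size() > maxInMemory) {
          // Hand the least recently used partition to the (hypothetical)
          // spill callback, then drop it from memory.
          spillToDisk.accept(eldest.getKey(), eldest.getValue());
          return true;
        }
        return false;
      }
    };
  }

  /** Returns the partition if resident (marking it most recently used), else null. */
  synchronized P get(int partitionId) {
    return inMemory.get(partitionId);
  }

  /** Adds or refreshes a partition; may spill the least recently used one. */
  synchronized void put(int partitionId, P partition) {
    inMemory.put(partitionId, partition);
  }

  /** containsKey does not change the access order. */
  synchronized boolean isInMemory(int partitionId) {
    return inMemory.containsKey(partitionId);
  }

  public static void main(String[] args) {
    LruPartitionCache<String> cache = new LruPartitionCache<>(2,
        (id, p) -> System.out.println("spill partition " + id));
    cache.put(0, "p0");
    cache.put(1, "p1");
    cache.get(0);       // partition 0 becomes the most recently used
    cache.put(2, "p2"); // exceeds the limit of 2, so partition 1 is spilled
    System.out.println(cache.isInMemory(0) + " " + cache.isInMemory(1));
  }
}
```

Running main prints "spill partition 1" followed by "true false": the get(0) call makes partition 0 the most recently used, so inserting partition 2 evicts partition 1. The spill callback runs inside put, so it must not modify the cache itself; reloading a spilled partition from disk is left out of the sketch.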


Diffs (updated)
-----

  giraph-core/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 30d4462 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java e2866fd 
  giraph-core/src/main/java/org/apache/giraph/graph/ComputeCallable.java 042fd47 
  giraph-core/src/main/java/org/apache/giraph/partition/DiskBackedPartitionStore.java 09e5d75
  giraph-core/src/main/java/org/apache/giraph/partition/PartitionStore.java 3e8dda9
  giraph-core/src/main/java/org/apache/giraph/partition/SimplePartitionStore.java 7bd0bb1
  giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java f542344 
  giraph-core/src/test/java/org/apache/giraph/comm/RequestTest.java 7187928 
  giraph-core/src/test/java/org/apache/giraph/partition/TestPartitionStores.java b02ed3a 

Diff: https://reviews.apache.org/r/9278/diff/


Testing
-------

Passes mvn verify.

hadoop jar giraph-0.2-SNAPSHOT-for-hadoop-1.0.2-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -w 60 -c 2 -e 100 -V 10000000 -v -s 10

trunk:
13/01/29 20:40:53 INFO mapred.JobClient:   Giraph Timers
13/01/29 20:40:53 INFO mapred.JobClient:     Total (milliseconds)=492403
13/01/29 20:40:53 INFO mapred.JobClient:     Superstep 3 (milliseconds)=40243
13/01/29 20:40:53 INFO mapred.JobClient:     Superstep 4 (milliseconds)=45430
13/01/29 20:40:53 INFO mapred.JobClient:     Superstep 10 (milliseconds)=713
13/01/29 20:40:53 INFO mapred.JobClient:     Setup (milliseconds)=20832
13/01/29 20:40:53 INFO mapred.JobClient:     Shutdown (milliseconds)=56
13/01/29 20:40:53 INFO mapred.JobClient:     Superstep 7 (milliseconds)=36753
13/01/29 20:40:53 INFO mapred.JobClient:     Superstep 9 (milliseconds)=36363
13/01/29 20:40:53 INFO mapred.JobClient:     Superstep 0 (milliseconds)=39558
13/01/29 20:40:53 INFO mapred.JobClient:     Superstep 8 (milliseconds)=44548
13/01/29 20:40:53 INFO mapred.JobClient:     Input superstep (milliseconds)=59184
13/01/29 20:40:53 INFO mapred.JobClient:     Superstep 6 (milliseconds)=40777
13/01/29 20:40:53 INFO mapred.JobClient:     Superstep 5 (milliseconds)=43962
13/01/29 20:40:53 INFO mapred.JobClient:     Superstep 2 (milliseconds)=37325
13/01/29 20:40:53 INFO mapred.JobClient:     Superstep 1 (milliseconds)=46655
13/01/29 20:40:53 INFO mapred.JobClient:   Giraph Stats
13/01/29 20:40:53 INFO mapred.JobClient:     Aggregate edges=1000000000
13/01/29 20:40:53 INFO mapred.JobClient:     Superstep=11
13/01/29 20:40:53 INFO mapred.JobClient:     Last checkpointed superstep=0
13/01/29 20:40:53 INFO mapred.JobClient:     Current workers=60
13/01/29 20:40:53 INFO mapred.JobClient:     Current master task partition=0
13/01/29 20:40:53 INFO mapred.JobClient:     Sent messages=0
13/01/29 20:40:53 INFO mapred.JobClient:     Aggregate finished vertices=10000000
13/01/29 20:40:53 INFO mapred.JobClient:     Aggregate vertices=10000000
13/01/29 20:40:53 INFO mapred.JobClient:   File Output Format Counters 
13/01/29 20:40:53 INFO mapred.JobClient:     Bytes Written=0
13/01/29 20:40:53 INFO mapred.JobClient:   FileSystemCounters
13/01/29 20:40:53 INFO mapred.JobClient:     HDFS_BYTES_READ=2684
13/01/29 20:40:53 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1388228
13/01/29 20:40:53 INFO mapred.JobClient:   File Input Format Counters 
13/01/29 20:40:53 INFO mapred.JobClient:     Bytes Read=0
13/01/29 20:40:53 INFO mapred.JobClient:   Map-Reduce Framework
13/01/29 20:40:53 INFO mapred.JobClient:     Map input records=61
13/01/29 20:40:53 INFO mapred.JobClient:     Physical memory (bytes) snapshot=71703965696
13/01/29 20:40:53 INFO mapred.JobClient:     Spilled Records=0
13/01/29 20:40:53 INFO mapred.JobClient:     CPU time spent (ms)=15141630
13/01/29 20:40:53 INFO mapred.JobClient:     Total committed heap usage (bytes)=58151337984
13/01/29 20:40:53 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=371313995776
13/01/29 20:40:53 INFO mapred.JobClient:     Map output records=0
13/01/29 20:40:53 INFO mapred.JobClient:     SPLIT_RAW_BYTES=2684

GIRAPH-439:
in memory:
13/01/29 19:35:53 INFO mapred.JobClient:   Giraph Timers
13/01/29 19:35:53 INFO mapred.JobClient:     Total (milliseconds)=427511
13/01/29 19:35:53 INFO mapred.JobClient:     Superstep 3 (milliseconds)=37341
13/01/29 19:35:53 INFO mapred.JobClient:     Superstep 4 (milliseconds)=35458
13/01/29 19:35:53 INFO mapred.JobClient:     Superstep 10 (milliseconds)=852
13/01/29 19:35:53 INFO mapred.JobClient:     Setup (milliseconds)=24825
13/01/29 19:35:53 INFO mapred.JobClient:     Shutdown (milliseconds)=50
13/01/29 19:35:53 INFO mapred.JobClient:     Superstep 7 (milliseconds)=37557
13/01/29 19:35:53 INFO mapred.JobClient:     Superstep 9 (milliseconds)=33961
13/01/29 19:35:53 INFO mapred.JobClient:     Superstep 0 (milliseconds)=33048
13/01/29 19:35:53 INFO mapred.JobClient:     Superstep 8 (milliseconds)=36345
13/01/29 19:35:53 INFO mapred.JobClient:     Input superstep (milliseconds)=44420
13/01/29 19:35:53 INFO mapred.JobClient:     Superstep 6 (milliseconds)=33635
13/01/29 19:35:53 INFO mapred.JobClient:     Superstep 5 (milliseconds)=41885
13/01/29 19:35:53 INFO mapred.JobClient:     Superstep 2 (milliseconds)=35046
13/01/29 19:35:53 INFO mapred.JobClient:     Superstep 1 (milliseconds)=33083
13/01/29 19:35:53 INFO mapred.JobClient:   Giraph Stats
13/01/29 19:35:53 INFO mapred.JobClient:     Aggregate edges=1000000000
13/01/29 19:35:53 INFO mapred.JobClient:     Superstep=11
13/01/29 19:35:53 INFO mapred.JobClient:     Last checkpointed superstep=0
13/01/29 19:35:53 INFO mapred.JobClient:     Current workers=60
13/01/29 19:35:53 INFO mapred.JobClient:     Current master task partition=0
13/01/29 19:35:53 INFO mapred.JobClient:     Sent messages=0
13/01/29 19:35:53 INFO mapred.JobClient:     Aggregate finished vertices=10000000
13/01/29 19:35:53 INFO mapred.JobClient:     Aggregate vertices=10000000
13/01/29 19:35:53 INFO mapred.JobClient:   File Output Format Counters 
13/01/29 19:35:53 INFO mapred.JobClient:     Bytes Written=0
13/01/29 19:35:53 INFO mapred.JobClient:   FileSystemCounters
13/01/29 19:35:53 INFO mapred.JobClient:     HDFS_BYTES_READ=2684
13/01/29 19:35:53 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1388228
13/01/29 19:35:53 INFO mapred.JobClient:   File Input Format Counters 
13/01/29 19:35:53 INFO mapred.JobClient:     Bytes Read=0
13/01/29 19:35:53 INFO mapred.JobClient:   Map-Reduce Framework
13/01/29 19:35:53 INFO mapred.JobClient:     Map input records=61
13/01/29 19:35:53 INFO mapred.JobClient:     Physical memory (bytes) snapshot=71627419648
13/01/29 19:35:53 INFO mapred.JobClient:     Spilled Records=0
13/01/29 19:35:53 INFO mapred.JobClient:     CPU time spent (ms)=15020990
13/01/29 19:35:53 INFO mapred.JobClient:     Total committed heap usage (bytes)=57611911168
13/01/29 19:35:53 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=371123154944
13/01/29 19:35:53 INFO mapred.JobClient:     Map output records=0
13/01/29 19:35:53 INFO mapred.JobClient:     SPLIT_RAW_BYTES=2684

out-of-core graph (2 partitions in memory out of 49):
13/01/29 19:54:57 INFO mapred.JobClient:   Giraph Timers
13/01/29 19:54:57 INFO mapred.JobClient:     Total (milliseconds)=508004
13/01/29 19:54:57 INFO mapred.JobClient:     Superstep 3 (milliseconds)=38085
13/01/29 19:54:57 INFO mapred.JobClient:     Superstep 4 (milliseconds)=40789
13/01/29 19:54:57 INFO mapred.JobClient:     Superstep 10 (milliseconds)=811
13/01/29 19:54:57 INFO mapred.JobClient:     Setup (milliseconds)=25612
13/01/29 19:54:57 INFO mapred.JobClient:     Shutdown (milliseconds)=699
13/01/29 19:54:57 INFO mapred.JobClient:     Superstep 7 (milliseconds)=44806
13/01/29 19:54:57 INFO mapred.JobClient:     Superstep 9 (milliseconds)=41873
13/01/29 19:54:57 INFO mapred.JobClient:     Superstep 0 (milliseconds)=46329
13/01/29 19:54:57 INFO mapred.JobClient:     Superstep 8 (milliseconds)=46272
13/01/29 19:54:57 INFO mapred.JobClient:     Input superstep (milliseconds)=52395
13/01/29 19:54:57 INFO mapred.JobClient:     Superstep 6 (milliseconds)=44337
13/01/29 19:54:57 INFO mapred.JobClient:     Superstep 5 (milliseconds)=39379
13/01/29 19:54:57 INFO mapred.JobClient:     Superstep 2 (milliseconds)=40452
13/01/29 19:54:57 INFO mapred.JobClient:     Superstep 1 (milliseconds)=46155
13/01/29 19:54:57 INFO mapred.JobClient:   Giraph Stats
13/01/29 19:54:57 INFO mapred.JobClient:     Aggregate edges=1000000000
13/01/29 19:54:57 INFO mapred.JobClient:     Superstep=11
13/01/29 19:54:57 INFO mapred.JobClient:     Last checkpointed superstep=0
13/01/29 19:54:57 INFO mapred.JobClient:     Current workers=60
13/01/29 19:54:57 INFO mapred.JobClient:     Current master task partition=0
13/01/29 19:54:57 INFO mapred.JobClient:     Sent messages=0
13/01/29 19:54:57 INFO mapred.JobClient:     Aggregate finished vertices=10000000
13/01/29 19:54:57 INFO mapred.JobClient:     Aggregate vertices=10000000
13/01/29 19:54:57 INFO mapred.JobClient:   File Output Format Counters 
13/01/29 19:54:57 INFO mapred.JobClient:     Bytes Written=0
13/01/29 19:54:57 INFO mapred.JobClient:   FileSystemCounters
13/01/29 19:54:57 INFO mapred.JobClient:     HDFS_BYTES_READ=2684
13/01/29 19:54:57 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1388228
13/01/29 19:54:57 INFO mapred.JobClient:   File Input Format Counters 
13/01/29 19:54:57 INFO mapred.JobClient:     Bytes Read=0
13/01/29 19:54:57 INFO mapred.JobClient:   Map-Reduce Framework
13/01/29 19:54:57 INFO mapred.JobClient:     Map input records=61
13/01/29 19:54:57 INFO mapred.JobClient:     Physical memory (bytes) snapshot=71368736768
13/01/29 19:54:57 INFO mapred.JobClient:     Spilled Records=0
13/01/29 19:54:57 INFO mapred.JobClient:     CPU time spent (ms)=15289390
13/01/29 19:54:57 INFO mapred.JobClient:     Total committed heap usage (bytes)=57278595072
13/01/29 19:54:57 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=370911342592
13/01/29 19:54:57 INFO mapred.JobClient:     Map output records=0
13/01/29 19:54:57 INFO mapred.JobClient:     SPLIT_RAW_BYTES=2684

in memory (2 compute threads per worker):
13/01/29 20:30:49 INFO mapred.JobClient:   Giraph Timers
13/01/29 20:30:49 INFO mapred.JobClient:     Total (milliseconds)=487379
13/01/29 20:30:49 INFO mapred.JobClient:     Superstep 3 (milliseconds)=46092
13/01/29 20:30:49 INFO mapred.JobClient:     Superstep 4 (milliseconds)=44840
13/01/29 20:30:49 INFO mapred.JobClient:     Superstep 10 (milliseconds)=745
13/01/29 20:30:49 INFO mapred.JobClient:     Setup (milliseconds)=23013
13/01/29 20:30:49 INFO mapred.JobClient:     Shutdown (milliseconds)=126
13/01/29 20:30:49 INFO mapred.JobClient:     Superstep 7 (milliseconds)=40620
13/01/29 20:30:49 INFO mapred.JobClient:     Superstep 9 (milliseconds)=39630
13/01/29 20:30:49 INFO mapred.JobClient:     Superstep 0 (milliseconds)=38221
13/01/29 20:30:49 INFO mapred.JobClient:     Superstep 8 (milliseconds)=40406
13/01/29 20:30:49 INFO mapred.JobClient:     Input superstep (milliseconds)=49762
13/01/29 20:30:49 INFO mapred.JobClient:     Superstep 6 (milliseconds)=45054
13/01/29 20:30:49 INFO mapred.JobClient:     Superstep 5 (milliseconds)=40220
13/01/29 20:30:49 INFO mapred.JobClient:     Superstep 2 (milliseconds)=40817
13/01/29 20:30:49 INFO mapred.JobClient:     Superstep 1 (milliseconds)=37830
13/01/29 20:30:49 INFO mapred.JobClient:   Giraph Stats
13/01/29 20:30:49 INFO mapred.JobClient:     Aggregate edges=1000000000
13/01/29 20:30:49 INFO mapred.JobClient:     Superstep=11
13/01/29 20:30:49 INFO mapred.JobClient:     Last checkpointed superstep=0
13/01/29 20:30:49 INFO mapred.JobClient:     Current workers=60
13/01/29 20:30:49 INFO mapred.JobClient:     Current master task partition=0
13/01/29 20:30:49 INFO mapred.JobClient:     Sent messages=0
13/01/29 20:30:49 INFO mapred.JobClient:     Aggregate finished vertices=10000000
13/01/29 20:30:49 INFO mapred.JobClient:     Aggregate vertices=10000000
13/01/29 20:30:49 INFO mapred.JobClient:   File Output Format Counters 
13/01/29 20:30:49 INFO mapred.JobClient:     Bytes Written=0
13/01/29 20:30:49 INFO mapred.JobClient:   FileSystemCounters
13/01/29 20:30:49 INFO mapred.JobClient:     HDFS_BYTES_READ=2684
13/01/29 20:30:49 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1388228
13/01/29 20:30:49 INFO mapred.JobClient:   File Input Format Counters 
13/01/29 20:30:49 INFO mapred.JobClient:     Bytes Read=0
13/01/29 20:30:49 INFO mapred.JobClient:   Map-Reduce Framework
13/01/29 20:30:49 INFO mapred.JobClient:     Map input records=61
13/01/29 20:30:49 INFO mapred.JobClient:     Physical memory (bytes) snapshot=71895678976
13/01/29 20:30:49 INFO mapred.JobClient:     Spilled Records=0
13/01/29 20:30:49 INFO mapred.JobClient:     CPU time spent (ms)=15134650
13/01/29 20:30:49 INFO mapred.JobClient:     Total committed heap usage (bytes)=57982255104
13/01/29 20:30:49 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=371448213504
13/01/29 20:30:49 INFO mapred.JobClient:     Map output records=0
13/01/29 20:30:49 INFO mapred.JobClient:     SPLIT_RAW_BYTES=2684

out-of-core graph (2 partitions in memory out of 49, 2 compute threads per worker):
13/01/29 20:11:28 INFO mapred.JobClient:   Giraph Timers
13/01/29 20:11:28 INFO mapred.JobClient:     Total (milliseconds)=506380
13/01/29 20:11:28 INFO mapred.JobClient:     Superstep 3 (milliseconds)=41677
13/01/29 20:11:28 INFO mapred.JobClient:     Superstep 4 (milliseconds)=41285
13/01/29 20:11:28 INFO mapred.JobClient:     Superstep 10 (milliseconds)=764
13/01/29 20:11:28 INFO mapred.JobClient:     Setup (milliseconds)=24574
13/01/29 20:11:28 INFO mapred.JobClient:     Shutdown (milliseconds)=82
13/01/29 20:11:28 INFO mapred.JobClient:     Superstep 7 (milliseconds)=43183
13/01/29 20:11:28 INFO mapred.JobClient:     Superstep 9 (milliseconds)=46654
13/01/29 20:11:28 INFO mapred.JobClient:     Superstep 0 (milliseconds)=50955
13/01/29 20:11:28 INFO mapred.JobClient:     Superstep 8 (milliseconds)=40413
13/01/29 20:11:28 INFO mapred.JobClient:     Input superstep (milliseconds)=43584
13/01/29 20:11:28 INFO mapred.JobClient:     Superstep 6 (milliseconds)=46638
13/01/29 20:11:28 INFO mapred.JobClient:     Superstep 5 (milliseconds)=46107
13/01/29 20:11:28 INFO mapred.JobClient:     Superstep 2 (milliseconds)=39321
13/01/29 20:11:28 INFO mapred.JobClient:     Superstep 1 (milliseconds)=41139
13/01/29 20:11:28 INFO mapred.JobClient:   Giraph Stats
13/01/29 20:11:28 INFO mapred.JobClient:     Aggregate edges=1000000000
13/01/29 20:11:28 INFO mapred.JobClient:     Superstep=11
13/01/29 20:11:28 INFO mapred.JobClient:     Last checkpointed superstep=0
13/01/29 20:11:28 INFO mapred.JobClient:     Current workers=60
13/01/29 20:11:28 INFO mapred.JobClient:     Current master task partition=0
13/01/29 20:11:28 INFO mapred.JobClient:     Sent messages=0
13/01/29 20:11:28 INFO mapred.JobClient:     Aggregate finished vertices=10000000
13/01/29 20:11:28 INFO mapred.JobClient:     Aggregate vertices=10000000
13/01/29 20:11:28 INFO mapred.JobClient:   File Output Format Counters 
13/01/29 20:11:28 INFO mapred.JobClient:     Bytes Written=0
13/01/29 20:11:28 INFO mapred.JobClient:   FileSystemCounters
13/01/29 20:11:28 INFO mapred.JobClient:     HDFS_BYTES_READ=2684
13/01/29 20:11:28 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1388228
13/01/29 20:11:28 INFO mapred.JobClient:   File Input Format Counters 
13/01/29 20:11:28 INFO mapred.JobClient:     Bytes Read=0
13/01/29 20:11:28 INFO mapred.JobClient:   Map-Reduce Framework
13/01/29 20:11:28 INFO mapred.JobClient:     Map input records=61
13/01/29 20:11:28 INFO mapred.JobClient:     Physical memory (bytes) snapshot=71620620288
13/01/29 20:11:28 INFO mapred.JobClient:     Spilled Records=0
13/01/29 20:11:28 INFO mapred.JobClient:     CPU time spent (ms)=15279810
13/01/29 20:11:28 INFO mapred.JobClient:     Total committed heap usage (bytes)=57294782464
13/01/29 20:11:28 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=370988941312
13/01/29 20:11:28 INFO mapred.JobClient:     Map output records=0
13/01/29 20:11:28 INFO mapred.JobClient:     SPLIT_RAW_BYTES=2684
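The Total timers above can be compared directly. A small sketch that divides one run's Total (milliseconds) by a baseline run's; the values are copied from the logs and the class name TotalTimeComparison is only illustrative. Against the matching GIRAPH-439 in-memory runs, the out-of-core graph (2 of 49 partitions in memory) comes out about 1.19x slower without the 2-compute-thread setting (508004 / 427511) and about 1.04x slower with it (506380 / 487379).

```java
/** Illustrative comparison of the Giraph "Total (milliseconds)" timers above. */
class TotalTimeComparison {
  /** Ratio of a run's Total time to a baseline Total time. */
  static double ratio(long runMs, long baselineMs) {
    return (double) runMs / baselineMs;
  }

  public static void main(String[] args) {
    // Total (milliseconds) values copied from the Giraph Timers above
    long trunk = 492403L;
    long inMemory = 427511L;
    long outOfCore = 508004L;
    long inMemory2Threads = 487379L;
    long outOfCore2Threads = 506380L;

    System.out.printf("trunk vs GIRAPH-439 in memory:        %.3f%n",
        ratio(trunk, inMemory));
    System.out.printf("out-of-core vs in memory:             %.3f%n",
        ratio(outOfCore, inMemory));
    System.out.printf("out-of-core vs in memory (2 threads): %.3f%n",
        ratio(outOfCore2Threads, inMemory2Threads));
  }
}
```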


Thanks,

Claudio Martella

