giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Maja Kabiljo" <majakabi...@fb.com>
Subject Re: Review Request: Move part of the graph out-of-core when memory is low
Date Wed, 22 Aug 2012 08:55:41 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/#review10614
-----------------------------------------------------------


Alessandro, can you please try the same test with a lot of vertices (you only had 10 vertices
here)? If vertices are visited in random order in the prepareSuperstep, I think you should
have much more disk operations.

> Also, addPartition() would then have to read all vertex ids even when the partition is
in memory, which would make it way slower in the standard use case.
What do you mean by this?


http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java
<https://reviews.apache.org/r/5987/#comment22714>

    You could end up with more than maxInMemoryPartitions.


- Maja Kabiljo


On Aug. 21, 2012, 11:02 p.m., Alessandro Presta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5987/
> -----------------------------------------------------------
> 
> (Updated Aug. 21, 2012, 11:02 p.m.)
> 
> 
> Review request for giraph and Avery Ching.
> 
> 
> Description
> -------
> 
> I gave this another shot. This time it plays nicely with input superstep: I replaced
both the temporary partitions and the worker partition map with a common data structure, PartitionStore,
held in ServerData. This is similar to what we now have for messages.
> 
> Partition is now thread-safe so that we can have concurrent calls to putVertex().
> 
> SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that
we skip copying partitions to the worker).
> 
> DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory,
and spill the remaining ones to disk. Each partition is stored in a separate file.
> Adding vertices to an out-of-core partition consists in appending them to the file, which
makes processing vertex requests relatively fast.
> We use a ReadWriteLock for each partition: performing operations on a partition held
in memory only requires a read-lock (since Partition is thread-safe), while creating a new
partition, moving it in and out of core or appending vertices requires a write-lock (we can't
have concurrent writes).
> 
> Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows
what code we get rid of) instead of working around it. I suppose the Netty security patch
will be completed soon. If not, I will restore RPC compatibility.
> 
> More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280
> 
> 
> This addresses bug GIRAPH-249.
>     https://issues.apache.org/jira/browse/GIRAPH-249
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java
1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java
1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java
1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java
1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java
1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java
1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java
1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java
1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java
1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java
1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java
1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java
PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/5987/diff/
> 
> 
> Testing
> -------
> 
> mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore
> 
> 
> Thanks,
> 
> Alessandro Presta
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message