giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Avery Ching" <avery.ch...@gmail.com>
Subject Re: Review Request: Out-of-core messages
Date Sat, 04 Aug 2012 09:31:23 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6013/#review9845
-----------------------------------------------------------


Couple of minor things.  Looks pretty good to me though.  Will be a lot cleaner once Hadoop
RPC is gone!


http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClient.java
<https://reviews.apache.org/r/6013/#comment20890>

    So the first request is for the vertices and the second one is for messages...



http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClient.java
<https://reviews.apache.org/r/6013/#comment20889>

    Why is this check here?  We are using netty if we got here.



http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClient.java
<https://reviews.apache.org/r/6013/#comment20891>

    Would be good to generally try to avoid line breaks on '.' whenever possible.



http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java
<https://reviews.apache.org/r/6013/#comment20892>

    Would be good to generally try to avoid line breaks on '.' whenever possible.



http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java
<https://reviews.apache.org/r/6013/#comment20893>

    Please try to keep <I, M> on the same line.



http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java
<https://reviews.apache.org/r/6013/#comment20895>

    Nice to get rid of this!



http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/RPCCommunications.java
<https://reviews.apache.org/r/6013/#comment20896>

    Should throw an exception here, right?  Shouldn't ever happen.



http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStore.java
<https://reviews.apache.org/r/6013/#comment20900>

    map is probably not a great name.



http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStoreByPartition.java
<https://reviews.apache.org/r/6013/#comment20902>

    prefer split on = rather than .



http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStoreByPartition.java
<https://reviews.apache.org/r/6013/#comment20901>

    please join '.' and split on : instead.



http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
<https://reviews.apache.org/r/6013/#comment20903>

    another break on '.'



http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
<https://reviews.apache.org/r/6013/#comment20904>

    Wow, can't wait to get rid of Hadoop RPC.


- Avery Ching


On Aug. 1, 2012, 7:25 p.m., Maja Kabiljo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6013/
> -----------------------------------------------------------
> 
> (Updated Aug. 1, 2012, 7:25 p.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> This patch introduces out-of-core messages support for Giraph. Some ideas are taken from
discussion in https://issues.apache.org/jira/browse/GIRAPH-45. 
> 
> We keep messages in MessageStore, from which they will be flushed to disk when necessary.
Messages are separated by partition. In moments we only flush single partition to disk so
we would still keep things in memory in case it's time for next superstep. When flushing to
disk, we write in the following format:
> numberOfVertices
> vertexId1 numberOfMessages1 messagesForVertexId1
> vertexId2 numberOfMessages2 messagesForVertexId2
> ...
> Vertex ids are sorted. We don't require enough memory to fit all the messages for partition,
but we require that messages for a single vertex fit in memory. 
> In the end we potentially have several files for each partition. When reading messages,
all the files are read sequentially.
> 
> DiskBackedMessageStoreByPartition handles all messages, DiskBackedMessageStore is then
used for a single partition, and SequentialFileMessageStore handles single file.
> There is also SimpleMessageStore which doesn't use disk at all.
> 
> Options available to user:
> - whether or not to use out-of-core messaging
> - number of messages to keep in memory - this should probably be changed (explained below)
> - size of buffer when reading from and writing to disk
> 
> ServerData now has two instances of message stores: one which is consumed in current
superstep with messages from previous superstep, and one in which it will keep incoming messages
for next superstep.
> 
> Other things which had to be changed:
> - Checkpointing - since messages are not kept in the vertex anymore, they need to be
stored separately.
> - Partition exchange between workers - same reasons as above - added SendMessagesRequest
> - Messages are not assigned to vertex, they are just passed in compute
> - compute methods are now executed in order of vertex id inside of partition, so we could
have fast reading from disk
> 
> For memory check I only have the number of messages which I allow in memory. This should
be done better, but there is a problem since Alessandro's patch for out-of-core graph also
has memory checks. We don't want one of those parts to use all the memory and leave too little
space for the other, but I'm not aware of a way to separately check memory usage of different
data structures.
> 
> I didn't integrate this with RPC, that's why there are some checks for useNetty, those
can be removed once the RPC is removed. Also, since vertex doesn't keep messages in itself
anymore, once RPC is removed we should also remove getMessages/putMessages/getNumMessages
from vertex, change initialize to (id, value, edges, hasMessages) and just give messages to
vertex when calling compute.
> 
> I'll fix the part when partitions are sent around before superstep, since that's the
only part now which requires that all the messages for single partition fit in memory.
> 
> 
> This addresses bug GIRAPH-45.
>     https://issues.apache.org/jira/browse/GIRAPH-45
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
1366303 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
1366303 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyServer.java
1366303 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClient.java
1366303 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java
1366303 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java
1366303 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/RPCCommunications.java
1366303 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestRegistry.java
1366303 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendPartitionMessagesRequest.java
1366303 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java
1366303 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java
1366303 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/messages/BasicMessageStore.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStore.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStoreByPartition.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/messages/FlushableMessageStore.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/messages/MessageStore.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/messages/MessageStoreByPartition.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/messages/MessageStoreFactory.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/messages/SendPartitionCurrentMessagesRequest.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/messages/SequentialFileMessageStore.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/messages/SimpleMessageStore.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/messages/package-info.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertexResolver.java
1366303 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
1366303 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
1366303 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
1366303 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexResolver.java
1366303 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java
1366303 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/utils/CollectionUtils.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java
1366303 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java
1366303 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/TestMessageStores.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/utils/MockUtils.java
1366303 
> 
> Diff: https://reviews.apache.org/r/6013/diff/
> 
> 
> Testing
> -------
> 
> Run mvn verify and tests in pseudo-distributed mode, all apart from this one https://issues.apache.org/jira/browse/GIRAPH-259
pass.
> 
> 
> Thanks,
> 
> Maja Kabiljo
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message