giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Claudio Martella <claudio.marte...@gmail.com>
Subject Re: Review Request: Out-of-core messages
Date Fri, 24 Aug 2012 18:23:38 GMT
Hi Maja,

I didn't want to minimize your effort which is great, I just wanted to
underline the complete overlap between the previous plan and your
work. The designed had moved from BTrees+Bloomfilters towards sorted
partitions, as in the same discussion (17/Dec/11 15:28) and as
presented at the Berlin workshop on the 6th of june
(http://prezi.com/ecdgiav4oeex/bb-apache-giraph-distributed-graph-processing-in-the-cloud/).
It really looks to me like your contribution is one of the best ways
to tackle the problem.

Best,
Claudio

On Wed, Aug 1, 2012 at 8:43 PM, Maja Kabiljo <majakabiljo@fb.com> wrote:
> Another thing, I think I should explain what from GIRAPH-45 discussion am
> I actually using here, since I don't use bloomfilters and BTrees. The way
> it works is the following:
> - Inside the outer message store we have message stores for each of the
> partitions separately.
> - Partition message stores keep data in ordered map (ordered by vertex id).
> - In outer messages store we check if we should flush something (do we
> have more than allowed number of messages in memory). While we do, we
> flush the partition with largest number of messages in memory.
> - When partition messages store is flushed, all the data is written to a
> file in the order of vertex ids, file content is like:
> num_vertices
> vertex_1_id num_messages_1 message_1_1 message_1_2 ...
> vertex_2_id num_messages_2 message_2_1 message_2_2 ...
> ...
> - In the end each partition will have some messages in memory, and N
> files, where N is the number of times it was flushed.
> - When it's time to do the computation, within a single partition we call
> compute methods in order of vertex ids.
> - We use buffered streams and read data from all partition files
> sequentially, since we'll need data in the same order it's written in each
> of the files. This way we limit number of random file accesses.



-- 
   Claudio Martella
   claudio.martella@gmail.com

Mime
View raw message