giraph-dev mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: Review Request: Out-of-core messages
Date Tue, 24 Jul 2012 00:45:48 GMT
We should integrate the partitioning of the graph into the input 
superstep to get locality as well.  We can use MapReduce to try to 
schedule the map tasks (workers) closest to their data and then make the 
workers smart enough to only try to load their own partitions.
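
For the locality part, something along these lines on the worker side might 
do (just a rough sketch, the class and method names are made up):

// Rough sketch only (not real Giraph code): a worker prefers input splits
// that have an HDFS replica on its own host, so most reads stay node-local.
import java.io.IOException;
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.mapreduce.InputSplit;

public class LocalityAwareSplitPicker {
  /** Order splits so that the ones with a local replica come first. */
  public static List<InputSplit> orderByLocality(List<InputSplit> splits)
      throws IOException, InterruptedException {
    String localHost = InetAddress.getLocalHost().getHostName();
    List<InputSplit> local = new ArrayList<InputSplit>();
    List<InputSplit> remote = new ArrayList<InputSplit>();
    for (InputSplit split : splits) {
      if (Arrays.asList(split.getLocations()).contains(localHost)) {
        local.add(split);
      } else {
        remote.add(split);
      }
    }
    local.addAll(remote);
    return local;
  }
}

MapReduce would still have to place the map tasks next to the blocks in the 
first place; this only covers which splits a worker then reserves for itself.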

On 7/22/12 4:30 PM, Claudio Martella wrote:
> I see your reasoning. In general I'm open to using MR when
> necessary (e.g. I used to propose it instead of the automatic vertex
> creation), but here it could get tricky. I see the additional HDFS usage
> as a downside (you have to be able to store 2x the graph). However, once the
> graph is pre-filtered, this additional stage would not be necessary
> again for successive jobs (only when a different number of workers
> is used). Still, it does add a not-so-small step to the process.
>
> On Sun, Jul 22, 2012 at 10:49 PM, Alessandro Presta <alessandro@fb.com> wrote:
>> Exactly. On paper, the amount of data moving around should be the same as
>> during the computation, but in practice we use a lot more memory.
>> You can play with the settings and push the problem a little farther
>> away, by caching less and flushing requests more frequently, but then the
>> bottleneck just moves to the servers.
>> We're basically sending (k-1)/k of the graph through the network, where k
>> is the number of workers: with hash partitioning, only about 1/k of the
>> vertices a worker reads actually belong to it.
>>
>> What I'm thinking is that in INPUT_SUPERSTEP we're doing what MapReduce is
>> really good at (sorting and aggregating) in a probably inefficient (or at
>> least non-scalable) way.
>> We could try implementing it with a MapReduce job instead, where the
>> mappers take input splits and emit (partition_id, vertex) (they would have
>> access to the partitioner) and reducers just output the built partitions
>> to HDFS.
>> The computation stage would then be the usual Giraph job, where each
>> worker knows where to get its partitions from HDFS.
>> I can try making this change and see how it goes. It would just be one MR
>> job, so we're not selling our souls to iterative MR.
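>> Roughly something like this (just a sketch with made-up names, assuming an
>> input format that hands us (vertex id, vertex data) pairs as plain text):
>>
>> import java.io.IOException;
>>
>> import org.apache.hadoop.io.IntWritable;
>> import org.apache.hadoop.io.LongWritable;
>> import org.apache.hadoop.io.Text;
>> import org.apache.hadoop.mapreduce.Mapper;
>> import org.apache.hadoop.mapreduce.Reducer;
>>
>> public class GraphPrePartitionJob {
>>   // Mapper: read vertices from the input splits and key each one by the
>>   // partition it would be assigned to.
>>   public static class PartitionMapper
>>       extends Mapper<LongWritable, Text, IntWritable, Text> {
>>     private int numPartitions;
>>
>>     @Override
>>     protected void setup(Context context) {
>>       // "giraph.numPartitions" is a made-up config key for this sketch.
>>       numPartitions =
>>           context.getConfiguration().getInt("giraph.numPartitions", 1);
>>     }
>>
>>     @Override
>>     protected void map(LongWritable vertexId, Text vertexData,
>>         Context context) throws IOException, InterruptedException {
>>       // Plain hashing here, standing in for the real partitioner.
>>       int partitionId = (int) (Math.abs(vertexId.get()) % numPartitions);
>>       context.write(new IntWritable(partitionId), vertexData);
>>     }
>>   }
>>
>>   // Reducer: all vertices of one partition arrive together, so they can be
>>   // written out as one partition file a worker later loads from HDFS.
>>   public static class PartitionReducer
>>       extends Reducer<IntWritable, Text, IntWritable, Text> {
>>     @Override
>>     protected void reduce(IntWritable partitionId, Iterable<Text> vertices,
>>         Context context) throws IOException, InterruptedException {
>>       for (Text vertex : vertices) {
>>         context.write(partitionId, vertex);
>>       }
>>     }
>>   }
>> }
>>
>> In the real thing the mapper would of course go through the vertex input
>> format and the configured partitioner instead of hashing the key directly.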
>>
>> I can also see many cases where one might not want to shuffle vertices
>> around at all: each worker reads a roughly equal part of the input (forget
>> about bigger vertices for now) and simply communicates its own vertex ids
>> to the master. Partition "a posteriori" instead of "a priori".
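>>
>> As a sketch of what the master would need for that (made-up names, and
>> assuming each worker's chunk of input is contiguous in vertex id, so a
>> range summary is enough):
>>
>> import java.util.Map;
>> import java.util.TreeMap;
>>
>> // Master side: workers report the id range they ended up reading, and the
>> // master resolves vertex -> worker from those reports afterwards.
>> public class APosterioriPartitionMap {
>>   /** Highest vertex id owned by a worker -> that worker's id. */
>>   private final TreeMap<Long, Integer> maxIdToWorker =
>>       new TreeMap<Long, Integer>();
>>
>>   /** Called once per worker report after the input is loaded. */
>>   public void registerWorker(int workerId, long maxVertexIdRead) {
>>     maxIdToWorker.put(maxVertexIdRead, workerId);
>>   }
>>
>>   /** The owner of a vertex is the worker whose range covers its id. */
>>   public int ownerOf(long vertexId) {
>>     Map.Entry<Long, Integer> entry = maxIdToWorker.ceilingEntry(vertexId);
>>     if (entry == null) {
>>       throw new IllegalArgumentException("no worker owns vertex " + vertexId);
>>     }
>>     return entry.getValue();
>>   }
>> }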
>>
>> What do you think?
>>
>> On 7/20/12 9:42 PM, "Eli Reisman" <initialcontext@gmail.com> wrote:
>>
>>> What we are seeing in the metrics is the three-way load of
>>>
>>> 1. reading InputSplits from HDFS (mostly over the wire as there is no
>>> locality right now)
>>> 2. creating temporary collections of vertices and sending them over Netty
>>> 3. simultaneously receiving collections of vertices over Netty from remote
>>> nodes, which will be placed in the local workers' partitions for the
>>> processing stages
>
>

