giraph-user mailing list archives

From Alexander Frolov <alexndr.fro...@gmail.com>
Subject Re: Basic questions about Giraph internals
Date Thu, 06 Feb 2014 10:56:02 GMT
Hi Claudio,

thank you.

If I understood correctly, a mapper and a mapper task are the same thing.


On Thu, Feb 6, 2014 at 2:28 PM, Claudio Martella <claudio.martella@gmail.com
> wrote:

> Hi Alex,
>
> answers are inline.
>
>
> On Thu, Feb 6, 2014 at 11:22 AM, Alexander Frolov <
> alexndr.frolov@gmail.com> wrote:
>
>> Hi, folks!
>>
>> I have started some research on the Giraph framework, and I do not have
>> much experience with Giraph and Hadoop :-(.
>>
>> I would like to ask several questions about how things work in
>> Giraph that are not straightforward to me. I am trying to read the sources,
>> but sometimes it is not too easy ;-)
>>
>> So here they are:
>>
>> 1) How are Workers assigned to TaskTrackers?
>>
>
> Each worker is a mapper, and mapper tasks are assigned to tasktrackers by
> the jobtracker.
>

That is, each Worker is created at the beginning of a superstep and then dies.
In the next superstep all Workers are created again. Is that correct?


> There's no control by Giraph there, and because Giraph doesn't need
> data-locality like MapReduce does, basically nothing is done.
>

This is important for me. So a Giraph Worker (a.k.a. a Hadoop mapper) fetches
the vertex with the corresponding index from HDFS and performs the
computation. What does it do with it next? As I understand it, Giraph is a
fully in-memory framework, so in the next superstep this vertex should be
fetched from memory by the same Worker. Where are the vertices stored between
supersteps? In HDFS or in memory?


>
>>
>> 2) How are vertices assigned to Workers? Does it depend on the distribution
>> of the input file across DataNodes? Is there any choice of distribution
>> policies or not?
>>
>
> In the default scheme, vertices are assigned through modulo hash
> partitioning. Given k workers, vertex v is assigned to worker i according
> to hash(v) % k = i.
>
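
The modulo scheme described above can be sketched in a few lines. This is only an illustration in Python for brevity (Giraph itself is Java); the function names assign_worker and partition are hypothetical and are not part of the Giraph API, and integer vertex IDs are assumed so that the hash is deterministic.

```python
def assign_worker(vertex_id, num_workers):
    """Return the worker index i such that hash(v) % k == i."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    # Python's % yields a non-negative result for a positive modulus,
    # so a negative hash still maps into the range [0, num_workers).
    return hash(vertex_id) % num_workers


def partition(vertex_ids, num_workers):
    """Group vertex IDs into one list per worker (illustrative helper)."""
    buckets = [[] for _ in range(num_workers)]
    for v in vertex_ids:
        buckets[assign_worker(v, num_workers)].append(v)
    return buckets


if __name__ == "__main__":
    # With k = 4 workers, vertices 0..7 are spread round-robin by ID.
    for worker, vertices in enumerate(partition(range(8), 4)):
        print(worker, vertices)
```

Note that the assignment depends only on the vertex ID and the number of workers, not on where the input splits live in HDFS, which is consistent with the remark that data locality plays no role.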

>
>>
>> 3) How are Workers and Map tasks related to each other? (1:1)? (n:1)?
>> (1:n)?
>>
>
> It's 1:1. Each worker is implemented by a mapper task. The master is
> usually (but does not need to be) implemented by an additional mapper.
>
>
>>
>> 4) Can Workers migrate from one TaskTracker to the other?
>>
>
> Workers do not migrate. A Giraph computation is not dynamic with respect to
> the assignment and size of the tasks.
>

>
>>
>> 5) What is the best way to monitor Giraph app execution (progress, worker
>> assignment, load balancing etc.)?
>>
>
> Just like you would for a standard MapReduce job. Go to the job page on
> the jobtracker HTTP page.
>
>
>>
>> I think this is all for the moment. Thank you.
>>
>> Testbed description:
>> Hardware: 8 node dual-CPU cluster with IB FDR.
>> Giraph: release-1.0.0-RC2-152-g585511f
>> Hadoop: hadoop-0.20.203.0, hadoop-rdma-0.9.8
>>
>> Best,
>>    Alex
>>
>
>
>
> --
>    Claudio Martella
>
>
