giraph-user mailing list archives

From Sebastian Schelter <...@apache.org>
Subject Re: Basic questions about Giraph internals
Date Thu, 06 Feb 2014 11:41:57 GMT
Yes, this is correct.

On 02/06/2014 12:15 PM, Alexander Frolov wrote:
> On Thu, Feb 6, 2014 at 3:00 PM, Claudio Martella <claudio.martella@gmail.com> wrote:
>
>>
>>
>>
>> On Thu, Feb 6, 2014 at 11:56 AM, Alexander Frolov <
>> alexndr.frolov@gmail.com> wrote:
>>
>>> Hi Claudio,
>>>
>>> thank you.
>>>
>>> If I understood correctly, a mapper and a mapper task are the same thing.
>>>
>>
>> More or less. A mapper is a functional element of the programming model,
>> while the mapper task is the task that executes the mapper function on the
>> records.
>>
>
> Ok, I see. Then mapred.tasktracker.map.tasks.maximum is the maximum number
> of Workers [or Workers + Master] that will be created on the same node.
>
> That is, if I have an 8-node cluster
> with mapred.tasktracker.map.tasks.maximum=4, then I can run up to 31
> Workers + 1 Master.
>
> Is it correct?
>
>
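The slot arithmetic in Alex's question can be sketched as follows. This is only an illustration of the calculation being discussed (class and method names are invented, not part of Giraph or Hadoop): total map slots = nodes × mapred.tasktracker.map.tasks.maximum, with one slot taken by the master and the rest available for workers.

```java
// Hypothetical sketch of the map-slot arithmetic discussed above.
public class SlotMath {
    // nodes: number of TaskTracker nodes in the cluster
    // mapTasksMaximum: value of mapred.tasktracker.map.tasks.maximum
    static int maxWorkers(int nodes, int mapTasksMaximum) {
        int totalSlots = nodes * mapTasksMaximum; // e.g. 8 * 4 = 32
        return totalSlots - 1;                    // one slot goes to the master
    }

    public static void main(String[] args) {
        System.out.println(maxWorkers(8, 4)); // prints 31
    }
}
```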
>>
>>>
>>>
>>> On Thu, Feb 6, 2014 at 2:28 PM, Claudio Martella <
>>> claudio.martella@gmail.com> wrote:
>>>
>>>> Hi Alex,
>>>>
>>>> answers are inline.
>>>>
>>>>
>>>> On Thu, Feb 6, 2014 at 11:22 AM, Alexander Frolov <
>>>> alexndr.frolov@gmail.com> wrote:
>>>>
>>>>> Hi, folks!
>>>>>
>>>>> I have started a small research project on the Giraph framework, and I
>>>>> do not have much experience with Giraph and Hadoop :-(.
>>>>>
>>>>> I would like to ask several questions about how things work in Giraph
>>>>> that are not straightforward to me. I am trying to read the sources,
>>>>> but sometimes it is not too easy ;-)
>>>>>
>>>>> So here they are:
>>>>>
>>>>> 1) How are Workers assigned to TaskTrackers?
>>>>>
>>>>
>>>> Each worker is a mapper, and mapper tasks are assigned to tasktrackers
>>>> by the jobtracker.
>>>>
>>>
>>> That is, each Worker is created at the beginning of a superstep and then
>>> dies, and in the next superstep all Workers are created again. Is that
>>> correct?
>>>
>>
>> Nope. The workers are created at the beginning of the computation, and
>> destroyed at the end of the computation. A worker is persistent
>> throughout the computation.
>>
>>
>>>
>>>
>>>> There's no control by Giraph there, and because Giraph doesn't need
>>>> data-locality like Mapreduce does, basically nothing is done.
>>>>
>>>
>>> This is important for me. So a Giraph Worker (a.k.a. Hadoop mapper)
>>> fetches the vertex with the corresponding index from HDFS and performs
>>> the computation. What does it do with it next? As I understood, Giraph is
>>> a fully in-memory framework, and in the next superstep this vertex should
>>> be fetched from memory by the same Worker. Where are the vertices stored
>>> between supersteps? In HDFS or in memory?
>>>
>>
>> As I said, the workers are persistent (in-memory) between supersteps, so
>> they keep everything in memory.
>>
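A minimal sketch of what "workers are persistent (in-memory) between supersteps" means in practice. All names here are invented for illustration and do not correspond to Giraph's real internals: the point is only that the worker loads its vertices once and then reuses the same in-memory state across supersteps, rather than re-reading from HDFS.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only (not Giraph code): a worker loads its
// vertex partition once, then every superstep operates on the same
// in-memory map. Nothing is written back to HDFS between supersteps.
public class InMemoryWorkerSketch {
    private final Map<Long, Double> vertexValues = new HashMap<>();

    void loadFromInput() {                 // happens once, at job start
        vertexValues.put(1L, 1.0);
        vertexValues.put(2L, 2.0);
    }

    void runSuperstep() {                  // reuses the same map each time
        vertexValues.replaceAll((id, v) -> v * 2.0);
    }

    double valueOf(long id) { return vertexValues.get(id); }

    public static void main(String[] args) {
        InMemoryWorkerSketch w = new InMemoryWorkerSketch();
        w.loadFromInput();
        w.runSuperstep();                  // superstep 0
        w.runSuperstep();                  // superstep 1: same data, still in memory
        System.out.println(w.valueOf(1L)); // prints 4.0
    }
}
```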
>
> Ok.
>
> Is there any means to see the assignment of Workers to TaskTrackers during
> or after the computation?
>
> And is there any means to see the assignment of vertices to Workers (as a
> distribution function, histogram, etc.)?
>
>
>
>>
>>>
>>>
>>>>
>>>>>
>>>>> 2) How are vertices assigned to Workers? Does it depend on the
>>>>> distribution of the input file on DataNodes? Is there any choice of
>>>>> distribution policies?
>>>>>
>>>>
>>>> In the default scheme, vertices are assigned through modulo hash
>>>> partitioning. Given k workers, vertex v is assigned to worker i according
>>>> to hash(v) % k = i.
>>>>
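The default scheme described above can be sketched in a few lines. This is a simplified, self-contained version written for this thread, not Giraph's actual partitioner code; a real implementation must at least guard against negative hashCode values, as done here with Math.abs.

```java
// Simplified sketch of modulo hash partitioning: vertex v is assigned
// to worker hash(v) % k. Math.abs guards against negative hash codes.
public class HashPartitionSketch {
    static int workerFor(long vertexId, int numWorkers) {
        return Math.abs(Long.hashCode(vertexId) % numWorkers);
    }

    public static void main(String[] args) {
        int k = 4; // e.g. 4 workers
        for (long v = 0; v < 8; v++) {
            System.out.println("vertex " + v + " -> worker " + workerFor(v, k));
        }
    }
}
```

With this scheme the assignment is fully determined by the vertex id and the worker count, which is why it needs no coordination with HDFS block placement.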
>>>
>>>>
>>>>>
>>>>> 3) How are Workers and Map tasks related to each other? (1:1)? (n:1)?
>>>>> (1:n)?
>>>>>
>>>>
>>>> It's 1:1. Each worker is implemented by a mapper task. The master is
>>>> usually (but need not be) implemented by an additional mapper task.
>>>>
>>>>
>>>>>
>>>>> 4) Can Workers migrate from one TaskTracker to the other?
>>>>>
>>>>
>>>> Workers do not migrate. A Giraph computation is not dynamic with respect
>>>> to the assignment and size of the tasks.
>>>>
>>>
>>>>
>>>>>
>>>>> 5) What is the best way to monitor Giraph app execution (progress,
>>>>> worker assignment, load balancing etc.)?
>>>>>
>>>>
>>>> Just like you would for a standard MapReduce job: go to the job page on
>>>> the JobTracker HTTP interface.
>>>>
>>>>
>>>>>
>>>>> I think this is all for the moment. Thank you.
>>>>>
>>>>> Testbed description:
>>>>> Hardware: 8 node dual-CPU cluster with IB FDR.
>>>>> Giraph: release-1.0.0-RC2-152-g585511f
>>>>> Hadoop: hadoop-0.20.203.0, hadoop-rdma-0.9.8
>>>>>
>>>>> Best,
>>>>>     Alex
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>     Claudio Martella
>>>>
>>>>
>>>
>>>
>>
>>
>> --
>>     Claudio Martella
>>
>>
>

