giraph-user mailing list archives

From Sertuğ Kaya <sertug.k...@agmlab.com>
Subject Re: Basic questions about Giraph internals
Date Fri, 07 Feb 2014 15:17:36 GMT
Hi all,
Thanks for this resourceful Q&A. I will also definitely try this
one-mapper, multiple-threads setting per node.
But Claudio, which configuration settings do you use to enable multiple
threads?
Thanks
Sertug

On 06-02-2014 16:04, Alexander Frolov wrote:
>
> Claudio,
> thank you very much for your help.
>
> On Thu, Feb 6, 2014 at 4:06 PM, Claudio Martella 
> <claudio.martella@gmail.com> wrote:
>
>
>
>
>     On Thu, Feb 6, 2014 at 12:15 PM, Alexander Frolov
>     <alexndr.frolov@gmail.com> wrote:
>
>
>
>
>         On Thu, Feb 6, 2014 at 3:00 PM, Claudio Martella
>         <claudio.martella@gmail.com> wrote:
>
>
>
>
>             On Thu, Feb 6, 2014 at 11:56 AM, Alexander Frolov
>             <alexndr.frolov@gmail.com> wrote:
>
>                 Hi Claudio,
>
>                 thank you.
>
>                 If I understood correctly, a mapper and a mapper task are
>                 the same thing.
>
>
>             More or less. A mapper is a functional element of the
>             programming model, while the mapper task is the task that
>             executes the mapper function on the records.
>
>
>         OK, I see. Then mapred.tasktracker.map.tasks.maximum is the
>         maximum number of Workers [or Workers + Master] that will be
>         created on the same node.
>
>         That is, if I have an 8-node cluster
>         with mapred.tasktracker.map.tasks.maximum=4, then I can run up
>         to 31 Workers + 1 Master.
>
>         Is that correct?
>
>
>     That is correct. However, if you have total control over your
>     cluster, you may want to run one worker per node (hence setting
>     the max number of map tasks per machine to 1), and use multiple
>     threads (input, compute, output).
>     This is going to make better use of resources.
>
>
> Should I explicitly force Giraph to use multiple threads for input,
> compute, and output? Only three threads, I suppose? But I have 12 cores
> available on each node (24 with HT enabled).
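
For readers following the thread, here is a minimal sketch of how the
one-worker-per-node, multiple-threads setup described above might be
expressed in a job's configuration. The property keys (giraph.minWorkers,
giraph.maxWorkers, giraph.numInputThreads, giraph.numComputeThreads,
giraph.numOutputThreads) and the example values are assumptions based on
Giraph 1.0-era names, not something confirmed in this thread.

    // A hedged sketch: one multi-threaded worker per node, as suggested above.
    // The property keys are assumed Giraph 1.0-era names; verify them against
    // the GiraphConstants of your Giraph version.
    import org.apache.giraph.conf.GiraphConfiguration;

    public class WorkerThreadConfigSketch {
        public static GiraphConfiguration configure() {
            GiraphConfiguration conf = new GiraphConfiguration();
            // With mapred.tasktracker.map.tasks.maximum=1 on the 8-node cluster:
            // 7 workers + 1 master (or 31 + 1 when 4 map slots per node are used).
            conf.setInt("giraph.minWorkers", 7);         // assumed key
            conf.setInt("giraph.maxWorkers", 7);         // assumed key
            // Per-worker thread pools for the three phases mentioned above.
            conf.setInt("giraph.numInputThreads", 4);    // assumed key: input loading
            conf.setInt("giraph.numComputeThreads", 12); // assumed key: e.g. one per core
            conf.setInt("giraph.numOutputThreads", 4);   // assumed key: output writing
            return conf;
        }
    }

The same values can usually also be passed on the GiraphRunner command line
as -ca key=value custom arguments.
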
>
>
>
>
>                 On Thu, Feb 6, 2014 at 2:28 PM, Claudio Martella
>                 <claudio.martella@gmail.com> wrote:
>
>                     Hi Alex,
>
>                     answers are inline.
>
>
>                     On Thu, Feb 6, 2014 at 11:22 AM, Alexander Frolov
>                     <alexndr.frolov@gmail.com> wrote:
>
>                         Hi, folks!
>
>                         I have started a small research project on the
>                         Giraph framework, and I do not have much
>                         experience with Giraph and Hadoop :-(.
>
>                         I would like to ask several questions about
>                         how things work in Giraph that are not
>                         straightforward to me. I am trying to read the
>                         sources, but sometimes it is not that easy ;-)
>
>                         So here they are:
>
>                         1) How are Workers assigned to TaskTrackers?
>
>
>                     Each worker is a mapper, and mapper tasks are
>                     assigned to tasktrackers by the jobtracker.
>
>
>                 That is, each Worker is created at the beginning of a
>                 superstep and then dies, and in the next superstep all
>                 Workers are created again. Is that correct?
>
>
>             Nope. The workers are created at the beginning of the
>             computation, and destroyed at the end of the computation.
>             A worker is persistent throughout the computation.
>
>                     There's no control by Giraph there, and because
>                     Giraph doesn't need data locality the way MapReduce
>                     does, basically nothing special is done.
>
>
>                 This is important for me. So a Giraph Worker (a.k.a.
>                 Hadoop mapper) fetches the vertex with the corresponding
>                 index from HDFS and performs the computation. What does
>                 it do with it next? As I understand it, Giraph is a fully
>                 in-memory framework, and in the next superstep this
>                 vertex should be fetched from memory by the same Worker.
>                 Where are the vertices stored between supersteps? In
>                 HDFS or in memory?
>
>
>             As I said, the workers are persistent (in-memory) between
>             supersteps, so they keep everything in memory.
>
>
>         Ok.
>
>         Is there any way to see the assignment of Workers to
>         TaskTrackers during or after the computation?
>
>
>     The jobtracker HTTP interface will show you the mappers running,
>     so I'd check there.
>
>
>         And is there any way to see the assignment of vertices to
>         Workers (as a distribution function, histogram, etc.)?
>
>
>     You can check the worker logs; I think the information should be
>     there.
>
>
>
>
>
>
>                         2) How are vertices assigned to Workers? Does
>                         it depend on the distribution of the input file
>                         across DataNodes? Is there any choice of
>                         distribution policy available?
>
>
>                     In the default scheme, vertices are assigned
>                     through modulo hash partitioning. Given k workers,
>                     vertex v is assigned to worker i according to
>                     hash(v) % k = i.
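
To make the default scheme concrete, here is a minimal illustration of
modulo hash partitioning. It is a sketch of the idea only (the class and
method names are made up), not Giraph's actual partitioner implementation.

    // Hedged illustration of hash(v) % k = i; not Giraph's actual partitioner code.
    public class HashPartitionSketch {
        /** Given k workers, return the worker index for a vertex id. */
        static int workerFor(Object vertexId, int k) {
            // Math.abs guards against negative hashCode() values.
            return Math.abs(vertexId.hashCode() % k);
        }

        public static void main(String[] args) {
            int k = 31; // e.g. 31 workers on the 8-node cluster discussed here
            System.out.println(workerFor(Long.valueOf(42L), k)); // worker index in [0, k)
        }
    }

With this scheme the assignment depends only on the vertex id and the
number of workers, which is why it does not depend on how the input file
is laid out across the DataNodes.
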
>
>
>                         3) How are Workers and map tasks related to
>                         each other? (1:1)? (n:1)? (1:n)?
>
>
>                     It's 1:1. Each worker is implemented by a mapper
>                     task. The master is usually (but does not need to be)
>                     implemented by an additional mapper task.
>
>
>                         4) Can Workers migrate from one TaskTracker to
>                         another?
>
>
>                     Workers do not migrate. A Giraph computation is not
>                     dynamic with respect to the assignment and size of
>                     the tasks.
>
>
>                         5) What is the best way to monitor Giraph
>                         application execution (progress, worker
>                         assignment, load balancing, etc.)?
>
>
>                     Just like you would for a standard MapReduce job:
>                     go to the job page on the jobtracker HTTP interface.
>
>
>                         I think this is all for the moment. Thank you.
>
>                         Testbed description:
>                         Hardware: 8-node dual-CPU cluster with IB FDR.
>                         Giraph: release-1.0.0-RC2-152-g585511f
>                         Hadoop: hadoop-0.20.203.0, hadoop-rdma-0.9.8
>
>                         Best,
>                            Alex
>
>
>
>
>                     -- 
>                        Claudio Martella
>
>
>
>
>
>             -- 
>                Claudio Martella
>
>
>
>
>
>     -- 
>        Claudio Martella
>
>

