giraph-user mailing list archives

From "Tu, Min" <>
Subject General Scalability Questions for Giraph
Date Thu, 14 Feb 2013 22:50:34 GMT

I have some general scalability questions about Giraph. Based on the Giraph design, I am assuming
that all the mappers in a Giraph job must be running at the same time.

If so, then

  1.  The max number of mappers for a Giraph job is <= the total mapper slots in the whole cluster
  2.  The max data input size to Giraph should be <= total mapper slots * mapper memory
  3.  If the total number of mapper slots in the cluster is 200, only 100 are currently available,
and the Giraph job requires 150 mappers
     *   Without any configuration change, 100 mappers of the Giraph job will be started but
the job will NOT run successfully
     *   Is there any configuration in Giraph to start the job ONLY at the time when all
the required mapper slots are available?
  4.  How well does Giraph scale? I can ONLY run up to 150 mappers for my Giraph job.
Has anyone run a large Giraph job in a large cluster successfully?
     *   I am using Giraph 0.1 in my cluster
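
The assumptions in items 1 to 3 amount to some simple arithmetic, sketched below. This only restates the reasoning above and is not a description of how Giraph actually schedules work. The function names are made up for illustration, and the 2 GB per-mapper memory figure is a hypothetical value that does not come from the cluster described here.

```python
def can_run_all_at_once(required_mappers, slots):
    """True only if every mapper the job needs can hold a slot simultaneously."""
    return required_mappers <= slots


def max_input_size_gb(total_slots, mapper_memory_gb):
    """Upper bound on input size under assumption 2 (ignores any overhead)."""
    return total_slots * mapper_memory_gb


if __name__ == "__main__":
    total_slots = 200       # from item 3
    available_slots = 100   # from item 3
    required_mappers = 150  # from item 3
    mapper_memory_gb = 2    # hypothetical value, for illustration only

    # Would the job fit in an otherwise idle cluster?
    print(can_run_all_at_once(required_mappers, total_slots))
    # Does it fit in the slots that are free right now?
    print(can_run_all_at_once(required_mappers, available_slots))
    # Input size ceiling implied by assumption 2.
    print(max_input_size_gb(total_slots, mapper_memory_gb))
```

With the numbers from item 3, the job fits in the cluster as a whole but not in the 100 slots that are currently free, which is the situation the two sub-questions are about.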

Thanks a lot for your time and input.

