giraph-user mailing list archives

From chadi jaber <>
Subject RE: Optimal number of Workers
Date Wed, 16 Apr 2014 10:46:07 GMT
Thanks! It's clear now.

Subject: RE: Optimal number of Workers
Date: Wed, 16 Apr 2014 13:58:15 +0530

Giraph uses threads for compute, the Netty server, the Netty client on workers, execution pools,
input, output, etc. You can see most of these options in org.apache.giraph.conf.GiraphConstants,
for example:

  /** Netty client threads */
  IntConfOption NETTY_CLIENT_THREADS =
      new IntConfOption("giraph.nettyClientThreads", 4, "Netty client threads");

  /** Netty server threads */
  IntConfOption NETTY_SERVER_THREADS =
      new IntConfOption("giraph.nettyServerThreads", 16, "Netty server threads");

  /** Number of threads for vertex computation */
  IntConfOption NUM_COMPUTE_THREADS =
      new IntConfOption("giraph.numComputeThreads", 1, "Number of threads for vertex [...]

  /** Number of threads for input split loading */
  IntConfOption NUM_INPUT_THREADS =
      new IntConfOption("giraph.numInputThreads", 1, "Number of threads for input split [...]
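
As a rough sketch of how these options might be overridden for a job: the helper below only
builds a command-line fragment from the option keys quoted above. The class name, the helper
name and the chosen values are made up for illustration, and it assumes the job is launched
through GiraphRunner with its -ca (custom arguments) option, which should be checked against
the Giraph version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class GiraphThreadArgs {

    /**
     * Joins the overrides into a single "-ca key=value,key=value" fragment.
     * Hypothetical helper, not part of Giraph.
     */
    static String customArguments(Map<String, Integer> overrides) {
        StringJoiner joined = new StringJoiner(",");
        for (Map.Entry<String, Integer> entry : overrides.entrySet()) {
            joined.add(entry.getKey() + "=" + entry.getValue());
        }
        return "-ca " + joined;
    }

    public static void main(String[] args) {
        // Keys are the ones listed in GiraphConstants above; values are illustrative only.
        Map<String, Integer> overrides = new LinkedHashMap<>();
        overrides.put("giraph.nettyClientThreads", 1);
        overrides.put("giraph.nettyServerThreads", 2);
        overrides.put("giraph.numComputeThreads", 1);
        overrides.put("giraph.numInputThreads", 1);

        // Print the fragment to append to the job's command line.
        System.out.println(customArguments(overrides));
    }
}
```

Suitable values depend on the machine and the job, so the numbers here are placeholders rather
than recommendations.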

The idea is that if you run your job on a cluster of 5 machines, typically 1 machine is the
master and 4 of them are "workers", which load the graph and compute on it. Each worker
is a separate machine, and to maximize its utilization we can use as many threads as it can
run. However, if you are running in pseudo-distributed mode, all workers run on the same
machine and each one still tries to launch the default number of threads set in the config.
Although each worker is now a thread (instead of a machine), it still launches all these other
threads indiscriminately.
Anyway, you can configure the threads spawned by workers to reduce the overall number of
threads launched on your one machine.
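
To make the arithmetic concrete, here is a small sketch that adds up only the four pool sizes
listed above and multiplies by the number of workers on one host. The class and method names
are made up for illustration. It is an estimate of configured pool sizes, not a measurement:
the JVM, Hadoop and other Giraph pools add threads that are not counted here.

```java
public class WorkerThreadEstimate {

    /** Sum of the four configured pool sizes for one worker. */
    static int configuredThreadsPerWorker(int nettyClient, int nettyServer,
                                          int compute, int input) {
        return nettyClient + nettyServer + compute + input;
    }

    /** Configured pool threads across all workers that share one host. */
    static int configuredThreadsOnHost(int workersOnHost, int threadsPerWorker) {
        return workersOnHost * threadsPerWorker;
    }

    public static void main(String[] args) {
        // Defaults quoted from GiraphConstants above: 4 client, 16 server, 1 compute, 1 input.
        int perWorker = configuredThreadsPerWorker(4, 16, 1, 1);
        System.out.println(perWorker); // prints 22

        // 10 workers on a single machine, as in the original question.
        System.out.println(configuredThreadsOnHost(10, perWorker)); // prints 220
    }
}
```

Lowering giraph.nettyServerThreads has the largest effect on this sum, since it is the biggest
default among the four.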

Subject: Optimal number of Workers
Date: Tue, 15 Apr 2014 13:34:53 +0200

Hello! Can anybody explain how threads are used by a worker in Giraph? For which purposes?
How does a worker determine the number of threads to use?

I often get the following error:
org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError:
unable to create new native thread.

A check on the number of threads per worker shows child processes with 100 threads per worker
process (10 workers on a 12-processor machine), which in my opinion is too many, isn't it?
If I reduce the number of workers, the number of threads decreases. How should we choose the
number of workers?

Thanks in advance.
Chadi
