giraph-user mailing list archives

From Lukas Nalezenec
Subject Re: Number of concurrent workers
Date Tue, 27 Jan 2015 09:10:52 GMT
On 23.1.2015 00:40, Walaa Eldin Moustafa wrote:
> Hi,
> I am experimenting with a memory-intensive Giraph application on top 
> of a large graph (50 million nodes), on a 14 node cluster.
> When setting the number of workers to a large number (500 in this 
> example), I get errors for not being able to fulfill the number of 
> requested workers (please see the log excerpt below). To my 
> understanding, this contradicts how YARN/MR map tasks operate, as 
> if the number of map tasks is more than what is currently available in 
> terms of resources, only a subset of the maps are started, and new 
> ones are assigned as new slots become available. In other words, as 
> many map tasks as possible can run concurrently, and new ones are run 
> as resources become available. Is this not the case with Giraph 
> workers? I expect it to be the case, since workers are basically map 
> tasks, so the same should apply to them. However, the log below 
> suggests otherwise, as based on my resources, 37 map tasks (workers) 
> could be created, but the application could not proceed without 
> creating all the 500 workers. Could you please help explain what is 
> causing this?

Giraph is not a standard M/R job. It needs all mappers to run at the 
same time; no computation starts until every mapper is running.
It's hard to tell why it does not work. I guess you have already raised 
the timeout. Check whether there are enough slots in the queue where the 
job is running.
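
To illustrate the difference from ordinary map tasks, here is a minimal 
sketch in plain Java. It is not Giraph code: the class and method names 
are hypothetical, and a fixed-size thread pool merely stands in for the 
map slots the cluster can run at once. Each "worker" checks in and then 
holds its slot until the job ends, so queued workers never get a turn 
and the master's wait times out, much like the "only 37 of 500" message 
in the log.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; this is not how Giraph is implemented.
public class WorkerBarrierSketch {

    /**
     * Returns true if all requested workers checked in before the timeout.
     * concurrentSlots models how many map tasks can run simultaneously.
     */
    public static boolean allWorkersCheckedIn(int requested, int concurrentSlots,
                                              long timeoutMs) throws InterruptedException {
        CountDownLatch checkedIn = new CountDownLatch(requested);
        CountDownLatch jobFinished = new CountDownLatch(1);
        ExecutorService slots = Executors.newFixedThreadPool(concurrentSlots);

        for (int i = 0; i < requested; i++) {
            slots.submit(() -> {
                // A worker reports to the master as soon as it starts...
                checkedIn.countDown();
                try {
                    // ...and then keeps its slot until the whole job is over.
                    jobFinished.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // The master waits for every worker, not just the ones that fit.
        boolean ok = checkedIn.await(timeoutMs, TimeUnit.MILLISECONDS);

        // Release the running workers and drop the ones still queued.
        jobFinished.countDown();
        slots.shutdownNow();
        return ok;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(allWorkersCheckedIn(500, 37, 500)); // prints "false"
        System.out.println(allWorkersCheckedIn(37, 37, 500));  // prints "true"
    }
}
```

In this sketch the only ways to get past the barrier are to request no 
more workers than can run concurrently or to make more slots available; 
a longer timeout alone does not help, because no slot is ever freed.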

> Thanks,
> Walaa.
> Only found 37 responses of 500 needed to start superstep -1. Reporting 
> every 30000 msecs, 296929 more msecs left before giving up.
> 2015-01-20 01:29:49,007 ERROR [org.apache.giraph.master.MasterThread] 
> org.apache.giraph.master.BspServiceMaster: checkWorkers: Did not 
> receive enough processes in time (only 37 of 500 required) after 
> waiting 600000msecs).  This occurs if you do not have enough map tasks 
> available simultaneously on your Hadoop instance to fulfill the number 
> of requested workers.
> 2015-01-20 01:29:49,015 FATAL [org.apache.giraph.master.MasterThread] 
> org.apache.giraph.master.BspServiceMaster: failJob: Killing job 
> job_1421703431598_0006
> 2015-01-20 01:29:49,015 FATAL [org.apache.giraph.master.MasterThread] 
> org.apache.giraph.master.BspServiceMaster: failJob: exception 
> java.lang.IllegalStateException: Not enough healthy workers to create 
> input splits
