giraph-user mailing list archives

From Walaa Eldin Moustafa <>
Subject Re: Number of concurrent workers
Date Tue, 27 Jan 2015 19:00:43 GMT
Thanks! Aren't there any options to relax this restriction? My end goal is
to have small input splits for each worker, so that workers can finish
processing their input splits without throwing out-of-memory exceptions.
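For context, Giraph does expose a knob that relaxes the all-or-nothing worker check: `giraph.minPercentResponded` lets the master proceed once a given percentage of the requested workers has registered, instead of all of them. A sketch of how it might be passed through `GiraphRunner` (the jar name, computation class, and input paths below are placeholders, not taken from this thread):

```shell
# Sketch only: jar, computation class, input format, and paths are
# illustrative placeholders. The -ca flag passes a custom argument;
# giraph.minPercentResponded tells the master it may start the job once
# this percentage of the requested workers has checked in.
hadoop jar giraph-examples.jar org.apache.giraph.GiraphRunner \
  com.example.MyComputation \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /input/graph \
  -w 500 \
  -ca giraph.minPercentResponded=75.0
```

Note that the workers that do start must then hold the entire graph between them, so lowering the percentage trades the startup failure for higher memory pressure per worker.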

On Tue, Jan 27, 2015 at 1:14 AM, Lukas Nalezenec <> wrote:

>  On 23.1.2015 00:40, Walaa Eldin Moustafa wrote:
>  Hi,
>  I am experimenting with a memory-intensive Giraph application on top of a
> large graph (50 million nodes), on a 14 node cluster.
> When setting the number of workers to a large number (500 in this
> example), I get errors about not being able to fulfill the number of
> requested workers (please see the log excerpt below). To my understanding,
> this contradicts how Yarn/MR map tasks operate: if the number of
> map tasks exceeds the currently available resources,
> only a subset of the maps is started, and new ones are assigned as
> slots become available. In other words, as many map tasks as possible
> run concurrently, and new ones run as resources free up. Isn't
> this the case with Giraph workers? I expected it to be, since
> workers are basically map tasks, so the same should apply to them. However,
> the log below suggests otherwise: based on my resources, 37 map tasks
> (workers) could be created, but the application could not proceed without
> creating all 500 workers. Could you please help explain what is
> causing this?
> Hi,
> Giraph is not a standard M/R job: it needs all mappers to be running at
> the same moment. No computation starts before all mappers are up.
> It's hard to tell why it does not work. I guess you have already raised
> the timeout. Check whether there are enough slots in the queue where the
> job is running, and give Giraph a higher priority.
> Lukas
> Thanks,
> Walaa.
>  Only found 37 responses of 500 needed to start superstep -1.  Reporting
> every 30000 msecs, 296929 more msecs left before giving up.
>  2015-01-20 01:29:49,007 ERROR [org.apache.giraph.master.MasterThread]
> org.apache.giraph.master.BspServiceMaster: checkWorkers: Did not receive
> enough processes in time (only 37 of 500 required) after waiting
> 600000msecs).  This occurs if you do not have enough map tasks available
> simultaneously on your Hadoop instance to fulfill the number of requested
> workers.
>  2015-01-20 01:29:49,015 FATAL [org.apache.giraph.master.MasterThread]
> org.apache.giraph.master.BspServiceMaster: failJob: Killing job
> job_1421703431598_0006
>  2015-01-20 01:29:49,015 FATAL [org.apache.giraph.master.MasterThread]
> org.apache.giraph.master.BspServiceMaster: failJob: exception
> java.lang.IllegalStateException: Not enough healthy workers to create input
> splits
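The 600000 msecs in the log above is the master's wait for workers, which is itself configurable via `giraph.maxMasterSuperstepWaitMsecs` (its default is that same ten minutes). A hedged sketch of sizing the request to the cluster instead of relaxing the check; the slots-per-node figure is an assumption for illustration, not stated anywhere in the thread:

```shell
# Illustrative only: assumes ~3 map slots per node, which is NOT stated in
# the thread. 14 nodes * 3 slots = 42 concurrent tasks at most; one task
# runs the master, so the worker request must stay below that. The log
# showed 37 workers actually starting, so -w 37 is a plausible ceiling.
hadoop jar giraph-examples.jar org.apache.giraph.GiraphRunner \
  com.example.MyComputation \
  -w 37 \
  -ca giraph.maxMasterSuperstepWaitMsecs=1200000
```

Raising the wait only helps if slots are slow to free up in a busy queue; if the cluster simply cannot run 500 map tasks at once, no timeout is long enough and the worker count itself has to come down.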
