giraph-user mailing list archives

From Walaa Eldin Moustafa <wa.moust...@gmail.com>
Subject Re: Number of concurrent workers
Date Tue, 27 Jan 2015 05:42:08 GMT
Just bumping up this thread. Thanks in advance for the help.

Thanks,
Walaa.


On Thu, Jan 22, 2015 at 3:40 PM, Walaa Eldin Moustafa <wa.moustafa@gmail.com> wrote:

> Hi,
>
> I am experimenting with a memory-intensive Giraph application on a
> large graph (50 million nodes) on a 14-node cluster.
>
> When I set the number of workers to a large number (500 in this
> example), I get errors saying the requested number of workers could not
> be fulfilled (see the log excerpt below). As I understand it, this
> contradicts how YARN/MR map tasks operate: if there are more map tasks
> than the currently available resources allow, only a subset of the maps
> is started, and new ones are assigned as slots become free. In other
> words, as many map tasks as possible run concurrently, and the rest run
> as resources become available. Is this not the case with Giraph workers?
> I expected it to be, since workers are basically map tasks, so the same
> should apply to them. However, the log below suggests otherwise: based on
> my resources, 37 map tasks (workers) could be created, but the
> application could not proceed without all 500 workers. Could you please
> help explain what is causing this?
>
> Thanks,
>
> Walaa.
>
>
> Only found 37 responses of 500 needed to start superstep -1.  Reporting
> every 30000 msecs, 296929 more msecs left before giving up.
>
>
> 2015-01-20 01:29:49,007 ERROR [org.apache.giraph.master.MasterThread]
> org.apache.giraph.master.BspServiceMaster: checkWorkers: Did not receive
> enough processes in time (only 37 of 500 required) after waiting
> 600000msecs).  This occurs if you do not have enough map tasks available
> simultaneously on your Hadoop instance to fulfill the number of requested
> workers.
>
>
> 2015-01-20 01:29:49,015 FATAL [org.apache.giraph.master.MasterThread]
> org.apache.giraph.master.BspServiceMaster: failJob: Killing job
> job_1421703431598_0006
>
> 2015-01-20 01:29:49,015 FATAL [org.apache.giraph.master.MasterThread]
> org.apache.giraph.master.BspServiceMaster: failJob: exception
> java.lang.IllegalStateException: Not enough healthy workers to create input
> splits
>
>
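The contrast drawn in the question can be illustrated with a simplified model. This is a sketch, not Giraph's or YARN's actual scheduling code: it assumes, as the log message itself states, that Giraph needs all requested workers running simultaneously before superstep -1, whereas MapReduce can run map tasks in successive waves. The class and method names (`WavesVsBarrier`, `mapReduceWaves`, `bspCanStart`) are hypothetical.

```java
public class WavesVsBarrier {

    // MapReduce-style scheduling (simplified): tasks run in waves of at most
    // `slots` concurrent containers, so any number of tasks eventually completes.
    static int mapReduceWaves(int tasks, int slots) {
        if (slots <= 0) {
            throw new IllegalArgumentException("slots must be positive");
        }
        return (tasks + slots - 1) / slots; // ceiling division
    }

    // BSP-style start (simplified model of the check reported in the log):
    // the job can begin only if every requested worker fits at the same time.
    static boolean bspCanStart(int requestedWorkers, int slots) {
        return requestedWorkers <= slots;
    }

    public static void main(String[] args) {
        int slots = 37;      // workers that actually responded, per the log
        int requested = 500; // workers requested for the job
        System.out.println("MapReduce waves needed: " + mapReduceWaves(requested, slots));
        System.out.println("BSP job can start: " + bspCanStart(requested, slots));
    }
}
```

With the 37 concurrent slots from the log, 500 MapReduce-style tasks would finish in 14 waves, while the barrier-style check can never be satisfied, no matter how long the master waits.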
