Subject: Re: Number of concurrent workers
From: Walaa Eldin Moustafa <wa.moustafa@gmail.com>
To: user@giraph.apache.org
Date: Tue, 27 Jan 2015 11:00:43 -0800

Thanks! Aren't there any options to relax this restriction? My end goal is to have small input splits for each worker, so that workers can finish processing their input splits without throwing out-of-memory exceptions.

On Tue, Jan 27, 2015 at 1:14 AM, Lukas Nalezenec <lukas.nalezenec@firma.seznam.cz> wrote:

> On 23.1.2015 00:40, Walaa Eldin Moustafa wrote:
>
> Hi,
>
> I am experimenting with a memory-intensive Giraph application on top of a
> large graph (50 million nodes), on a 14-node cluster.
>
> When I set the number of workers to a large number (500 in this example),
> I get errors about not being able to fulfill the number of requested
> workers (please see the log excerpt below). To my understanding, this
> contradicts how YARN/MR map tasks operate: if the number of map tasks
> exceeds the currently available resources, only a subset of the maps is
> started, and new ones are assigned as slots become available. In other
> words, as many map tasks as possible run concurrently, and new ones start
> as resources free up. Isn't this the case with Giraph workers? I expected
> it to be, since workers are basically map tasks, so the same should apply
> to them. However, the log below suggests otherwise: based on my resources,
> 37 map tasks (workers) could be created, but the application could not
> proceed without creating all 500 workers. Could you please help explain
> what is causing this?
>
>
> Hi,
> Giraph is not a standard M/R job. It needs all mappers to be running at
> the same moment; no computation starts before all mappers are up.
> It's hard to tell why it does not work. I guess you have already raised
> the timeout. Check whether there are enough slots in the queue where the
> job is running, and give Giraph a higher priority.
> Lukas
>
>
> Thanks,
>
> Walaa.
>
>
> Only found 37 responses of 500 needed to start superstep -1. Reporting
> every 30000 msecs, 296929 more msecs left before giving up.
>
>
> 2015-01-20 01:29:49,007 ERROR [org.apache.giraph.master.MasterThread]
> org.apache.giraph.master.BspServiceMaster: checkWorkers: Did not receive
> enough processes in time (only 37 of 500 required) after waiting
> 600000msecs). This occurs if you do not have enough map tasks available
> simultaneously on your Hadoop instance to fulfill the number of requested
> workers.
>
>
> 2015-01-20 01:29:49,015 FATAL [org.apache.giraph.master.MasterThread]
> org.apache.giraph.master.BspServiceMaster: failJob: Killing job
> job_1421703431598_0006
>
> 2015-01-20 01:29:49,015 FATAL [org.apache.giraph.master.MasterThread]
> org.apache.giraph.master.BspServiceMaster: failJob: exception
> java.lang.IllegalStateException: Not enough healthy workers to create input
> splits
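
[Editor's sketch] Since all Giraph workers must be up before superstep -1 can start, one practical approach discussed on this list is to size the worker count to what the cluster can actually run at once, and use Giraph's custom-argument mechanism to tune the startup and memory behavior. The sketch below assumes the standard GiraphRunner entry point and option names from GiraphConstants (giraph.minPercentResponded, giraph.useOutOfCoreGraph); these vary across Giraph versions, so verify the exact names against your build before relying on them:

```
# Launch configuration sketch (not a verified command for this cluster):
# -w sets the worker count to what the cluster can host simultaneously
#    (37 in the thread's example, rather than 500).
# giraph.minPercentResponded lets the master proceed once this percentage
#    of workers has reported, instead of requiring 100%.
# giraph.useOutOfCoreGraph spills graph partitions to disk to reduce
#    out-of-memory failures during input loading.
hadoop jar giraph-with-dependencies.jar org.apache.giraph.GiraphRunner \
  my.app.MyComputation \
  -vif my.app.MyVertexInputFormat \
  -vip /user/walaa/input/graph \
  -op  /user/walaa/output \
  -w 37 \
  -ca giraph.minPercentResponded=75 \
  -ca giraph.useOutOfCoreGraph=true
```

Note that fewer workers means each worker loads more input splits, so out-of-core (or more heap per mapper) may still be needed for a 50-million-node graph.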