From: Ken Krugler
Subject: Re: Reducing parallelism leads to NoResourceAvailableException
Date: Thu, 28 Apr 2016 08:31:52 -0700
To: user@flink.apache.org
Hi Ufuk,

> On Apr 28, 2016, at 1:32am, Ufuk Celebi wrote:
>
> Hey Ken!
>
> That should not happen. Can you check the web interface for two things:
>
> - How many available slots are advertized on the landing page
> (localhost:8081) when you submit your job?

I'm running this on YARN, so I don't believe the web UI shows up until the Flink ApplicationMaster has been started, which means I don't know the advertised number of available slots before the job is running.

> - Can you check the actual parallelism of the submitted job (it should
> appear as a FAILED job in the web frontend). Is it really 15?

Same as above, the Flink web UI is gone once the job has failed.

Any suggestions for how to check the actual parallelism in this type of transient YARN environment?

Thanks,

— Ken

> On Thu, Apr 28, 2016 at 12:52 AM, Ken Krugler
> wrote:
>> Hi all,
>>
>> In trying out different settings for performance, I ran into a job failure
>> case that puzzles me.
>>
>> I'd done a run with a parallelism of 20 (-p 20 via the CLI), and the job ran
>> successfully, on a cluster with 40 slots.
>>
>> I then tried with -p 15, and it failed with:
>>
>> NoResourceAvailableException: Not enough free slots available to run the
>> job. You can decrease the operator parallelism…
>>
>> But the change was to reduce parallelism - why would that now cause this
>> problem?
>>
>> Thanks,
>>
>> — Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
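[Editor's note: for readers puzzling over the same exception, here is a toy sketch of the slot arithmetic involved. This is not Flink's actual scheduler code; the function and the operator names are made up for illustration. It only captures the rule of thumb that, with Flink's default slot sharing, a job needs as many slots as its highest operator parallelism, while without sharing every subtask needs its own slot.]

```python
def required_slots(operators, slot_sharing=True):
    """Rough model of how many task slots a job needs.

    operators: dict mapping operator name -> parallelism.
    With the default slot-sharing group, one slot can hold one subtask
    of every operator in the pipeline, so the job needs only as many
    slots as the largest parallelism. Without sharing, each subtask
    occupies its own slot, so the requirements add up.
    """
    if slot_sharing:
        return max(operators.values())
    return sum(operators.values())

# Hypothetical three-operator job submitted with -p 15:
job = {"source": 15, "map": 15, "sink": 15}
print(required_slots(job))                       # 15 with slot sharing
print(required_slots(job, slot_sharing=False))   # 45 without sharing
```

So if slot sharing were somehow disabled, or if the YARN session were started with fewer TaskManagers (and hence fewer total slots) on the second run, a -p 15 job could indeed demand more free slots than were available, even though the -p 20 run had succeeded earlier.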