flink-user mailing list archives

From: Ken Krugler <kkrugler_li...@transpac.com>
Subject: Re: Reducing parallelism leads to NoResourceAvailableException
Date: Thu, 28 Apr 2016 15:31:52 GMT
Hi Ufuk,

> On Apr 28, 2016, at 1:32am, Ufuk Celebi <uce@apache.org> wrote:
> 
> Hey Ken!
> 
> That should not happen. Can you check the web interface for two things:
> 
> - How many available slots are advertized on the landing page
> (localhost:8081) when you submit your job?

I’m running this on YARN, so I don’t believe the web UI is available until the Flink
ApplicationMaster has started, which means I can’t see the advertised number of available
slots before the job is running.

> - Can you check the actual parallelism of the submitted job (it should
> appear as a FAILED job in the web frontend). Is it really 15?

Same as above, the Flink web UI is gone once the job has failed.

Any suggestions for how to check the actual parallelism in this type of transient YARN environment?
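One thing I’ve considered as a workaround (a sketch, assuming YARN log aggregation is enabled; the application ID below is a placeholder for the real one printed at submission time) is to pull the aggregated JobManager logs after the application exits and grep them, since the scheduler logs the parallelism it is working with:

```shell
# Fetch aggregated container logs for the finished YARN application.
# application_1461000000000_0001 is a placeholder ID; "|| true" lets the
# sketch degrade gracefully when run outside a live YARN cluster.
yarn logs -applicationId application_1461000000000_0001 > flink-job.log || true

# JobManager log lines mention the parallelism used at scheduling time,
# so grepping is a rough way to confirm what was actually requested.
grep -i "parallelism" flink-job.log || true
```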

Thanks,

— Ken


> On Thu, Apr 28, 2016 at 12:52 AM, Ken Krugler
> <kkrugler_lists@transpac.com> wrote:
>> Hi all,
>> 
>> In trying out different settings for performance, I run into a job failure
>> case that puzzles me.
>> 
>> I’d done a run with a parallelism of 20 (-p 20 via CLI), and the job ran
>> successfully, on a cluster with 40 slots.
>> 
>> I then tried with -p 15, and it failed with:
>> 
>> NoResourceAvailableException: Not enough free slots available to run the
>> job. You can decrease the operator parallelism…
>> 
>> But the change was to reduce parallelism - why would that now cause this
>> problem?
>> 
>> Thanks,
>> 
>> — Ken
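For context on why the failure above is surprising: with Flink’s default slot sharing (all operators in one sharing group), a job needs only as many task slots as its highest operator parallelism. A minimal sketch of that arithmetic (plain Python, not the Flink API; the per-operator parallelisms are illustrative):

```python
def slots_needed(operator_parallelisms):
    # With the default slot-sharing group, one slot can hold one subtask of
    # every operator, so the job needs max(parallelism) slots in total.
    return max(operator_parallelisms)

cluster_slots = 40  # as in the run described above

# -p 20 fit in 40 slots, so -p 15 should need strictly fewer slots.
assert slots_needed([20, 20, 20]) <= cluster_slots
assert slots_needed([15, 15, 15]) < slots_needed([20, 20, 20])
```

By this accounting, -p 15 should fit comfortably, which is what makes the NoResourceAvailableException puzzling.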

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr



