flink-issues mailing list archives

From "Till Rohrmann (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-10818) RestartStrategies.fixedDelayRestart Occur NoResourceAvailableException: Not enough free slots available to run the job.
Date Thu, 08 Nov 2018 09:35:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-10818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679502#comment-16679502 ]

Till Rohrmann commented on FLINK-10818:
---------------------------------------

Could you check whether your YARN cluster actually had the required resources? If other
jobs are running in the cluster, they may have taken the resources this job needs.
Moreover, could you check whether the problem also occurs with Flink {{1.6.2}} in the new
mode (not legacy)?
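
As a rough aid for the resource check, the sketch below reads the "Resources available to
scheduler" line from the quoted exception and computes how many slots are missing. It is
plain Java with no Flink dependency. The class name {{SlotCheck}} and its methods are
hypothetical helpers, not Flink API, and the required slot count of 6 is an assumption taken
from the "(1/6)" subtask index in the trace (with slot sharing, the highest operator
parallelism).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Flink: parses the scheduler's resource summary.
public class SlotCheck {
    private static final Pattern SLOTS = Pattern.compile(
            "total number of slots=(\\d+), available slots=(\\d+)");

    // Returns {totalSlots, availableSlots} parsed from the exception message.
    public static int[] parseSlots(String message) {
        Matcher m = SLOTS.matcher(message);
        if (!m.find()) {
            throw new IllegalArgumentException("no slot summary found in message");
        }
        return new int[] {Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2))};
    }

    // Number of additional free slots needed before the job can be scheduled.
    public static int shortfall(int requiredSlots, int availableSlots) {
        return Math.max(0, requiredSlots - availableSlots);
    }

    public static void main(String[] args) {
        String line = "Resources available to scheduler: Number of instances=6, "
                + "total number of slots=6, available slots=0";
        int[] slots = parseSlots(line);
        int required = 6; // assumption: parallelism 6, read from "(1/6)" in the trace
        System.out.println("total=" + slots[0] + " available=" + slots[1]
                + " shortfall=" + shortfall(required, slots[1]));
    }
}
```

For the quoted trace this reports that all 6 slots exist but none is free at restart time,
which would be consistent with the slots still being occupied or with other jobs holding them.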

> RestartStrategies.fixedDelayRestart Occur NoResourceAvailableException: Not enough free slots available to run the job.
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-10818
>                 URL: https://issues.apache.org/jira/browse/FLINK-10818
>             Project: Flink
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.6.2
>         Environment: JDK 1.8
> Flink 1.6.0 
> Hadoop 2.7.3
>            Reporter: ambition
>            Priority: Major
>
> Our production Flink-on-YARN job sets its restart strategy in code like this:
> {code:java}
> exeEnv.setRestartStrategy(RestartStrategies.fixedDelayRestart(5, 1000L));
> {code}
> But after the job has been running for some days, the following exception occurs:
> {code:java}
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Task to schedule: < Attempt #5 (Source: KafkaJsonTableSource -> Map -> where: (AND(OR(=(app_key, _UTF-16LE'C4FAF9CE1569F541'), =(app_key, _UTF-16LE'F5C7F68C7117630B'), =(app_key, _UTF-16LE'57C6FF4B5A064D29')), OR(=(LOWER(TRIM(FLAG(BOTH), _UTF-16LE' ', os_type)), _UTF-16LE'ios'), =(LOWER(TRIM(FLAG(BOTH), _UTF-16LE' ', os_type)), _UTF-16LE'android')), IS NOT NULL(server_id))), select: (MT_Date_Format_Mode(receive_time, _UTF-16LE'yyyyMMddHHmm', 10) AS date_p, LOWER(TRIM(FLAG(BOTH), _UTF-16LE' ', os_type)) AS os_type, MT_Date_Format_Mode(receive_time, _UTF-16LE'HHmm', 10) AS date_mm, server_id) (1/6)) @ (unassigned) - [SCHEDULED] > with groupID < cbc357ccb763df2852fee8c4fc7d55f2 > in sharing group < 690dbad267a8ff37c8cb5e9dbedd0a6d >. Resources available to scheduler: Number of instances=6, total number of slots=6, available slots=0
>    at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:281)
>    at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:155)
>    at org.apache.flink.runtime.executiongraph.Execution.lambda$allocateAndAssignSlotForExecution$2(Execution.java:491)
>    at org.apache.flink.runtime.executiongraph.Execution$$Lambda$44/1664178385.apply(Unknown Source)
>    at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:981)
>    at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2116)
>    at org.apache.flink.runtime.executiongraph.Execution.allocateAndAssignSlotForExecution(Execution.java:489)
>    at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.allocateResourcesForAll(ExecutionJobVertex.java:521)
>    at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleEager(ExecutionGraph.java:945)
>    at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:875)
>    at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:1262)
>    at org.apache.flink.runtime.executiongraph.restart.ExecutionGraphRestartCallback.triggerFullRecovery(ExecutionGraphRestartCallback.java:59)
>    at org.apache.flink.runtime.executiongraph.restart.FixedDelayRestartStrategy$1.run(FixedDelayRestartStrategy.java:68)
>    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>    at java.lang.Thread.run(Thread.java:745)
> {code}
>  
> This exception happened when the job started. This issue is related to
> https://issues.apache.org/jira/browse/FLINK-4486
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
