flink-user mailing list archives

From Robert Metzger <rmetz...@apache.org>
Subject Re: Exception: Insufficient number of network buffers: required 120, but only 2 of 2048 available
Date Wed, 04 Mar 2015 13:51:10 GMT
I agree with Henry.
We should include the name of the required configuration parameter in the
exception message.
Users often run into this issue.
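
A minimal sketch of what such a message could look like is given here. The class and method names are hypothetical and chosen only for illustration; this is not Flink's actual implementation, nor the change tracked in the JIRA issue. Only the message wording and the configuration key are taken from this thread.

```java
import java.io.IOException;

// Hypothetical illustration class, not part of Flink.
final class BufferErrorMessageSketch {

    // Configuration key named in this thread.
    static final String NUM_BUFFERS_KEY = "taskmanager.network.numberOfBuffers";

    // Builds the message reported in this thread and appends a hint
    // that names the configuration parameter to adjust.
    public static String insufficientBuffersMessage(int required, int available, int total) {
        return String.format(
                "Insufficient number of network buffers: required %d, but only %d of %d available. "
                        + "Consider increasing the value of '%s'.",
                required, available, total, NUM_BUFFERS_KEY);
    }

    public static void main(String[] args) {
        // Values taken from the error reported in this thread.
        IOException e = new IOException(insufficientBuffersMessage(120, 2, 2048));
        System.out.println(e.getMessage());
    }
}
```

Naming the key in the message lets a user go from the stack trace to the relevant configuration entry without searching the documentation.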

I've filed a JIRA to track the fix:
https://issues.apache.org/jira/browse/FLINK-1646
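
For the setup quoted below (10 nodes with 12 task slots each), a rough estimate of the buffer count can be sketched as follows. The rule of thumb used here, slots per TaskManager squared, times the number of TaskManagers, times 4, is an assumption recalled from the Flink configuration reference of that period and should be checked against the documentation linked in the thread. The class and method names are hypothetical.

```java
// Hypothetical helper, not part of Flink.
final class NetworkBufferEstimate {

    // Assumed rule of thumb: slotsPerTaskManager^2 * taskManagers * 4.
    public static long estimateBuffers(int slotsPerTaskManager, int taskManagers) {
        return (long) slotsPerTaskManager * slotsPerTaskManager * taskManagers * 4L;
    }

    public static void main(String[] args) {
        // 12 slots per TaskManager and 10 TaskManagers, as in the quoted setup.
        long estimate = estimateBuffers(12, 10);
        // Prints a line in the form used by the configuration file.
        System.out.println("taskmanager.network.numberOfBuffers: " + estimate);
    }
}
```

Under this assumption the estimate for the quoted cluster is 5760. That is a generous upper bound rather than a minimum: the thread reports that 4096 was already enough for this job.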


On Thu, Feb 19, 2015 at 6:18 PM, Henry Saputra <henry.saputra@gmail.com>
wrote:

> Would it be helpful to extend the error message in
> NetworkBufferPool#createBufferPool with a hint to check the
> taskmanager.network.numberOfBuffers property?
>
>
> - Henry
>
> On Wed, Feb 18, 2015 at 4:32 PM, Yiannis Gkoufas <johngouf85@gmail.com>
> wrote:
> > Perfect! It worked! Thanks a lot for the help!
> >
> > On 18 February 2015 at 22:13, Fabian Hueske <fhueske@gmail.com> wrote:
> >>
> >> 2048 is the default. So you didn't actually increase the number of
> >> buffers ;-)
> >>
> >> Try 4096 or so.
> >>
> >> 2015-02-18 22:59 GMT+01:00 Yiannis Gkoufas <johngouf85@gmail.com>:
> >>>
> >>> Hi!
> >>>
> >>> thank you for your replies!
> >>> I increased the number of network buffers:
> >>>
> >>> taskmanager.network.numberOfBuffers: 2048
> >>>
> >>> but I am still getting the same error:
> >>>
> >>> Insufficient number of network buffers: required 120, but only 2 of
> >>> 2048 available.
> >>>
> >>> Thanks a lot!
> >>>
> >>>
> >>> On 18 February 2015 at 20:27, Fabian Hueske <fhueske@gmail.com> wrote:
> >>>>
> >>>> Hi Yiannis,
> >>>>
> >>>> If you scale Flink to larger setups, you need to adapt the number of
> >>>> network buffers.
> >>>> The background section of the configuration reference explains the
> >>>> details [1].
> >>>>
> >>>> Let us know if that helped to solve the problem.
> >>>>
> >>>> Best, Fabian
> >>>>
> >>>> [1] http://flink.apache.org/docs/0.8/config.html#background
> >>>>
> >>>> 2015-02-18 21:18 GMT+01:00 Yiannis Gkoufas <johngouf85@gmail.com>:
> >>>>>
> >>>>> Hi there,
> >>>>>
> >>>>> I have a cluster of 10 nodes with 12 CPUs each.
> >>>>> This is my configuration:
> >>>>>
> >>>>> jobmanager.rpc.port: 6123
> >>>>>
> >>>>> jobmanager.heap.mb: 4024
> >>>>>
> >>>>> taskmanager.heap.mb: 8096
> >>>>>
> >>>>> taskmanager.numberOfTaskSlots: 12
> >>>>>
> >>>>> parallelization.degree.default: 120
> >>>>>
> >>>>> I have been getting the following error:
> >>>>>
> >>>>> java.lang.Exception: Failed to deploy the task Reduce (SUM(1)) (65/120) - execution #0 to slot SimpleSlot (1)(0) - efc370a0b2a9a63f2e7b960cfe4e4c27 - ALLOCATED/ALIVE: java.io.IOException: Insufficient number of network buffers: required 120, but only 2 of 2048 available.
> >>>>> at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:155)
> >>>>> at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:163)
> >>>>> at org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$submitTask(TaskManager.scala:426)
> >>>>> at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:261)
> >>>>> at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
> >>>>> at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
> >>>>> at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
> >>>>> at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
> >>>>> at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
> >>>>> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
> >>>>> at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
> >>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
> >>>>> at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:89)
> >>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
> >>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:487)
> >>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
> >>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:221)
> >>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
> >>>>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> >>>>> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> >>>>> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> >>>>> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> >>>>>
> >>>>> at org.apache.flink.runtime.executiongraph.Execution$2.onComplete(Execution.java:344)
> >>>>> at akka.dispatch.OnComplete.internal(Future.scala:247)
> >>>>> at akka.dispatch.OnComplete.internal(Future.scala:244)
> >>>>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:174)
> >>>>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:171)
> >>>>> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> >>>>> at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
> >>>>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> >>>>> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> >>>>> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> >>>>> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> >>>>>
> >>>>>
> >>>>> I failed to get any info online on how to solve it.
> >>>>> Any help would be welcome.
> >>>>>
> >>>>> Thank you!
> >>>>
> >>>>
> >>>
> >>
> >
>
