From: Yiannis Gkoufas
Date: Wed, 18 Feb 2015 20:18:57 +0000
Subject: Exception: Insufficient number of network buffers: required 120, but only 2 of 2048 available
To: user@flink.apache.org

Hi there,

I have a cluster of 10 nodes with 12 CPUs each. This is my configuration:

jobmanager.rpc.port: 6123
jobmanager.heap.mb: 4024
taskmanager.heap.mb: 8096
taskmanager.numberOfTaskSlots: 12
parallelization.degree.default: 120

I have been getting the following error:

java.lang.Exception: Failed to deploy the task Reduce (SUM(1)) (65/120) - execution #0 to slot SimpleSlot (1)(0) - efc370a0b2a9a63f2e7b960cfe4e4c27 - ALLOCATED/ALIVE: java.io.IOException: Insufficient number of network buffers: required 120, but only 2 of 2048 available.
	at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:155)
	at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:163)
	at org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$submitTask(TaskManager.scala:426)
	at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:261)
	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
	at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
	at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
	at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
	at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:89)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
	at akka.actor.ActorCell.invoke(ActorCell.scala:487)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
	at akka.dispatch.Mailbox.run(Mailbox.scala:221)
	at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

	at org.apache.flink.runtime.executiongraph.Execution$2.onComplete(Execution.java:344)
	at akka.dispatch.OnComplete.internal(Future.scala:247)
	at akka.dispatch.OnComplete.internal(Future.scala:244)
	at akka.dispatch.japi$CallbackBridge.apply(Future.scala:174)
	at akka.dispatch.japi$CallbackBridge.apply(Future.scala:171)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
	at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

I couldn't find any information online on how to solve it. Any help would be welcome.

Thank you!
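A rough sketch of the buffer arithmetic behind this error, under stated assumptions. The "of 2048" in the message matches the default of `taskmanager.network.numberOfBuffers` in Flink 0.x. The sketch assumes (not confirmed by the trace) that each TaskManager must hold at least one buffer per input channel for every local consumer subtask and one per output subpartition for every local producer subtask of an all-to-all exchange, and that each of the 10 nodes runs one TaskManager. It also uses the rule of thumb `slots-per-TM^2 * #TMs * 4`, as recalled from the Flink 0.x configuration documentation. The class and method names are hypothetical.

```java
// Back-of-the-envelope estimate of network buffers per TaskManager.
// Hypothetical class/method names. The per-channel minimum is an assumption;
// the "4 * slots^2 * TMs" formula is the rule of thumb as recalled from the
// Flink 0.x configuration docs, not something computed by Flink itself.
class NetworkBufferEstimate {

    /** Rule-of-thumb value for taskmanager.network.numberOfBuffers. */
    static long ruleOfThumbBuffers(int slotsPerTaskManager, int numTaskManagers) {
        return 4L * slotsPerTaskManager * slotsPerTaskManager * numTaskManagers;
    }

    /**
     * Assumed lower bound for one all-to-all exchange (e.g. the shuffle that
     * feeds the Reduce) on a single TaskManager: one buffer per input channel
     * for each local consumer subtask, plus one buffer per output subpartition
     * for each local producer subtask.
     */
    static long minBuffersForOneExchange(int slotsPerTaskManager, int parallelism) {
        long consumerSide = (long) slotsPerTaskManager * parallelism; // input channels
        long producerSide = (long) slotsPerTaskManager * parallelism; // output subpartitions
        return consumerSide + producerSide;
    }

    public static void main(String[] args) {
        int numTaskManagers = 10;     // 10 nodes, assumed one TaskManager each
        int slotsPerTaskManager = 12; // taskmanager.numberOfTaskSlots
        int parallelism = 120;        // parallelization.degree.default
        long configured = 2048;       // default taskmanager.network.numberOfBuffers

        long needed = minBuffersForOneExchange(slotsPerTaskManager, parallelism);
        System.out.println("lower bound for one exchange: " + needed);     // 2880
        System.out.println("configured buffers:           " + configured); // 2048
        System.out.println("rule-of-thumb buffers:        "
                + ruleOfThumbBuffers(slotsPerTaskManager, numTaskManagers)); // 5760
    }
}
```

If these assumptions hold, a single shuffle at parallelism 120 already asks for more buffers than the 2048 default, which would be consistent with the pool running dry while deploying Reduce subtask 65/120.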