From: Ufuk Celebi
Date: Mon, 9 May 2016 10:37:21 +0200
Subject: Re: How to choose the 'parallelism.default' value
To: user@flink.apache.org

Hey Punit,

you need to give the task managers more network buffers as Robert
suggested. Using the formula from the docs, can you please use 147456
(96^2*4*4) for the number of network buffers? Each buffer is 32 KB,
meaning that you give 4.5 GB of memory to the network stack. You might
have to adjust the heap memory (taskmanager.heap.mb) you give to the
task managers accordingly.

Does this solve it?

– Ufuk

On Sat, May 7, 2016 at 10:50 AM, Punit Naik wrote:
> I am afraid not.
>
> On 07-May-2016 1:24 PM, "Aljoscha Krettek" wrote:
>>
>> Could it be that the TaskManagers are configured with not enough memory?
>>
>> On Thu, 5 May 2016 at 13:35 Robert Metzger wrote:
>>>
>>> The default value of taskmanager.network.numberOfBuffers is 2048. I would
>>> recommend using a multiple of that value, for example 16384 (given that you
>>> have enough memory per TaskManager).
>>>
>>> I recommend checking out these slides I created a while ago. They explain
>>> what the network buffers are needed for:
>>> http://www.slideshare.net/robertmetzger1/apache-flink-hands-on#37
>>>
>>>
>>> On Thu, May 5, 2016 at 1:30 PM, Punit Naik wrote:
>>>>
>>>> Yes, I followed it and changed it to 298, but again it said the same
>>>> thing. The only change was that it now said "required 298, but only 200
>>>> available".
>>>>
>>>> Why did it say that?
>>>>
>>>> On Thu, May 5, 2016 at 4:50 PM, Robert Metzger wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I think you've chosen a good initial value for the parallelism.
>>>>> The higher the parallelism, the more network buffers are needed. I
>>>>> would follow the recommendation from the exception and increase the number
>>>>> of network buffers.
>>>>>
>>>>> On Thu, May 5, 2016 at 11:23 AM, Punit Naik wrote:
>>>>>>
>>>>>> Hello
>>>>>>
>>>>>> I was running a program with 'parallelism.default' of 384 as I read in
>>>>>> the documentation on Flink's official page that 'parallelism.default' is
>>>>>> "the total number of CPUs in the cluster". I have four machines with 96
>>>>>> cores on each of them. So 96*4=384. But the program threw an error saying:
>>>>>>
>>>>>> Caused by: java.io.IOException: Insufficient number of network
>>>>>> buffers: required 384, but only 298 available. The total number of network
>>>>>> buffers is currently set to 2048. You can increase this number by setting
>>>>>> the configuration key 'taskmanager.network.numberOfBuffers'.
>>>>>>
>>>>>> What does this mean? And how to choose a proper value for parallelism?
>>>>>>
>>>>>> --
>>>>>> Thank You
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Punit Naik
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Thank You
>>>>
>>>> Regards
>>>>
>>>> Punit Naik
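
The buffer arithmetic in the reply at the top of this thread can be checked with a short script. This is only a sketch under stated assumptions: it reads 96^2*4*4 as (slots per TaskManager)^2 * (number of TaskManagers) * 4, with one slot per core, and it takes the 32 KB buffer size from the reply. The function names are made up for illustration and are not part of Flink.

```python
def network_buffers(slots_per_tm: int, num_tms: int, factor: int = 4) -> int:
    """Buffer count, assuming the formula slots_per_tm^2 * num_tms * factor."""
    return slots_per_tm ** 2 * num_tms * factor


def buffer_memory_gib(num_buffers: int, buffer_size_kib: int = 32) -> float:
    """Memory taken by the buffers in GiB, assuming 32 KiB per buffer."""
    return num_buffers * buffer_size_kib / (1024 * 1024)


if __name__ == "__main__":
    # Cluster from the thread: four machines with 96 cores each.
    buffers = network_buffers(slots_per_tm=96, num_tms=4)
    print(buffers)                     # 147456
    print(buffer_memory_gib(buffers))  # 4.5
```

If the numbers match your cluster, the resulting count is the value to set for the 'taskmanager.network.numberOfBuffers' key named in the exception, keeping in mind that this memory has to fit next to the TaskManager heap.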