From: 杨力
Date: Mon, 13 Aug 2018 22:25:40 +0800
Subject: Flink 1.6 ExecutionJobVertex.getTaskInformationOrBlobKey OutOfMemoryError
To: user@flink.apache.org

I used to run Flink SQL in streaming mode with more than 70 SQL statements in version 1.4.
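For reference, the frame-size setting in question lives in flink-conf.yaml; the snippet below is a sketch from memory, so verify the exact key and unit against the documentation for your Flink version:

```yaml
# flink-conf.yaml -- raise the maximum Akka message size so a large
# JobGraph can be submitted (value in bytes; 200 MB shown here).
akka.framesize: 209715200b
```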
With so many SQL statements loaded, akka.framesize has to be set to 200 MB to submit the job.

When I try to run the job with Flink 1.6.0, the HTTP-based job submission works perfectly, but an OutOfMemoryError is thrown while tasks are being deployed.

java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3236)
        at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
        at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
        at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1877)
        at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1786)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1189)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
        at org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:512)
        at org.apache.flink.util.SerializedValue.<init>(SerializedValue.java:52)
        at org.apache.flink.runtime.blob.BlobWriter.serializeAndTryOffload(BlobWriter.java:99)
        at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.getTaskInformationOrBlobKey(ExecutionJobVertex.java:393)
        at org.apache.flink.runtime.executiongraph.ExecutionVertex.createDeploymentDescriptor(ExecutionVertex.java:827)
        at org.apache.flink.runtime.executiongraph.Execution.deploy(Execution.java:580)
        at org.apache.flink.runtime.executiongraph.ExecutionGraph.lambda$scheduleEager$2(ExecutionGraph.java:963)
        at org.apache.flink.runtime.executiongraph.ExecutionGraph$$Lambda$105/800937955.accept(Unknown Source)
        at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656)
        at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
        at org.apache.flink.runtime.concurrent.FutureUtils$ResultConjunctFuture.handleCompletedFuture(FutureUtils.java:541)
        at org.apache.flink.runtime.concurrent.FutureUtils$ResultConjunctFuture$$Lambda$92/1432873073.accept(Unknown Source)
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
        at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:772)
        at akka.dispatch.OnComplete.internal(Future.scala:259)
        at akka.dispatch.OnComplete.internal(Future.scala:256)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)

This OOM error occurs even with a 12 GB heap. I have dived into the source code and only found that ExecutionJobVertex.getTaskInformationOrBlobKey serializes a TaskInformation object, which does not seem to be a large one. Can anyone help me fix or work around the problem?
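The failing step in the trace is plain JDK serialization into an in-memory buffer. The sketch below is not Flink code (the class and field names are hypothetical); it only illustrates the same pattern, where ByteArrayOutputStream repeatedly regrows its backing array via Arrays.copyOf, so peak heap usage can be a multiple of the final serialized size:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationFootprint {

    // Hypothetical stand-in for a TaskInformation-like object that carries
    // a large serialized job configuration with it.
    static class FakeTaskInformation implements Serializable {
        final byte[] serializedConfig;
        FakeTaskInformation(int configBytes) {
            this.serializedConfig = new byte[configBytes];
        }
    }

    // Roughly the shape of InstantiationUtil.serializeObject: write the
    // object graph into a growable in-memory buffer and return the bytes.
    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(obj);
        }
        // toByteArray() copies the internal buffer once more, so the peak
        // footprint is briefly about twice the serialized size.
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = serialize(new FakeTaskInformation(10 * 1024 * 1024));
        System.out.println("serialized size: " + bytes.length);
    }
}
```

If the object being serialized were really small, this path could not need gigabytes of heap, which is why the size of whatever TaskInformation drags along (e.g. the vertex configuration) matters here.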