Date: Thu, 28 Mar 2019 06:23:22 -0500 (CDT)
From: "yinhua.dai"
To: user@flink.apache.org
Subject: RemoteTransportException: Connection unexpectedly closed by remote task manager

Hi,

I wrote a single Flink job with Flink SQL, version 1.6.1.
It has one table source, which reads data from a database, and one table sink, which writes the output as an Avro file.
The table source has a parallelism of 19, and the table sink has a parallelism of only 1.

However, a RemoteTransportException is always thrown when the job is nearly done (all data source instances have finished, and the data sink has been running for a while).

The detailed error is below:

2019-03-28 07:53:49,086 ERROR org.apache.flink.runtime.operators.DataSinkTask - Error in user code: Connection unexpectedly closed by remote task manager 'ip-10-97-34-40.tr-fr-nonprod.aws-int.thomsonreuters.com/10.97.34.40:46625'. This might indicate that the remote task manager was lost.: DataSink (com.tr.apt.sqlengine.tables.s3.AvroFileTableSink$AvroOutputFormat@42d174ad) (1/1)
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connection unexpectedly closed by remote task manager 'ip-10-97-34-40.tr-fr-nonprod.aws-int.thomsonreuters.com/10.97.34.40:46625'. This might indicate that the remote task manager was lost.
    at org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.channelInactive(CreditBasedPartitionRequestClientHandler.java:143)
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:377)
    at org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:342)
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1429)
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:947)
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:822)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
    at java.lang.Thread.run(Thread.java:748)
2019-03-28 07:53:49,440 INFO com.tr.apt.sqlengine.tables.s3.AbstractFileOutputFormat - FileTableSink sinked all data to : file:///tmp/shareamount.avro
2019-03-28 07:53:49,441 INFO org.apache.flink.runtime.taskmanager.Task - DataSink (com.tr.apt.sqlengine.tables.s3.AvroFileTableSink$AvroOutputFormat@42d174ad) (1/1) (31fd3e6fdbb1576e7288e202fff69b07) switched from RUNNING to FAILED.

Does this error mean that the data sink failed to read all of the data from some data source instance before that source instance ended?

When I check the log of the task manager (10.97.34.40:46625), everything looks fine: it shows that it successfully finished its work, received signal 15 (SIGTERM), and then terminated itself.

So how should I find the root cause of the error?

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/