Date: Wed, 18 Jan 2017 15:07:26 +0000 (UTC)
From: "Robert Metzger (JIRA)"
To: issues@flink.apache.org
Reply-To: dev@flink.apache.org
Subject: [jira] [Created] (FLINK-5553) Job fails during deployment with IllegalStateException from subpartition request

Robert Metzger created FLINK-5553:
-------------------------------------

             Summary: Job fails during deployment with IllegalStateException from subpartition request
                 Key: FLINK-5553
                 URL: https://issues.apache.org/jira/browse/FLINK-5553
             Project: Flink
          Issue Type: Bug
          Components: Network
    Affects Versions: 1.3.0
            Reporter: Robert Metzger


While running a test job with Flink 1.3-SNAPSHOT (6fb6967b9f9a31f034bd09fcf76aaf147bc8e9a0) the job failed with this exception:

{code}
2017-01-18 14:56:27,043 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Sink: Unnamed (9/10) (befc06d0e792c2ce39dde74b365dd3cf) switched from DEPLOYING to RUNNING.
2017-01-18 14:56:27,059 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map (9/10) (e94a01ec283e5dce7f79b02cf51654c4) switched from DEPLOYING to RUNNING.
2017-01-18 14:56:27,817 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map (10/10) (cbb61c9a2f72c282877eb383e111f7cd) switched from RUNNING to FAILED.
java.lang.IllegalStateException: There has been an error in the channel.
	at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
	at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.addInputChannel(PartitionRequestClientHandler.java:77)
	at org.apache.flink.runtime.io.network.netty.PartitionRequestClient.requestSubpartition(PartitionRequestClient.java:104)
	at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:115)
	at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:419)
	at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:441)
	at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:153)
	at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:192)
	at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:270)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:666)
	at java.lang.Thread.run(Thread.java:745)
2017-01-18 14:56:27,819 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Misbehaved Job (b1d985d11984df57400fdff2bb656c59) switched from state RUNNING to FAILING.
java.lang.IllegalStateException: There has been an error in the channel.
	at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
	at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.addInputChannel(PartitionRequestClientHandler.java:77)
	at org.apache.flink.runtime.io.network.netty.PartitionRequestClient.requestSubpartition(PartitionRequestClient.java:104)
	at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:115)
	at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:419)
	at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:441)
	at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:153)
	at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:192)
	at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:270)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:666)
	at java.lang.Thread.run(Thread.java:745)
{code}

This is the first exception reported to the JobManager. I think it is related to missing network buffers: the next deployment after the restart fails with the "insufficient number of buffers" exception.

I'll add logs to the JIRA.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
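
Note on the trace above: the message "There has been an error in the channel." comes from a state check, so it may be only a symptom of an earlier failure on the same connection. The following is a minimal sketch of that general pattern, not Flink's actual code. The class StickyChannelErrorSketch, its fields, and the string channel ids are all hypothetical, and it assumes the handler keeps a sticky error flag that later registrations check.

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a handler shared by several input channels; not a Flink class. */
final class StickyChannelErrorSketch {

    // Set once when the shared connection fails; never reset in this sketch.
    private final AtomicBoolean channelError = new AtomicBoolean(false);

    // Registered consumers, keyed by a hypothetical channel id.
    private final ConcurrentMap<String, String> inputChannels = new ConcurrentHashMap<>();

    /** Marks the connection as failed. Returns true only for the first failure. */
    boolean notifyError(Throwable cause) {
        // The cause is intentionally not stored here, to show how a later
        // state check can fail without carrying the original reason.
        return channelError.compareAndSet(false, true);
    }

    /** Registers a consumer; fails fast if the connection has already failed. */
    boolean addInputChannel(String channelId, String owner) {
        if (channelError.get()) {
            throw new IllegalStateException("There has been an error in the channel.");
        }
        // putIfAbsent returns null when the id was not registered before.
        return inputChannels.putIfAbsent(channelId, owner) == null;
    }

    public static void main(String[] args) {
        StickyChannelErrorSketch handler = new StickyChannelErrorSketch();
        handler.addInputChannel("channel-1", "Flat Map (9/10)");

        // Some earlier failure on the shared connection (hypothetical cause).
        handler.notifyError(new IOException("hypothetical root cause"));

        try {
            handler.addInputChannel("channel-2", "Flat Map (10/10)");
        } catch (IllegalStateException e) {
            System.out.println("late registration failed: " + e.getMessage());
        }
    }
}
```

Under that assumption, a task that registers late sees only the generic IllegalStateException, so the root cause would have to be found in earlier TaskManager log entries.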