From: "Stephan Ewen (JIRA)"
To: issues@flink.apache.org
Date: Wed, 12 Oct 2016 17:50:20 +0000 (UTC)
Subject: [jira] [Closed] (FLINK-3594) StreamTask may fail when checkpoint is concurrent to regular termination

[ https://issues.apache.org/jira/browse/FLINK-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Ewen closed FLINK-3594.
-------------------------------

> StreamTask may fail when checkpoint is concurrent to regular termination
> ------------------------------------------------------------------------
>
>                 Key: FLINK-3594
>                 URL: https://issues.apache.org/jira/browse/FLINK-3594
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Chesnay Schepler
>            Assignee: Stephan Ewen
>            Priority: Critical
>              Labels: test-stability
>             Fix For: 1.1.0
>
>
> Some tests in the KafkaConsumerTestBase rely on throwing a SuccessException to stop the streaming job once the test condition is fulfilled.
> The job then fails, and the test checks whether the cause was a SuccessException. If so, the test is marked as a success; otherwise, as a failure.
> However, if this exception is thrown while a checkpoint is being triggered, the exception that stops the job is not the SuccessException but a CancelTaskException.
> This should affect every test that uses the SuccessException.
> Observed here: https://travis-ci.org/apache/flink/jobs/114523189
> The problem is that the exception causes the StreamTask to enter the finally block inside invoke(), which sets isRunning to false. Within triggerCheckpoint(), isRunning is then checked, and if it is false, a CancelTaskException is thrown.
> This seems to be an issue in the runtime; I observed other tests failing without a meaningful cause, because the CancelTaskException masks it.
> I was wondering whether triggerCheckpoint() could return false instead of throwing an exception, and simply rely on the exception being thrown within invoke().
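The race described in the issue, and the proposed "return false" alternative, can be illustrated with a small single-threaded model. This is a hypothetical sketch, not Flink's StreamTask: `ToyTask`, `reportFailure`, `runScenario`, and the "first reported failure wins" rule are assumptions made only to show how a CancelTaskException thrown from triggerCheckpoint() can mask the SuccessException raised in invoke(). The interleaving (checkpoint trigger runs after the finally block cleared isRunning, but before invoke()'s exception is reported) is replayed deterministically instead of with real threads.

```java
import java.util.concurrent.atomic.AtomicReference;

public class CheckpointRaceSketch {

    // Hypothetical stand-ins for the exceptions named in the issue.
    static class SuccessException extends RuntimeException {
        SuccessException() { super("test condition fulfilled"); }
    }

    static class CancelTaskException extends RuntimeException {
        CancelTaskException() { super("task is no longer running"); }
    }

    // Toy model of a task (NOT Flink's StreamTask): only invoke() and triggerCheckpoint().
    static class ToyTask {
        volatile boolean isRunning = true;
        // Assumption: the first failure reported becomes the job's failure cause.
        final AtomicReference<Throwable> failureCause = new AtomicReference<>();
        final boolean throwOnCancelledCheckpoint;

        ToyTask(boolean throwOnCancelledCheckpoint) {
            this.throwOnCancelledCheckpoint = throwOnCancelledCheckpoint;
        }

        void invoke(Runnable userCode) {
            try {
                userCode.run();
            } finally {
                // As described in the issue: the finally block clears isRunning.
                isRunning = false;
            }
        }

        boolean triggerCheckpoint() {
            if (!isRunning) {
                if (throwOnCancelledCheckpoint) {
                    // Behaviour reported in the issue: throw, masking the real cause.
                    throw new CancelTaskException();
                }
                // Proposed alternative: decline the checkpoint and let invoke()'s exception win.
                return false;
            }
            return true;
        }

        void reportFailure(Throwable t) {
            failureCause.compareAndSet(null, t);
        }
    }

    // Replays the problematic interleaving and returns the failure cause that "wins".
    static Throwable runScenario(boolean throwOnCancelledCheckpoint) {
        ToyTask task = new ToyTask(throwOnCancelledCheckpoint);
        Throwable fromInvoke = null;
        try {
            task.invoke(() -> { throw new SuccessException(); });
        } catch (Throwable t) {
            fromInvoke = t;
        }
        // The checkpoint is triggered after isRunning became false,
        // but before invoke()'s exception has been reported.
        try {
            task.triggerCheckpoint();
        } catch (Throwable t) {
            task.reportFailure(t);
        }
        task.reportFailure(fromInvoke);
        return task.failureCause.get();
    }

    public static void main(String[] args) {
        System.out.println(runScenario(true).getClass().getSimpleName());  // CancelTaskException
        System.out.println(runScenario(false).getClass().getSimpleName()); // SuccessException
    }
}
```

In this model, declining the checkpoint (returning false) leaves invoke()'s own exception as the reported cause, which is the outcome the test relies on; whether the real runtime reports failures in this order is an assumption of the sketch.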