Date: Mon, 1 Aug 2016 17:02:20 +0000 (UTC)
From: "Ufuk Celebi (JIRA)"
To: issues@flink.apache.org
Subject: [jira] [Closed] (FLINK-4299) Show loss of job manager in Client

     [ https://issues.apache.org/jira/browse/FLINK-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ufuk Celebi closed FLINK-4299.
------------------------------
       Resolution: Fixed
    Fix Version/s: 1.2.0

Fixed in 4d988a9 (release-1.1), 7ea9c01 (master).

> Show loss of job manager in Client
> ----------------------------------
>
>                 Key: FLINK-4299
>                 URL: https://issues.apache.org/jira/browse/FLINK-4299
>             Project: Flink
>          Issue Type: Improvement
>          Components: Client
>            Reporter: Ufuk Celebi
>            Assignee: Maximilian Michels
>             Fix For: 1.1.0, 1.2.0
>
>
> If the client loses the connection to a job manager and the job recovers from this, the client only prints the job status as {{RUNNING}} again. It is hard to notice that something went wrong and a job manager was lost.
>
> {code}
> ...
> 08/01/2016 14:35:43 Flat Map -> Sink: Unnamed(8/8) switched to RUNNING
> 08/01/2016 14:35:43 Source: Custom Source(6/8) switched to RUNNING
> <------ EVERYTHING'S RUNNING ------>
> 08/01/2016 14:40:40 Job execution switched to status RUNNING <--- JOB MANAGER FAIL OVER
> 08/01/2016 14:40:40 Source: Custom Source(1/8) switched to SCHEDULED
> 08/01/2016 14:40:40 Source: Custom Source(1/8) switched to DEPLOYING
> 08/01/2016 14:40:40 Source: Custom Source(2/8) switched to SCHEDULED
> ...
> {code}
>
> After {{14:35:43}} everything is running and the client does not print any execution state updates. When the job manager fails, the job is recovered and eventually enters the running state again (at 14:40:40), but the user might never notice this.
>
> I would like to improve on this by printing messages about the state of the job manager connection. For example, between {{14:35:43}} and {{14:40:40}} the client could report that the connection to the job manager was lost, that a new one was established, and so on.



-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
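The connection messages proposed in the issue can be illustrated with a small sketch. This is a minimal, hypothetical example, not Flink's client API and not the change committed in 4d988a9/7ea9c01: the class `JobManagerConnectionPrinter`, its `connected`/`lost` callbacks, the job manager addresses, and the event timestamps are all illustrative assumptions. It prints one line per connection change in the same `MM/dd/yyyy HH:mm:ss` style as the client log, so the silent stretch between 14:35:43 and 14:40:40 would carry explicit "lost" and "reconnected" lines.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical client-side helper, NOT part of Flink's API: it reports
 * job manager connection changes using the same timestamp format as the
 * client's job status output (MM/dd/yyyy HH:mm:ss).
 */
class JobManagerConnectionPrinter {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm:ss");

    private final List<String> lines = new ArrayList<>();
    private String currentAddress;   // null while no job manager is connected
    private boolean everConnected;   // distinguishes the first connect from a reconnect

    /** Called when the client (re)connects to a job manager at the given address. */
    void connected(LocalDateTime time, String address) {
        if (address.equals(currentAddress)) {
            return; // still talking to the same job manager: nothing new to report
        }
        String msg = everConnected
                ? "Reconnected to job manager at " + address
                : "Connected to job manager at " + address;
        everConnected = true;
        currentAddress = address;
        emit(time, msg);
    }

    /** Called when the client detects that the job manager connection is gone. */
    void lost(LocalDateTime time) {
        if (currentAddress == null) {
            return; // already disconnected: avoid printing duplicate warnings
        }
        emit(time, "Lost connection to job manager at " + currentAddress);
        currentAddress = null;
    }

    /** Returns a copy of every line printed so far. */
    List<String> lines() {
        return new ArrayList<>(lines);
    }

    // Formats, records, and prints a single status line.
    private void emit(LocalDateTime time, String msg) {
        String line = FORMAT.format(time) + " " + msg;
        lines.add(line);
        System.out.println(line);
    }

    public static void main(String[] args) {
        // Made-up events placed inside the silent gap of the log in the issue.
        JobManagerConnectionPrinter printer = new JobManagerConnectionPrinter();
        printer.connected(LocalDateTime.of(2016, 8, 1, 14, 35, 0), "jobmanager-a:6123");
        printer.lost(LocalDateTime.of(2016, 8, 1, 14, 37, 12));
        printer.connected(LocalDateTime.of(2016, 8, 1, 14, 40, 38), "jobmanager-b:6123");
    }
}
```

Running `main` prints three lines: the initial connection, the loss of `jobmanager-a:6123`, and the reconnect to `jobmanager-b:6123`. Ignoring repeated `lost` calls while already disconnected keeps a flapping connection from flooding the console with identical warnings.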