From: Ufuk Celebi
Date: Thu, 25 Feb 2016 19:47:12 +0100
Subject: Re: [VOTE] Release Apache Flink 1.0.0 (RC1)
To: dev@flink.apache.org

On Thu, Feb 25, 2016 at 5:23 PM, Vasiliki Kalavri wrote:
> - HA: tested on a 6-node cluster with 2 masters.
> Issues:
> 1. After new leader election, the job history is cleaned up (at least in
> the WebUI). Is this on purpose?

Yes, the job history is part of the job manager.

> 2. After cluster restart, the jobmanager remembers and tries to re-submit
> previously failed resubmissions.
> This one is a bit tricky:
> I had a batch job running and killed the master. After the new master took
> over, job resubmission failed because the HDFS output directory already
> existed. After re-starting the whole cluster and removing the HDFS
> directory, the new jobmanager re-submitted the previously failed batch job.

I think for this you have to set the write mode to overwrite at the moment.

> 3. Upon starting the cluster I get the following warning message "[WARNING]
> 1 instance(s) of jobmanager are already running", when jps shows no
> existing jobmanager process.

This is part of the bash script. It currently checks a PID file to determine
the running processes, but it does not actually check whether the PIDs in
that file still belong to live processes. I think it's a good idea to
actually check this. Let me open an issue for this...
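
To illustrate the kind of check meant for the PID file, here is a rough sketch in Python rather than in the actual bash script. The function name `live_pids` and the one-PID-per-line file layout are assumptions made for the example, not taken from the Flink scripts, and it relies on POSIX semantics for signal 0.

```python
import os


def live_pids(pid_file):
    """Return the PIDs from pid_file that still belong to a running process.

    Assumes one PID per line and a POSIX system (hypothetical layout).
    """
    try:
        with open(pid_file) as f:
            entries = [line.strip() for line in f]
    except FileNotFoundError:
        return []  # no PID file means nothing was recorded as running

    alive = []
    for entry in entries:
        if not entry.isdigit() or int(entry) == 0:
            continue  # skip blank or malformed lines
        pid = int(entry)
        try:
            os.kill(pid, 0)  # signal 0 only checks existence; nothing is delivered
        except ProcessLookupError:
            continue  # stale entry: no such process
        except PermissionError:
            pass  # process exists but belongs to another user
        alive.append(pid)
    return alive
```

With something like this, the warning would only be printed when `live_pids(...)` returns a non-empty list, and stale entries could be dropped from the file. A live PID may still have been reused by an unrelated process, so this only narrows the false positives and does not rule them out.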