From: Ufuk Celebi <uce@apache.org>
Date: Thu, 25 Feb 2016 19:47:12 +0100
Subject: Re: [VOTE] Release Apache Flink 1.0.0 (RC1)
To: dev@flink.apache.org
Content-Type: text/plain; charset=UTF-8

On Thu, Feb 25, 2016 at 5:23 PM, Vasiliki Kalavri
<vasilikikalavri@gmail.com> wrote:
> - HA: tested on a 6-node cluster with 2 masters.
> Issues:
> 1. After new leader election, the job history is cleaned up (at least in
> the WebUI). Is this on purpose?

Yes. The job history is part of the job manager, so it does not carry
over to a newly elected leader.

> 2. After cluster restart, the jobmanager remembers and tries to re-submit
> previously failed resubmissions.
> This one is a bit tricky:
> I had a batch job running and killed the master. After the new master took
> over, job resubmission failed because the HDFS output directory already
> existed. After re-starting the whole cluster and removing the HDFS
> directory, the new jobmanager re-submitted the previously failed batch job.

I think that, for now, you have to set the output's write mode to
overwrite (WriteMode.OVERWRITE), so that the resubmitted job can replace
the existing directory.

> 3. Upon starting the cluster I get the following warning message "[WARNING]
> 1 instance(s) of jobmanager are already running", when jps shows no
> existing jobmanager process.

This comes from the bash script. It currently reads a PID file to
determine the running processes, but it does not check whether those
PIDs still belong to live processes, so a stale file triggers the
warning. I think it's a good idea to check this. Let me open an issue
for this...
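
For illustration, here is a minimal sketch of what such a check could
look like. It is not taken from the Flink scripts: the function name
count_live_pids and the one-PID-per-line file layout are assumptions.
It uses kill -0, which sends no signal and only reports whether the PID
can be signalled.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Flink scripts): count the entries
# in a PID file that still refer to a live process.
# Assumes the file holds one PID per line.
count_live_pids() {
    local pidfile="$1" pid count=0

    # No PID file means nothing is recorded as running.
    [ -f "$pidfile" ] || { echo 0; return; }

    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while read -r pid || [ -n "$pid" ]; do
        # Skip blank or non-numeric lines.
        case "$pid" in
            ''|*[!0-9]*) continue ;;
        esac
        # kill -0 delivers no signal; it succeeds only if the process
        # exists and the current user is allowed to signal it.
        if kill -0 "$pid" 2>/dev/null; then
            count=$((count + 1))
        fi
    done < "$pidfile"

    echo "$count"
}
```

The startup script could then print the warning only when the returned
count is greater than zero. Two caveats: kill -0 also fails for a live
process owned by another user, and a PID that has been reused by an
unrelated process would still be counted as running.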