From: Aljoscha Krettek
Date: Fri, 27 Jan 2017 12:26:30 +0000
Subject: Re: [VOTE] Release Apache Flink 1.2.0 (RC2)
To: dev@flink.apache.org, Gyula Fóra
Reply-To: dev@flink.apache.org

I think this issue that Ufuk opened is also a blocker:
https://issues.apache.org/jira/browse/FLINK-5670

As I commented in the issue, at least one bigger user of Flink has run
into this problem on their cluster.

On Fri, 27 Jan 2017 at 10:50 Ufuk Celebi wrote:

> Thanks Gyula!
>
> The current state of things is:
> - Stefan is working on a fix for
>   https://issues.apache.org/jira/browse/FLINK-5663.
> - Till is working on https://issues.apache.org/jira/browse/FLINK-5667.
>
> As far as I can tell, these will be fixed today, and then we are ready
> to go for RC3.
>
> I resolved the other issues I created.
>
> – Ufuk
>
> On 26 January 2017 at 22:16:26, Gyula Fóra (gyfora@apache.org) wrote:
> > Hi,
> >
> > Aside from the issues mentioned above I have some good news as well.
> >
> > I have finished porting and started testing one of our major production
> > jobs (RBea) on 1.2 and everything seems to run well so far, with
> > savepoints, rescaling, externalized checkpoints, metrics etc. on YARN.
> >
> > In this job I use windowing, RocksDB state, iterations, timers, broadcast
> > states, repartitionable operator states etc. and everything seems to be
> > working extremely well under normal circumstances.
> >
> > So far I mostly ran sunny day tests but I will continue testing with larger
> > load and some failure scenarios. I will keep you posted.
> >
> > Great job!
> > Gyula
> >
> > Robert Metzger wrote (on Thu, 26 Jan 2017, 21:28):
> >
> > Damn. I really hoped that this RC goes through.
> >
> > I propose to keep the RC2 open until we've fixed all issues mentioned here
> > and to get some more testing feedback.
> >
> > On Thu, Jan 26, 2017 at 8:06 PM, Stephan Ewen wrote:
> >
> > > @Till - I think that FLINK-5667 is a blocker
> > >
> > > Good catch finding it!
> > >
> > > On Thu, Jan 26, 2017 at 7:51 PM, Till Rohrmann wrote:
> > >
> > > > I have found another problem: Under certain circumstances Flink can lose
> > > > state data by completing an invalid checkpoint.
> > > > https://issues.apache.org/jira/browse/FLINK-5667.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Thu, Jan 26, 2017 at 6:27 PM, Till Rohrmann wrote:
> > > >
> > > > > Robert also found an issue that pending checkpoint files are not
> > > > > properly cleaned up: https://issues.apache.org/jira/browse/FLINK-5660.
> > > > > To my surprise, the issue was already fixed in 1.1.4, so I guess I
> > > > > forgot to forward port the fix. There is a pending PR to fix it. The
> > > > > fix could also be part of a 1.2.1 release.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Thu, Jan 26, 2017 at 6:04 PM, Ufuk Celebi wrote:
> > > > >
> > > > >> I ran some tests and found the following issues:
> > > > >>
> > > > >> https://issues.apache.org/jira/browse/FLINK-5663: Checkpoint fails
> > > > >> because of closed registry
> > > > >> => This happened a couple of times for the first checkpoints after
> > > > >> submitting a job. If it happened on every submission I would
> > > > >> definitely make this a blocker, but I happen to run into it in like 3
> > > > >> out of 10 job submissions. What do we make of this?
> > > > >>
> > > > >> https://issues.apache.org/jira/browse/FLINK-5665: When the failures
> > > > >> happened, I also had some lingering 0-byte files.
> > > > >>
> > > > >> https://issues.apache.org/jira/browse/FLINK-5664: I also found the
> > > > >> logging of the RocksDB backend a little noisy (at least for my local
> > > > >> setup, with many tasks per TM and a low checkpointing interval).
> > > > >>
> > > > >> All in all, I'm not sure if we want to make these blockers or not.
> > > > >> I'm fine both ways with a follow-up 1.2.1 release.
> > > > >>
> > > > >> ===
> > > > >>
> > > > >> - Verified signatures and checksums
> > > > >> - Checked out the Java quickstarts and ran the jobs
> > > > >> - All poms point to 1.2.0
> > > > >> - Migrated multiple jobs via savepoint from 1.1.4 to 1.2.0 with Kryo
> > > > >>   types, session windows (w/o lateness), operator and keyed state for
> > > > >>   all three backends
> > > > >> - Rescaled the same jobs from 1.2.0 savepoints with all three backends
> > > > >> - Verified the "migration namespace serializer" fix
> > > > >> - Ran streaming state machine with Kafka source, RocksDB backend and
> > > > >>   master and worker failures (standalone cluster)
> > > > >>
> > > > >> On Wed, Jan 25, 2017 at 9:14 PM, Robert Metzger wrote:
> > > > >> > Dear Flink community,
> > > > >> >
> > > > >> > Please vote on releasing the following candidate as Apache Flink
> > > > >> > version 1.2.0.
> > > > >> >
> > > > >> > The commit to be voted on:
> > > > >> > 8b5b6a8b
> > > > >> > (http://git-wip-us.apache.org/repos/asf/flink/commit/8b5b6a8b)
> > > > >> >
> > > > >> > Branch:
> > > > >> > release-1.2.0-rc2
> > > > >> > (https://git1-us-west.apache.org/repos/asf/flink/repo?p=flink.git;a=shortlog;h=refs/heads/release-1.2.0-rc2)
> > > > >> >
> > > > >> > The release artifacts to be voted on can be found at:
> > > > >> > http://people.apache.org/~rmetzger/flink-1.2.0-rc2/
> > > > >> >
> > > > >> > The release artifacts are signed with the key with fingerprint
> > > > >> > D9839159:
> > > > >> > http://www.apache.org/dist/flink/KEYS
> > > > >> >
> > > > >> > The staging repository for this release can be found at:
> > > > >> > https://repository.apache.org/content/repositories/orgapacheflink-1113
> > > > >> >
> > > > >> > -------------------------------------------------------------
> > > > >> >
> > > > >> > I would like to keep Friday as the target release time. Please let
> > > > >> > me know if you want me to move the deadline to Monday because you
> > > > >> > need more time for testing.
> > > > >> >
> > > > >> > The vote ends on Friday, January 27, 2017, 6pm CET.
> > > > >> >
> > > > >> > Please test the release now rather than on Friday morning, so that
> > > > >> > we are able to cancel it as early as possible.
> > > > >> > To make testing easier, I've created this document to track what
> > > > >> > has already been tested and what needs to be tested:
> > > > >> > https://docs.google.com/document/d/1MX-8l9RrLly3UmZMODHBnuZUrK_n-DGIBLjFKyCrTAs/edit?usp=sharing
> > > > >> > Feel free to add more tests or change existing ones.
> > > > >> >
> > > > >> > [ ] +1 Release this package as Apache Flink 1.2.0
> > > > >> > [ ] -1 Do not release this package, because ...