flink-dev mailing list archives

From "Tzu-Li (Gordon) Tai" <tzuli...@apache.org>
Subject Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1
Date Tue, 21 Mar 2017 16:22:58 GMT
Update for Flink 1.2.1:

There’s only one PR still pending, and it already has an LGTM:
https://issues.apache.org/jira/browse/FLINK-6084
Fix for Cassandra connector dropping metrics-core dependency.

We can proceed to create the release candidate very soon :-)
Release 1.1.5 RC1 seems to be in good shape so far, so hopefully we can start voting for 1.2.1 tomorrow.

Also, we’re still lacking a release manager for 1.2.1. Is anyone interested in volunteering for this release?
If nobody steps up for it before tomorrow, I can also do it.

Cheers,
Gordon

On March 18, 2017 at 12:52:48 AM, Robert Metzger (rmetzger@apache.org) wrote:

I don't think that this issue should be a reason to hold back a bugfix release.
There are workarounds for the problem you are describing. Once we've fixed it, we can include it in the next bugfix release.

On Fri, Mar 17, 2017 at 4:22 PM, Flavio Pompermaier <pompermaier@okkam.it> wrote:

> I propose to fix https://issues.apache.org/jira/browse/FLINK-6103 before issuing a release.
>  
> On Fri, Mar 17, 2017 at 8:12 AM, Ufuk Celebi <uce@apache.org> wrote:  
>  
> > Cool! Thanks for taking care of this, Gordon :-)
> >
> > On Fri, Mar 17, 2017 at 7:13 AM, Tzu-Li (Gordon) Tai <tzulitai@apache.org> wrote:
> > > Update for 1.1.5:
> > > The last fixes for 1.1.5 are in! I will create the RC today and start the vote.
> > >  
> > > Cheers,  
> > > Gordon  
> > >  
> > >  
> > > On March 17, 2017 at 1:14:53 AM, Robert Metzger (rmetzger@apache.org) wrote:
> > >
> > > The Cassandra connector is probably not usable in Flink 1.2.0. I would like to include a fix in 1.2.1:
> > > https://issues.apache.org/jira/browse/FLINK-6084
> > >
> > > Please let me know if this fix becomes a blocker for the 1.2.1 release. If so, I can validate the fix myself to speed things up.
> > >
> > > On Thu, Mar 16, 2017 at 9:41 AM, Jinkui Shi <shijinkui666@163.com> wrote:
> > >  
> > >> @Tzu-Li (Gordon) Tai
> > >>
> > >> FLINK-5650 is fixed by [1]. Chesnay Schepler, please push a PR.
> > >>
> > >> [1] https://github.com/zentol/flink/tree/5650_python_test_debug
> > >>
> > >>
> > >> > On March 16, 2017, at 3:37 AM, Stephan Ewen <sewen@apache.org> wrote:
> > >> >  
> > >> > Thanks for the update!  
> > >> >  
> > >> > Also just merged into 1.2.1: [FLINK-5962] [checkpoints] Remove scheduled cancel-task from timer queue to prevent memory leaks
> > >> >
> > >> > The remaining issue list looks good, but I would say that (5) is optional. It is not a critical production bug.
> > >> >
> > >> > On Wed, Mar 15, 2017 at 5:38 PM, Tzu-Li (Gordon) Tai <tzulitai@apache.org> wrote:
> > >> >  
> > >> >> Thanks a lot for the updates so far everyone!  
> > >> >>  
> > >> >> From the discussion so far, below are the still-unfixed pending issues for the 1.1.5 / 1.2.1 releases.
> > >> >>
> > >> >> Since there’s only one backport left for 1.1.5, I think an RC for 1.1.5 near the end of this week / early next week looks very achievable, as basically everything is already in.
> > >> >> I’d be happy to volunteer to help manage the 1.1.5 release, and to prepare the RC when it’s ready :)
> > >> >>
> > >> >> For 1.2.1, we can leave the pending list here for tracking, and come back to update it in the near future.
> > >> >>  
> > >> >> If there’s anything I missed, please let me know!  
> > >> >>  
> > >> >>  
> > >> >> =========== Still pending for Flink 1.1.5 ===========  
> > >> >>  
> > >> >> (1) https://issues.apache.org/jira/browse/FLINK-5701
> > >> >> Broken at-least-once Kafka producer.
> > >> >> Status: backport PR pending - https://github.com/apache/flink/pull/3549.
> > >> >> Since it is a relatively self-contained change, I expect this to be a fast fix.
> > >> >>  
> > >> >>  
> > >> >>  
> > >> >> =========== Still pending for Flink 1.2.1 ===========  
> > >> >>  
> > >> >> (1) https://issues.apache.org/jira/browse/FLINK-5808
> > >> >> Fix Missing verification for setParallelism and setMaxParallelism
> > >> >> Status: PR - https://github.com/apache/flink/pull/3509, review in progress
> > >> >>
> > >> >> (2) https://issues.apache.org/jira/browse/FLINK-5713
> > >> >> Protect against NPE in WindowOperator window cleanup
> > >> >> Status: PR - https://github.com/apache/flink/pull/3535, review pending
> > >> >>
> > >> >> (3) https://issues.apache.org/jira/browse/FLINK-6044
> > >> >> TypeSerializerSerializationProxy.read() doesn't verify the read buffer length
> > >> >> Status: Fixed for master, 1.2 backport pending
> > >> >>
> > >> >> (4) https://issues.apache.org/jira/browse/FLINK-5985
> > >> >> Flink treats every task as stateful (making topology changes impossible)
> > >> >> Status: PR - https://github.com/apache/flink/pull/3543, review in progress
> > >> >>
> > >> >> (5) https://issues.apache.org/jira/browse/FLINK-5650
> > >> >> Flink-python tests taking up too much time
> > >> >> Status: I think Chesnay has already made some progress on this one; we can see whether we want to make it a blocker.
> > >> >>  
> > >> >>  
> > >> >> Cheers,  
> > >> >> Gordon  
> > >> >>  
> > >> >> On March 15, 2017 at 7:16:53 PM, Jinkui Shi (shijinkui666@163.com) wrote:
> > >> >>
> > >> >> Can we fix this issue in 1.2.1?
> > >> >>
> > >> >> Flink-python tests take too long
> > >> >> https://issues.apache.org/jira/browse/FLINK-5650
> > >> >>
> > >> >>> On March 15, 2017, at 6:29 PM, Vladislav Pernin <vladislav.pernin@gmail.com> wrote:
> > >> >>>
> > >> >>> I just tested it in my reproducer. It works.
> > >> >>>
> > >> >>> 2017-03-15 11:22 GMT+01:00 Aljoscha Krettek <aljoscha@apache.org>:
> > >> >>>  
> > >> >>>> I did in fact just open a PR for
> > >> >>>>> https://issues.apache.org/jira/browse/FLINK-6001
> > >> >>>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and allowedLateness
> > >> >>>>
> > >> >>>>
> > >> >>>> On Tue, Mar 14, 2017, at 18:20, Vladislav Pernin wrote:
> > >> >>>>> Hi,
> > >> >>>>>
> > >> >>>>> I would also include the following (not yet resolved) issue in the 1.2.1 scope:
> > >> >>>>>
> > >> >>>>> https://issues.apache.org/jira/browse/FLINK-6001
> > >> >>>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and allowedLateness
> > >> >>>>>
> > >> >>>>> 2017-03-14 17:34 GMT+01:00 Ufuk Celebi <uce@apache.org>:
> > >> >>>>>  
> > >> >>>>>> Big +1, Gordon!
> > >> >>>>>>
> > >> >>>>>> I think (10) is critical to have in 1.2.1.
> > >> >>>>>>
> > >> >>>>>> – Ufuk
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>> On Tue, Mar 14, 2017 at 3:37 PM, Stefan Richter <s.richter@data-artisans.com> wrote:
> > >> >>>>>>> Hi,
> > >> >>>>>>>
> > >> >>>>>>> I would suggest also including in 1.2.1:
> > >> >>>>>>>
> > >> >>>>>>> (9) https://issues.apache.org/jira/browse/FLINK-6044
> > >> >>>>>>> Replaces unintentional calls to InputStream#read(…) with the intended and correct InputStream#readFully(…)
> > >> >>>>>>> Status: PR
> > >> >>>>>>>
> > >> >>>>>>> (10) https://issues.apache.org/jira/browse/FLINK-5985
> > >> >>>>>>> Flink 1.2 was creating state handles for stateless tasks, which caused trouble at restore time for users who wanted to make topology changes involving only stateless operators.
> > >> >>>>>>> Status: PR
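The difference between a single `read(…)` call and `readFully(…)` described in (9) can be illustrated with a minimal, self-contained Java sketch. It is not Flink’s code: `ChunkedInputStream`, `singleRead` and `fullRead` are hypothetical names, and the chunked stream only simulates a source that returns fewer bytes per call than requested. A single `InputStream#read(byte[], int, int)` call is allowed to return such a short count, whereas Java’s `DataInputStream#readFully(byte[])` keeps reading until the buffer is full (or throws `EOFException` if the stream ends first).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ReadFullyDemo {

    // Hypothetical stream that hands out at most `chunk` bytes per read call,
    // simulating a network or decompressing stream that returns short reads.
    static final class ChunkedInputStream extends FilterInputStream {
        private final int chunk;

        ChunkedInputStream(InputStream in, int chunk) {
            super(in);
            this.chunk = chunk;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, chunk));
        }
    }

    // Fragile pattern: one read() call, implicitly assuming it fills the buffer.
    // Returns how many bytes were actually read.
    static int singleRead(byte[] data, int chunk) throws IOException {
        InputStream in = new ChunkedInputStream(new ByteArrayInputStream(data), chunk);
        byte[] buf = new byte[data.length];
        return in.read(buf, 0, buf.length);
    }

    // Robust pattern: readFully() loops over short reads until the buffer is full,
    // or throws EOFException if the stream ends early.
    static byte[] fullRead(byte[] data, int chunk) throws IOException {
        DataInputStream in = new DataInputStream(
                new ChunkedInputStream(new ByteArrayInputStream(data), chunk));
        byte[] buf = new byte[data.length];
        in.readFully(buf);
        return buf;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println("read() returned " + singleRead(data, 3) + " of " + data.length + " bytes");
        System.out.println("readFully() matches input: " + Arrays.equals(data, fullRead(data, 3)));
    }
}
```

Running `main` prints that the single `read()` returned 3 of 8 bytes, while the `readFully()` result matches the input; code that ignores the return value of `read()` would silently work with a partially filled buffer.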
> > >> >>>>>>>  
> > >> >>>>>>>  
> > >> >>>>>>>> On 14.03.2017 at 15:15, Till Rohrmann <trohrmann@apache.org> wrote:
> > >> >>>>>>>>  
> > >> >>>>>>>> Thanks for kicking off the discussion, Tzu-Li. I'd like to add the following issues, which have already been merged into the 1.2-release and 1.1-release branches:
> > >> >>>>>>>>
> > >> >>>>>>>> 1.2.1:
> > >> >>>>>>>>
> > >> >>>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
> > >> >>>>>>>> Hardens the checkpoint recovery in case of corrupted ZooKeeper data. Corrupted checkpoints will now be skipped.
> > >> >>>>>>>> Status: Merged
> > >> >>>>>>>>
> > >> >>>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
> > >> >>>>>>>> Hardens the checkpoint recovery in case we cannot retrieve the completed checkpoint from the metadata state handle retrieved from ZooKeeper. This can happen, for example, if the metadata is deleted. Checkpoints with unretrievable state handles are skipped.
> > >> >>>>>>>> Status: Merged
> > >> >>>>>>>>
> > >> >>>>>>>> 1.1.5:
> > >> >>>>>>>>
> > >> >>>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
> > >> >>>>>>>> Hardens the checkpoint recovery in case of corrupted ZooKeeper data. Corrupted checkpoints will now be skipped.
> > >> >>>>>>>> Status: Merged
> > >> >>>>>>>>
> > >> >>>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
> > >> >>>>>>>> Hardens the checkpoint recovery in case we cannot retrieve the completed checkpoint from the metadata state handle retrieved from ZooKeeper. This can happen, for example, if the metadata is deleted. Checkpoints with unretrievable state handles are skipped.
> > >> >>>>>>>> Status: Merged
> > >> >>>>>>>>
> > >> >>>>>>>> Cheers,
> > >> >>>>>>>> Till
> > >> >>>>>>>>
> > >> >>>>>>>> On Tue, Mar 14, 2017 at 12:02 PM, Tzu-Li (Gordon) Tai <tzulitai@apache.org> wrote:
> > >> >>>>>>>>  
> > >> >>>>>>>>> Hi all!
> > >> >>>>>>>>>
> > >> >>>>>>>>> I would like to start a discussion about the next bugfix releases for 1.1.x and 1.2.x.
> > >> >>>>>>>>> There have been quite a few critical bug fixes in both release lines recently, and I think they deserve a bugfix release soon.
> > >> >>>>>>>>> Most of the bugs were reported by users.
> > >> >>>>>>>>>
> > >> >>>>>>>>> I’m starting the discussion for both bugfix releases because most fixes apply to both (almost identically).
> > >> >>>>>>>>> Of course, the actual RC creation and voting don’t have to be started together.
> > >> >>>>>>>>>
> > >> >>>>>>>>> Here’s an overview of what’s been collected so far for both bugfix releases
> > >> >>>>>>>>> (it’s a list of what I’m aware of so far, and it may be missing things; please append to it and bring anything to attention as necessary :-) ):
> > >> >>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>> For Flink 1.2.1:
> > >> >>>>>>>>>
> > >> >>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
> > >> >>>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked on checkpoints. This compromises the producer’s at-least-once guarantee.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-5949:
> > >> >>>>>>>>> Do not check Kerberos credentials for non-Kerberos authentications. MapR users are affected by this and cannot submit Flink on YARN jobs on a secured MapR cluster.
> > >> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3528, one +1 already
> > >> >>>>>>>>>
> > >> >>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6006:
> > >> >>>>>>>>> Kafka Consumer can lose state if queried partition list is incomplete on restore.
> > >> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3505, one +1 already
> > >> >>>>>>>>>
> > >> >>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-6025:
> > >> >>>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s JavaSerializer is used.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5771:
> > >> >>>>>>>>> Fix multi-char delimiters in Batch InputFormats.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5934:
> > >> >>>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor. This fixes a bug that causes HA recovery to fail.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>> For Flink 1.1.5:
> > >> >>>>>>>>>
> > >> >>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
> > >> >>>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked on checkpoints. This compromises the producer’s at-least-once guarantee.
> > >> >>>>>>>>> Status: This is already merged for 1.2.1. I would personally like to backport the fix to 1.1.5 as well.
> > >> >>>>>>>>>
> > >> >>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-6006:
> > >> >>>>>>>>> Kafka Consumer can lose state if queried partition list is incomplete on restore.
> > >> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3507, one +1 already
> > >> >>>>>>>>>
> > >> >>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6025:
> > >> >>>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s JavaSerializer is used.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-5771:
> > >> >>>>>>>>> Fix multi-char delimiters in Batch InputFormats.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5934:
> > >> >>>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor. This fixes a bug that causes HA recovery to fail.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5048:
> > >> >>>>>>>>> Kafka Consumer (0.9/0.10) threading model leads to problematic cancellation behavior.
> > >> >>>>>>>>> Status: This fix was already released in 1.2.0, but never made it into the 1.1.x bugfix releases. Do we want to backport it to 1.1.5 as well?
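FLINK-5701 (item (1) in both lists above) concerns asynchronous send errors that were not checked when a checkpoint was taken. The general pattern behind an at-least-once fix of this kind (wait for all in-flight records, then rethrow any recorded async error so the checkpoint fails) can be sketched in plain Java. This is a hypothetical illustration, not the FlinkKafkaProducer code: `AtLeastOnceSinkSketch`, `onSend`, `onCompletion` and `snapshot` are invented names, and no Kafka or Flink APIs are used.

```java
import java.util.concurrent.atomic.AtomicReference;

public class AtLeastOnceSinkSketch {

    // First error reported by an asynchronous send callback, if any.
    private final AtomicReference<Exception> asyncError = new AtomicReference<>();

    // Records handed to the async client but not yet acknowledged.
    private long pendingRecords = 0;
    private final Object lock = new Object();

    // Hypothetical hook: called when a record is handed to the async client.
    public void onSend() {
        synchronized (lock) {
            pendingRecords++;
        }
    }

    // Hypothetical hook: called by the async client when a send completes;
    // `error` is null on success.
    public void onCompletion(Exception error) {
        if (error != null) {
            asyncError.compareAndSet(null, error);
        }
        synchronized (lock) {
            pendingRecords--;
            lock.notifyAll();
        }
    }

    // Checkpoint hook: wait until every in-flight record is acknowledged, then
    // rethrow any async error so the checkpoint fails instead of silently
    // succeeding while records were lost.
    public void snapshot() throws Exception {
        synchronized (lock) {
            while (pendingRecords > 0) {
                lock.wait();
            }
        }
        Exception error = asyncError.get();
        if (error != null) {
            throw new Exception("Async send failed before checkpoint", error);
        }
    }

    public static void main(String[] args) throws Exception {
        AtLeastOnceSinkSketch sink = new AtLeastOnceSinkSketch();
        sink.onSend();
        sink.onCompletion(new RuntimeException("broker unavailable"));
        try {
            sink.snapshot();
            System.out.println("checkpoint succeeded");
        } catch (Exception e) {
            System.out.println("checkpoint failed: " + e.getCause().getMessage());
        }
    }
}
```

The error check deliberately runs only after the pending count reaches zero: checking before waiting would miss a failure reported by the last in-flight record. Running `main` prints "checkpoint failed: broker unavailable".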
> > >> >>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>> What do you think? From the list so far, we pretty much already have everything in, so I think it would be nice to aim for RCs by the end of this week.
> > >> >>>>>>>>> Since both bugfix releases cover almost the same list of issues, I think it shouldn’t be too hard for us to kick off both bugfix releases around the same time.
> > >> >>>>>>>>>
> > >> >>>>>>>>> Also FYI, here are the lists of JIRA tickets that are tagged with “1.2.1” / “1.1.5” as the Fix Version and are still open.
> > >> >>>>>>>>> We should probably check whether there’s anything there that we should block the releases on:
> > >> >>>>>>>>>
> > >> >>>>>>>>> For 1.2.1:
> > >> >>>>>>>>> https://issues.apache.org/jira/browse/FLINK-5711?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.2.1
> > >> >>>>>>>>>
> > >> >>>>>>>>> For 1.1.5:
> > >> >>>>>>>>> https://issues.apache.org/jira/browse/FLINK-6006?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.1.5
