Subject: Re: Few question about upgrade from 1.4 to 1.5 flink ( some very basic )
From: Chesnay Schepler
To: Vishal Santoshi, Fabian Hueske
Cc: user
Date: Mon, 25 Jun 2018 16:22:53 +0200

The watermark issue is known and will be fixed in 1.5.1.

On 25.06.2018 15:03, Vishal Santoshi wrote:
Thank you....  

One addition

I do not see WM info on the UI (attached).

Is this a known issue? The same pipeline in our production has the WM (in fact, we never had an issue with watermarks not appearing). Am I missing something?

On Mon, Jun 25, 2018 at 4:15 AM, Fabian Hueske <fhueske@gmail.com> wrote:
Hi Vishal,

1. I don't think a rolling update is possible. Flink 1.5.0 changed the process orchestration and how the processes communicate. IMO, the way to go is to start a Flink 1.5.0 cluster, take a savepoint of the running job, start the job from that savepoint on the new cluster, and shut the old job down.
2. Savepoints should be compatible.
3. You can keep the slot configuration as before.
4. As I said before, mixing 1.5 and 1.4 processes does not work (or at least, it was not considered a design goal and nobody checked whether it is possible).
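
The savepoint-based migration from point 1 can be sketched as a short dry-run script that only assembles and prints the CLI calls. This is a minimal sketch, not a tested procedure: the install paths, JobManager addresses, savepoint directory and jar name are hypothetical placeholders, and the savepoint path passed to `run -s` has to be copied from the output of the `savepoint` command.

```python
import shlex


def migration_commands(job_id, savepoint_dir, job_jar, old_jm, new_jm,
                       savepoint_path="<savepoint-path-from-step-1>",
                       old_flink="flink-1.4/bin/flink",
                       new_flink="flink-1.5/bin/flink"):
    """Return the Flink CLI calls for a savepoint-based migration, in order.

    All paths and addresses are placeholders (assumptions), not values
    taken from the thread.
    """
    return [
        # 1. Trigger a savepoint of the running job on the old (1.4) cluster.
        [old_flink, "savepoint", "-m", old_jm, job_id, savepoint_dir],
        # 2. Submit the job to the new (1.5) cluster, restoring from the
        #    savepoint path that step 1 printed; -d submits in detached mode.
        [new_flink, "run", "-d", "-m", new_jm, "-s", savepoint_path, job_jar],
        # 3. Once the new job is running, cancel the job on the old cluster.
        [old_flink, "cancel", "-m", old_jm, job_id],
    ]


if __name__ == "__main__":
    commands = migration_commands(
        job_id="454cd84a519f3b50e88bcb378d8a1330",  # job id from the log in this thread
        savepoint_dir="hdfs:///flink/savepoints",   # hypothetical directory
        job_jar="my-job.jar",                       # hypothetical jar
        old_jm="old-jobmanager:PORT",               # hypothetical address
        new_jm="new-jobmanager:PORT",               # hypothetical address
    )
    for cmd in commands:
        # Dry run: print each command instead of executing it.
        print(shlex.join(cmd))
```

The calls are printed rather than executed so that each step can be checked and run by hand. The old job is cancelled last, so it keeps running until the restored job is confirmed healthy on the 1.5 cluster.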

Best, Fabian


2018-06-23 13:38 GMT+02:00 Vishal Santoshi <vishal.santoshi@gmail.com>:

1. Can anyone do, or has anyone done, a rolling upgrade from 1.4 to 1.5? I am not sure we can. It seems that the JM cannot recover jobs and fails with this exception:

Caused by: java.io.InvalidClassException: org.apache.flink.runtime.jobgraph.tasks.CheckpointCoordinatorConfiguration; local class incompatible: stream classdesc serialVersionUID = -647384516034982626, local class serialVersionUID = 2




2. Does an SP taken on 1.4 resume on 1.5 (pretty basic, but no harm asking)?



3. https://ci.apache.org/projects/flink/flink-docs-release-1.5/release-notes/flink-1.5.html#update-configuration-for-reworked-job-deployment
taskmanager.numberOfTaskSlots: What would be the desired setting in a standalone (non-Mesos/YARN) cluster?


4. I suspended all jobs and installed 1.5 on the JM (the TMs are still running 1.4). The JM refuses to start with:

Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: 2018-06-23 11:34:23 ERROR JobManager:116 - Failed to recover job 454cd84a519f3b50e88bcb378d8a1330.

Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: java.lang.InstantiationError: org.apache.flink.runtime.blob.BlobKey

Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: at sun.reflect.GeneratedSerializationConstructorAccessor51.newInstance(Unknown Source)

Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: at java.lang.reflect.Constructor.newInstance(Constructor.java:423)

Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: at java.io.ObjectStreamClass.newInstance(ObjectStreamClass.java:1079)

Jun 

.....



Any feedback would be highly appreciated...



