From: Till Rohrmann
Date: Wed, 29 Mar 2017 10:22:58 +0200
Subject: Re: Figuring out when a job has successfully restored state
To: Gyula Fóra
Cc: "user@flink.apache.org", "dev@flink.apache.org"

Hi Gyula,

there is a related issue [1]. Fixing it will move state restoration into the
DEPLOYING state. This means that once you see a task in state RUNNING, it
will have restored all of its eager state.

[1] https://issues.apache.org/jira/browse/FLINK-4714

Cheers,
Till

On Tue, Mar 28, 2017 at 10:55 AM, Gyula Fóra <gyula.fora@gmail.com> wrote:

> Hi,
>
> Another thought I had last night: maybe we could have another state for
> recovering jobs in the future.
>
> Deploying -> Recovering -> Running
>
> This Recovering state might only be applicable to state backends that
> have to be restored before processing can start; lazy state backends (like
> external databases) might go into the processing state "directly".
>
> What do you think? (I'm cc'ing dev)
>
> Gyula
>
> Gyula Fóra <gyula.fora@gmail.com> wrote (on Mon, Mar 27, 2017, 17:06):
>
>> Hi all,
>>
>> I am trying to figure out the best way to tell when a job has
>> successfully restored all state and started processing.
>>
>> My first idea was to check the REST API for the number of processed bytes
>> of each parallel operator, and if that is greater than 0, the operator has
>> started. Unfortunately, this logic fails if the operator doesn't receive
>> any input for some time.
>>
>> Do we have any info like this exposed somewhere in a nicely queryable way?
>>
>> Thanks,
>> Gyula
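
A minimal sketch of the check discussed in this thread, under stated assumptions: rather than looking at processed bytes, a client could poll the job details and test whether every parallel subtask reports RUNNING. As Till notes, this would only imply that eager state has been restored once FLINK-4714 is fixed. The payload shape (a "vertices" list whose entries carry "parallelism" and a "tasks" count per state), the /jobs/<job_id> path, and both function names are assumptions made for illustration, not confirmed in this thread.

```python
import json
import urllib.request


def all_tasks_running(job_details: dict) -> bool:
    """Return True if every parallel subtask of every vertex is RUNNING.

    Assumes a payload shaped like
    {"vertices": [{"parallelism": 2, "tasks": {"RUNNING": 2}}, ...]};
    this shape is an assumption, check it against your Flink version.
    """
    vertices = job_details.get("vertices", [])
    if not vertices:
        return False  # nothing deployed yet, so nothing can be running
    return all(
        v.get("tasks", {}).get("RUNNING", 0) == v.get("parallelism", -1)
        for v in vertices
    )


def fetch_job_details(base_url: str, job_id: str) -> dict:
    """Fetch job details as JSON; the /jobs/<job_id> path is an assumption."""
    with urllib.request.urlopen(f"{base_url}/jobs/{job_id}") as resp:
        return json.load(resp)


# Offline usage example with a hand-written payload (no cluster needed):
# the second vertex is still deploying, so the job is not fully running.
sample = {
    "vertices": [
        {"parallelism": 2, "tasks": {"RUNNING": 2}},
        {"parallelism": 1, "tasks": {"DEPLOYING": 1}},
    ]
}
print(all_tasks_running(sample))  # False
```

Unlike the processed-bytes check, this does not depend on input arriving at the operator. It says nothing about lazily restored state, which matches the "eager state" caveat above.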