Mailing-List: contact user-help@flink.apache.org; run by ezmlm
Precedence: bulk
MIME-Version: 1.0
In-Reply-To: <CA+faj9xyvcjK4HJ7KW7Roq9sKAxQ_zGBZ8Zz_CLwWy2s8Th3Dg@mail.gmail.com>
References: <CA+faj9wrFcC0PzGfdGOqEhhKtFOKFQURdbzWHZw+_Nmsyy4hqA@mail.gmail.com>
 <CA+faj9xyvcjK4HJ7KW7Roq9sKAxQ_zGBZ8Zz_CLwWy2s8Th3Dg@mail.gmail.com>
From: Till Rohrmann <till.rohrmann@gmail.com>
Date: Wed, 29 Mar 2017 10:22:58 +0200
Message-ID: <CAC27z=Ok_FX8pNsMZ6oxuvSTCQpxHEOXscZz7+6jVoZPD0aYYw@mail.gmail.com>
Subject: Re: Figuring out when a job has successfully restored state
To: Gyula Fóra <gyula.fora@gmail.com>
Cc: "user@flink.apache.org" <user@flink.apache.org>, "dev@flink.apache.org" <dev@flink.apache.org>
Content-Type: multipart/alternative; boundary=001a114218d82538e3054bda4bdb
archived-at: Wed, 29 Mar 2017 08:23:46 -0000

--001a114218d82538e3054bda4bdb
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Hi Gyula,

There is a related issue [1]. Fixing it will move state restoration into
the DEPLOYING state. This means that once you see a task in state RUNNING,
it will have restored all of its eager state.
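
To illustrate the check this would allow, here is a minimal Python sketch.
It assumes the fix for [1] is in place, and it assumes the job details
returned by the REST API have already been parsed into a dict with a
"vertices" list whose entries carry a "parallelism" value and a "tasks" map
of per-state subtask counts. Those field names, the helper name
all_tasks_running, and the sample payloads are assumptions made for
illustration, not a confirmed API contract.

```python
from typing import Any, Dict


def all_tasks_running(job_details: Dict[str, Any]) -> bool:
    """Return True only if every subtask of every vertex is counted as RUNNING.

    Assumed (hypothetical) payload shape: {"vertices": [{"parallelism": int,
    "tasks": {"RUNNING": int, "DEPLOYING": int, ...}}, ...]}.
    """
    vertices = job_details.get("vertices", [])
    if not vertices:
        # Nothing reported yet, so we cannot claim the job has restored.
        return False
    for vertex in vertices:
        running = vertex.get("tasks", {}).get("RUNNING", 0)
        if running != vertex.get("parallelism"):
            return False
    return True


# Hypothetical sample payloads, hand-written for this sketch.
still_deploying = {
    "vertices": [
        {"name": "source", "parallelism": 2, "tasks": {"RUNNING": 2}},
        {"name": "window", "parallelism": 2, "tasks": {"RUNNING": 1, "DEPLOYING": 1}},
    ]
}
fully_running = {
    "vertices": [
        {"name": "source", "parallelism": 2, "tasks": {"RUNNING": 2}},
        {"name": "window", "parallelism": 2, "tasks": {"RUNNING": 2}},
    ]
}

print(all_tasks_running(still_deploying))
print(all_tasks_running(fully_running))
```

Unlike counting processed bytes, a check of this kind does not depend on
the operator receiving input. It only covers eager state, though: lazily
restored state would not be reflected in the task state.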

[1] https://issues.apache.org/jira/browse/FLINK-4714

Cheers,
Till

On Tue, Mar 28, 2017 at 10:55 AM, Gyula Fóra <gyula.fora@gmail.com> wrote:

> Hi,
>
> Another thought I had last night: maybe we could have another state for
> recovering jobs in the future.
> Deploying -> Recovering -> Running
> This recovering state might only be applicable to state backends that
> have to be restored before processing can start; lazy state backends (like
> external databases) might go into the processing state "directly".
>
> What do you think? (I'm ccing dev)
> Gyula
>
> Gyula Fóra <gyula.fora@gmail.com> wrote (on Mon, Mar 27, 2017 at 17:06):
>
>> Hi all,
>>
>> I am trying to figure out the best way to tell when a job has
>> successfully restored all state and started processing.
>>
>> My first idea was to check the REST API for the number of processed bytes
>> for each parallel operator; if that's greater than 0, it has started.
>> Unfortunately, this logic fails if the operator doesn't receive any input
>> for some time.
>>
>> Do we have any info like this exposed somewhere in a nicely queryable way?
>>
>> Thanks,
>> Gyula
>>
>

--001a114218d82538e3054bda4bdb--