From: Till Rohrmann
Date: Tue, 2 Aug 2016 22:26:32 +0800
To: user@flink.apache.org
Subject: Re: partial savepoints/combining savepoints

Hi Claudia,

1) At the moment the offset information is written both to the ZooKeeper quorum used by Kafka and to the savepoint. Reading it out of the savepoint is not easy, since you would need to know the savepoint's internal representation. But you could try to read the Kafka offsets directly from ZooKeeper.

2) That depends a bit on the deployment and the size of the job. If you are using a YARN session or a standalone cluster, then the task managers are already registered at the job manager and the deployment of each task should be in the millisecond range. If you start a new YARN application per Flink job (a per-job cluster), it can take longer, depending on how long YARN needs to allocate the requested resources. But once that is done, deploying a task should again be in the sub-second range.

3) If you want to keep the different Flink jobs separated, then you should either submit them separately to one Flink cluster or start a Flink cluster per job (e.g. with YARN). I don't think that this is a bad architecture given your requirements. However, I'm not sure whether merging and splitting savepoints will be implemented anytime soon.

Actually, we're currently working on improving Flink so that it can be started with a dedicated job: you start a job manager that already has the job jar on its classpath and directly begins executing the contained job. This will be helpful for deployment scenarios such as Docker images, and I could imagine it being helpful for your use case as well.
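Coming back to (1): the old high-level Kafka consumer commits offsets to ZooKeeper under `/consumers/<group>/offsets/<topic>/<partition>`. Here is a minimal sketch of locating and decoding such an offset. The group id and connection details are made-up examples, and the actual lookup is only indicated in a comment because it needs a live ZooKeeper client.

```python
def kafka_offset_znode(group: str, topic: str, partition: int) -> str:
    """Return the ZooKeeper path where the old high-level Kafka
    consumer stores the committed offset for one partition."""
    return f"/consumers/{group}/offsets/{topic}/{partition}"

def decode_offset(znode_data: bytes) -> int:
    """The offset is stored as an ASCII-encoded integer in the znode."""
    return int(znode_data.decode("ascii"))

# With a real ZooKeeper client (e.g. kazoo) the lookup would look like:
#   zk = KazooClient(hosts="zk1:2181"); zk.start()
#   data, _ = zk.get(kafka_offset_znode("my-flink-job", "events", 0))
#   offset = decode_offset(data)
```

Whether the offsets actually appear there depends on the consumer configuration (the new Kafka consumer commits offsets to a Kafka topic instead), so treat this as a starting point rather than a guaranteed layout.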
Cheers,
Till

On Mon, Aug 1, 2016 at 10:40 PM, Claudia Wegmann wrote:
> Hi Till,
>
> thanks for the quick reply. Too bad, I thought I was on the right track
> with savepoints here.
>
> Some follow-up questions:
>
> 1.) Can I transfer the state and the position in the Kafka topic manually
> for one stream? In other words: is this information easily accessible?
>
> 2.) In any case I would need to stop the running job, change the topology
> (e.g. the number of streams in the program) and resume processing. Can you
> quantify the time overhead of stopping and starting a Flink job?
>
> 3.) I'm aware of the upcoming feature for scaling in and out. But I don't
> quite see how this will help me with different services. I thought of each
> service having its own Flink instance/cluster. I would submit this service
> as one job to its dedicated Flink cluster, containing all the necessary
> streams and computations. Is this a bad architecture? Would it be better
> to have one big Flink cluster and submit one big job which contains all
> the streams? (As I understand it, submitting multiple jobs to one Flink
> instance is not recommended.) To be honest, I don't yet fully understand
> the different deployment options of Flink and how to combine them with a
> microservice architecture, where I have a service packaged as a JAR file
> and want to be able to just deploy that JAR file. I thought of this
> service containing Flink, starting the JobManager and some TaskManagers
> itself, and deploying itself as the Flink job with a dedicated entry
> point. Is this a good idea? Is it even possible?
>
> Thanks in advance,
> Claudia
>
> From: Till Rohrmann [mailto:trohrmann@apache.org]
> Sent: Monday, 1 August 2016 16:21
> To: user@flink.apache.org
> Subject: Re: partial savepoints/combining savepoints
>
> Hi Claudia,
>
> unfortunately, neither taking partial savepoints nor combining multiple
> savepoints into one is currently supported by Flink.
>
> However, we're currently working on dynamic scaling, which will allow you
> to adjust the parallelism of your Flink job. This helps you scale in/out
> depending on the workload of your job. However, you would only be able to
> scale within a single Flink job, not across Flink jobs.
>
> Cheers,
> Till
>
> On Mon, Aug 1, 2016 at 9:49 PM, Claudia Wegmann wrote:
>
> Hey everyone,
>
> I've got some questions regarding savepoints in Flink. I have the
> following situation:
>
> There is a microservice that reads data from Kafka topics, creates Flink
> streams from this data and runs different computation/pattern-matching
> workloads. If the overall workload for this service becomes too big, I
> want to start a new instance of the service and share the work between
> the running instances. To accomplish that, I thought about using Flink's
> savepoint mechanism. But there are some open questions:
>
> 1.) Can I combine two or more savepoints in one program? Think of two
> services already running. Now I'm starting up a third service. The new
> one would get savepoints from the already running services. It would then
> continue computation on some streams while the other services discard
> computation on the streams now handled by the new service. So, is it
> possible to combine two or more savepoints in one program?
>
> 2.) Another approach I could think of for introducing a new service would
> be to just take a savepoint of the streams that change service. Can I
> take a savepoint of only a part of the running job?
>
> Thanks for your comments and best wishes,
> Claudia