Subject: Re: Questions on Beam Pipeline management, monitoring
From: kaniska Mandal
To: user@beam.incubator.apache.org
Date: Mon, 25 Apr 2016 23:12:36 -0700

Hi Max,

Many thanks for the
great explanations.

*A few questions regarding 'Execution Strategy of a Group of similar / disparate Beam Pipelines'*

> Is it a feasible idea to maintain a Load Balancer in front of the 'Beam-Flink Pipeline Executor Process' in order to control rate limits / throttling?

> Should we maintain multiple containers of 'Beam-Flink Pipelines', or a single container with multiple instances of a Beam-Flink Pipeline?

> Is it a good idea to use an external 'Process Flow Controller' like Apache NiFi to wire the Beam Pipelines and launch / halt / resume them programmatically / interactively?

*More questions related to graceful shutdown and restart*

Currently FlinkPipelineRunner#run() throws a RuntimeException.

> So is it good enough to add a shutdown hook on the Beam Pipeline and close resources like a KafkaProducer when the Pipeline Job is killed due to a RuntimeException? It would have been better if a custom exception were thrown, so that the Job could handle it gracefully!

BTW, I tried calling close() on FlinkKafkaProducer from inside the Beam-Flink Pipeline Runner, but the producer didn't stop.

> I understand we can use monit to restart a process, but any suggestion on implementing an external 'Beam Pipeline Monitoring Agent' that auto-retries restarting the Pipeline?

Let me know if any of the above points sound logical; then I'll go ahead and create a feature request.

Thanks,
Kaniska

On Mon, Apr 25, 2016 at 11:12 AM, Maximilian Michels <mxm@apache.org> wrote:
> Hi Kaniska,
>
> Not all of these are uniform across all Runners yet, but since you have
> previously deployed applications with the Flink Runner, here are my
> answers from a Flink perspective.
>
> ** Shutdown **
>
> For shutting down a Flink pipeline, you can use the "cancel" function:
>
>   ./bin/flink cancel <jobId>
>
> When you submit your job in detached mode, e.g. ./bin/flink run -d
> /path/to/jar, you get a job id in return which you can use for the
> cancel command.
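The graceful-shutdown question above (closing resources like a KafkaProducer when the runner surfaces a RuntimeException) can be sketched runner-independently. The following is a minimal sketch only: `FakeProducer` and `runPipeline()` are hypothetical stand-ins for a real producer and for `FlinkPipelineRunner#run()`, not actual Beam or Kafka API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class GracefulShutdown {

    /** Stand-in for a closeable resource such as a KafkaProducer (hypothetical). */
    static class FakeProducer implements AutoCloseable {
        final AtomicBoolean closed = new AtomicBoolean(false);
        @Override
        public void close() { closed.set(true); }
    }

    /** Stand-in for pipeline.run(), which currently surfaces a RuntimeException. */
    static void runPipeline() {
        throw new RuntimeException("job killed");
    }

    public static void main(String[] args) {
        FakeProducer producer = new FakeProducer();
        // Fallback: a JVM shutdown hook closes the resource on abrupt exit (e.g. SIGTERM).
        Runtime.getRuntime().addShutdownHook(new Thread(producer::close));
        try {
            runPipeline();
        } catch (RuntimeException e) {
            // No custom exception type exists yet, so RuntimeException is caught directly.
            System.out.println("pipeline failed: " + e.getMessage());
        } finally {
            producer.close(); // deterministic cleanup on the normal exit path
        }
        System.out.println("producer closed: " + producer.closed.get());
    }
}
```

Since the shutdown hook only runs at JVM exit, the finally block does the deterministic cleanup; the hook is just a safety net.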
> Alternatively, query the running jobs via
>
>   ./bin/flink list
>
> Very soon we will have checkpointing of sources/sinks in Beam. That
> would enable you to use Flink's Savepoint feature. Savepoints allow
> you to take a snapshot of your Flink job at a moment in time,
> shut down your application, and resume execution from that snapshot
> later. This works for Flink but not yet for Beam programs.
>
> ** Scheduling **
>
> You'll have to set up a cron job or an external scheduling program to
> run a Flink job at a specified time. There is no built-in pipeline
> scheduling in Flink.
>
> ** Monitoring **
>
> Flink has a nice web interface available on port 8081 of the job
> manager (master) node. It contains statistics like the number of
> records read/written per operator and JVM metrics.
>
> You may also register "Accumulators" which enable you to provide your
> own metrics. In the Beam API these are called "Aggregators".
> Aggregators get translated to Flink accumulators. For instance, you
> can have an aggregator that counts the number of records written by a
> particular operator.
> You can see these metrics on the web interface or access them via the
> Flink REST API:
>
> https://ci.apache.org/projects/flink/flink-docs-master/internals/monitoring_rest_api.html
>
> Here is an example of an aggregator in Beam which counts the number of
> elements processed:
>
> public class TestAggregator {
>
>   public static void main(String[] args) throws AggregatorRetrievalException {
>
>     class MyDoFn extends DoFn<Integer, Integer> {
>
>       Aggregator<Long, Long> agg =
>           createAggregator("numRecords", new Sum.SumLongFn());
>
>       @Override
>       public void processElement(ProcessContext c) throws Exception {
>         agg.addValue(1L);
>       }
>     }
>
>     FlinkPipelineOptions options =
>         PipelineOptionsFactory.as(FlinkPipelineOptions.class);
>     options.setRunner(FlinkPipelineRunner.class);
>     options.setStreaming(true);
>
>     Pipeline pipeline = Pipeline.create(options);
>
>     MyDoFn myDoFn = new MyDoFn();
>     pipeline.apply(Create.of(1, 2, 3)).apply(ParDo.of(myDoFn));
>
>     PipelineResult result = pipeline.run();
>
>     System.out.println("Result: " +
>         result.getAggregatorValues(myDoFn.agg).getValues());
>   }
> }
>
> As expected, this prints [3].
>
>
> Cheers,
> Max
>
> On Mon, Apr 25, 2016 at 7:25 PM, kaniska Mandal
> <kaniska.mandal@gmail.com> wrote:
> > What's the recommended approach
> >
> >> to reliably shut down the pipeline
> >
> >> to run the beam-flink pipeline in a scheduled manner
> >
> >> to monitor the rates/throughputs/throttling/multiple threads spawned
> >> by the Pipeline, any suggestion?
> >
> > Thanks
> > Kaniska
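The cron-based scheduling Max mentions might look like the following crontab fragment. All paths, the jar name, and the schedule are purely illustrative assumptions, not values from this thread:

```cron
# Illustrative: submit the Beam-on-Flink job in detached mode every day at 02:30.
30 2 * * * /opt/flink/bin/flink run -d /opt/jobs/beam-pipeline.jar >> /var/log/beam-cron.log 2>&1
```

Redirecting stdout/stderr to a log file keeps the job id printed by detached mode available for a later `./bin/flink cancel`.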