Reply-To: user@flink.apache.org
In-Reply-To: <F10CBF26-18E1-4F5F-8266-3EDA7227765E@micardo.com>
References: <185A8335-2A16-46B2-AE96-D346502F15DA@micardo.com>
 <CAKDV8W7qLuZ1snZ5d8bGOE=O+MAizrPUJ=QB=PGDet50vPVWTg@mail.gmail.com>
 <CAC27z=Pa4jBA6VZK24fJmjnrJswGtVZiq8eViVbt2Bxi5Lsh_Q@mail.gmail.com> <F10CBF26-18E1-4F5F-8266-3EDA7227765E@micardo.com>
From: Robert Metzger <rmetzger@apache.org>
Date: Thu, 26 Jan 2017 15:01:46 +0100
Message-ID: <CAGr9p8Cn4npUfi9xPt63CNrfBOFS2L2GdAnfADoeF02Qqs2vzw@mail.gmail.com>
Subject: Re: Rate-limit processing
To: "user@flink.apache.org" <user@flink.apache.org>
archived-at: Thu, 26 Jan 2017 14:02:09 -0000

Hi Florian,

you can rate-limit the Kafka consumer by implementing a custom
DeserializationSchema that sleeps a bit from time to time (or at each
deserialization step).
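A minimal sketch of this idea, kept independent of Flink so that it compiles on its own. It assumes the throttle should enforce a minimum interval of 1 s / maxPerSecond between records (for example 250 ms at 4 records/s, as in Florian's original mail) and sleeps only for whatever part of that interval has not already elapsed. PacedDeserializer and its constructor are hypothetical names; in a real job the pace() call would go at the start of the deserialize(byte[]) method of your own DeserializationSchema implementation.

```java
import java.util.function.Function;

// Hypothetical wrapper: decodes byte[] records with the given function and
// paces the calls so that at most maxPerSecond records pass per second on average.
class PacedDeserializer<T> {
    private final Function<byte[], T> decoder;
    private final long minIntervalNanos;
    private long nextAllowedNanos = System.nanoTime();

    PacedDeserializer(Function<byte[], T> decoder, double maxPerSecond) {
        if (maxPerSecond <= 0) {
            throw new IllegalArgumentException("maxPerSecond must be positive");
        }
        this.decoder = decoder;
        // e.g. 4 records/s -> 250 ms between records
        this.minIntervalNanos = (long) (1_000_000_000L / maxPerSecond);
    }

    // Sleeps only for the part of the minimum interval that has not elapsed yet.
    void pace() {
        long now = System.nanoTime();
        long waitNanos = nextAllowedNanos - now;
        if (waitNanos > 0) {
            try {
                Thread.sleep(waitNanos / 1_000_000L, (int) (waitNanos % 1_000_000L));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
            }
        }
        // Next slot: one interval after the later of "now" and the planned slot,
        // so idle periods do not accumulate into a burst.
        nextAllowedNanos = Math.max(now, nextAllowedNanos) + minIntervalNanos;
    }

    T deserialize(byte[] message) {
        pace();
        return decoder.apply(message);
    }
}
```

For example, new PacedDeserializer<>(bytes -> new String(bytes, StandardCharsets.UTF_8), 4.0) would let through at most about four records per second. Because the sleep still happens inside the consumer, Till's caveats about blocking (quoted below) apply; how much they matter depends on where the connector calls the schema relative to the checkpoint lock.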

On Tue, Jan 24, 2017 at 1:16 PM, Florian König <florian.koenig@micardo.com>
wrote:

> Hi Till,
>
> thank you for the very helpful hints. You are right, I already see
> backpressure. In my case, that’s OK because it throttles the Kafka source.
> Speaking of which: you mentioned putting the rate-limiting mechanism into
> the source. How can I do this with a Kafka source? Should I just extend the
> consumer, or is there a better mechanism to hook into the connector?
>
> Cheers,
> Florian
>
>
> > On 20.01.2017, at 16:58, Till Rohrmann <trohrmann@apache.org> wrote:
> >
> > Hi Florian,
> >
> > any blocking of the user code thread is in general not a good idea,
> > because checkpointing happens under the very same lock that also guards
> > the user code invocation. Thus, a checkpoint barrier arriving at the
> > operator can only trigger the checkpoint once the blocking is over. Even
> > worse, if the blocking happens in a downstream operator (not a source),
> > it can cause backpressure. Since the checkpoint barriers flow with the
> > events and are processed in order, that backpressure will then also slow
> > down the checkpoints.
> >
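To illustrate the lock argument, here is a small simulation that assumes a plain Java monitor can stand in for Flink's checkpoint lock: one thread sleeps while holding the lock (like Thread.sleep(delay) inside map()), and a second thread, playing the checkpoint trigger, measures how long it has to wait to enter the same lock. LockContentionDemo and measureCheckpointWaitMillis are hypothetical names, and the backpressure effect described above is not modeled.

```java
import java.util.concurrent.CountDownLatch;

class LockContentionDemo {

    // Returns how long (in ms) a "checkpoint" had to wait for the lock while
    // a "user function" slept for blockMillis while holding that same lock.
    static long measureCheckpointWaitMillis(long blockMillis) throws InterruptedException {
        final Object checkpointLock = new Object();
        final CountDownLatch lockHeld = new CountDownLatch(1);

        Thread userCode = new Thread(() -> {
            synchronized (checkpointLock) {
                lockHeld.countDown();
                try {
                    Thread.sleep(blockMillis); // blocking user code, e.g. Thread.sleep(delay) in map()
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        userCode.start();
        lockHeld.await(); // make sure the user code owns the lock first

        long start = System.nanoTime();
        synchronized (checkpointLock) {
            // the checkpoint would be taken here, only after the user code released the lock
        }
        long waitedMillis = (System.nanoTime() - start) / 1_000_000L;
        userCode.join();
        return waitedMillis;
    }
}
```

Calling measureCheckpointWaitMillis(300) should report a wait close to 300 ms: the checkpoint cannot start until the blocking call returns.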
> > So if you want to limit the rate, you should do it at the sources without
> > blocking the source thread. For example, you could count how many elements
> > you've emitted in the past second and, if that exceeds your maximum, not
> > emit the next element to downstream operators until some time has passed
> > (this might end up in a busy loop, but it allows the checkpointing to
> > claim the lock).
> >
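A minimal sketch of Till's suggestion, with some assumptions spelled out: a fixed one-second window stands in for "the past second" (a sliding window would be more precise), the budget check never blocks, and the loop holds the lock only while emitting, so a pending checkpoint can take the lock between attempts. PerSecondBudget, ThrottledSourceLoop, the emit callback and the plain Object lock are hypothetical stand-ins; inside a real Flink SourceFunction the lock would be the one from SourceContext.getCheckpointLock() and emitting would be ctx.collect(element).

```java
import java.util.Iterator;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Fixed-window counter: allows at most maxPerSecond emits per one-second window.
class PerSecondBudget {
    private final int maxPerSecond;
    private final LongSupplier clockMillis;
    private long windowStart;
    private int emittedInWindow;

    PerSecondBudget(int maxPerSecond, LongSupplier clockMillis) {
        if (maxPerSecond <= 0) {
            throw new IllegalArgumentException("maxPerSecond must be positive");
        }
        this.maxPerSecond = maxPerSecond;
        this.clockMillis = clockMillis;
        this.windowStart = clockMillis.getAsLong();
    }

    // Non-blocking: returns false instead of waiting when the budget is used up.
    boolean tryAcquire() {
        long now = clockMillis.getAsLong();
        if (now - windowStart >= 1000) { // a new one-second window has begun
            windowStart = now;
            emittedInWindow = 0;
        }
        if (emittedInWindow < maxPerSecond) {
            emittedInWindow++;
            return true;
        }
        return false;
    }
}

class ThrottledSourceLoop {
    // Emits all elements, holding the lock only for the emit itself so that
    // a checkpoint can acquire it between attempts.
    static <T> void run(Iterator<T> elements, PerSecondBudget budget,
                        Object checkpointLock, Consumer<T> emit) {
        while (elements.hasNext()) {
            if (budget.tryAcquire()) {
                T next = elements.next();
                synchronized (checkpointLock) {
                    emit.accept(next);
                }
            } else {
                Thread.yield(); // the busy loop Till mentions; the lock is not held here
            }
        }
    }
}
```

The clock is injected as a LongSupplier so the budget can be tested with a fake clock; in a running job you would pass System::currentTimeMillis.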
> > Cheers,
> > Till
> >
> > On Fri, Jan 20, 2017 at 12:18 PM, Yassine MARZOUGUI
> > <y.marzougui@mindlytix.com> wrote:
> > Hi,
> >
> > You might find this similar thread from the mailing list archive helpful:
> > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/throttled-stream-td6138.html
> >
> > Best,
> > Yassine
> >
> > 2017-01-20 10:53 GMT+01:00 Florian König <florian.koenig@micardo.com>:
> > Hi,
> >
> > I need to limit the rate of processing in a Flink streaming application.
> > Specifically, the number of items processed in a .map() operation has to
> > stay under a certain maximum per second.
> >
> > At the moment, I have another .map() operation before the actual
> > processing, which just sleeps for a certain time (e.g., 250 ms for a limit
> > of 4 requests/s) and returns the item unchanged:
> >
> > …
> >
> > public T map(final T value) throws Exception {
> >         Thread.sleep(delay);
> >         return value;
> > }
> >
> > …
> >
> > This works as expected, but it is a rather crude approach. Checkpointing
> > the job takes a very long time: minutes for a state of a few kB, which for
> > other jobs takes a few milliseconds. I assume that letting the whole
> > thread sleep for most of the time interferes with the checkpointing. Not
> > good!
> >
> > Would using a different synchronization mechanism (e.g.,
> > https://google.github.io/guava/releases/19.0/api/docs/index.html?com/google/common/util/concurrent/RateLimiter.html)
> > help to make checkpointing work better?
> >
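Regarding RateLimiter: in Guava, acquire() blocks the calling thread until a permit is available, so calling it from map() would, following Till's explanation, keep the checkpoint lock held much like Thread.sleep does, while tryAcquire() returns immediately. The sketch below is a small token bucket with a non-blocking tryAcquire(), written from scratch to illustrate the general idea rather than Guava's actual implementation; TokenBucket and its parameters are hypothetical. It could replace the per-second counter in Till's suggested source loop, giving smoother pacing, with capacity bounding the burst allowed after an idle period.

```java
import java.util.function.LongSupplier;

// Token bucket: refills at ratePerSecond tokens per second up to capacity;
// each emitted element consumes one token.
class TokenBucket {
    private final double ratePerSecond;
    private final double capacity;
    private final LongSupplier clockNanos;
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(double ratePerSecond, double capacity, LongSupplier clockNanos) {
        if (ratePerSecond <= 0 || capacity < 1) {
            throw new IllegalArgumentException("need ratePerSecond > 0 and capacity >= 1");
        }
        this.ratePerSecond = ratePerSecond;
        this.capacity = capacity;
        this.clockNanos = clockNanos;
        this.tokens = capacity; // start full
        this.lastRefillNanos = clockNanos.getAsLong();
    }

    // Returns immediately: true means "emit now", false means "try again later".
    boolean tryAcquire() {
        long now = clockNanos.getAsLong();
        double elapsedSeconds = (now - lastRefillNanos) / 1e9;
        tokens = Math.min(capacity, tokens + elapsedSeconds * ratePerSecond);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

In a running job the clock would be System::nanoTime; injecting it keeps the bucket testable without sleeping.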
> > Or, preferably, is there a mechanism inside Flink that I can use to
> > accomplish the desired rate limiting? I haven’t found anything in the docs.
> >
> > Cheers,
> > Florian
