From: Stephan Ewen
Date: Fri, 24 Feb 2017 17:43:56 +0100
Subject: Re: Checkpointing with RocksDB as statebackend
To: user@flink.apache.org

Flink's state backends currently do a good number of "make sure this exists" operations on the file systems. Through Hadoop's S3 filesystem, that translates to S3 bucket list operations, for which there is a limit on how many operations may happen per time interval. After that, S3 blocks.

It seems that operations that are totally cheap on HDFS are hellishly expensive (and limited) on S3. It may be that you are affected by that.
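
(As an illustration of the pattern above, here is a minimal sketch written directly against Hadoop's FileSystem API, not Flink's actual code; the class name, bucket, and checkpoint path are made up.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckpointDirProbe {
    public static void main(String[] args) throws Exception {
        // Hypothetical checkpoint directory; any s3a:// URI behaves the same way.
        Path chkDir = new Path("s3a://my-bucket/flink/checkpoints/job-42/chk-17");
        FileSystem fs = FileSystem.get(chkDir.toUri(), new Configuration());

        // On HDFS these are cheap namenode RPCs; through the S3 filesystem each
        // call turns into LIST/HEAD requests that count toward the bucket's
        // request-rate limits.
        if (!fs.exists(chkDir)) {
            fs.mkdirs(chkDir);   // yet more requests, since S3 has no real directories
        }
    }
}

Each such call is a metadata-only operation on HDFS, but against S3 it becomes one or more rate-limited HTTP requests.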

We are gradually trying to improve the behavior there and be more S3-aware.

Both 1.3-SNAPSHOT and 1.2-SNAPSHOT already contain improvements there.

Best,
Stephan


On Fri, Feb 24, 2017 at 4:42 PM, vinay patil <vinay18.patil@gmail.com> wrote:

Hi Stephan,

So do you mean that S3 is causing the stall? As I mentioned in my previous mail, I could not see any progress for 16 minutes because checkpoints were failing continuously.


On Feb 24, 2017 8:30 PM, "Stephan Ewen [via Apache Flink User Mailing List archive.]" <[hidden email]> wrote:
Hi Vinay!

True, the operator state (like Kafka) is currently not asynchronously checkpointed.

While it is rather small state, we have seen before that on S3 it can cause trouble, because S3 frequently stalls uploads of even data amounts as low as kilobytes due to its throttling policies.

That would be a super important fix to add!
Best,
Stephan
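
(To make the asynchronous-snapshot point concrete, here is a minimal sketch assuming the Flink 1.2.x-style RocksDB backend API, where fully asynchronous snapshots of keyed state are opt-in; the class name, bucket, and checkpoint path are hypothetical. The small operator state of the Kafka source would still be written synchronously, as discussed above.)

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AsyncSnapshotJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical checkpoint location; replace with your own bucket/path.
        RocksDBStateBackend backend =
                new RocksDBStateBackend("s3://my-bucket/flink/checkpoints");
        backend.enableFullyAsyncSnapshots();  // snapshot keyed state without blocking processing
        env.setStateBackend(backend);

        env.enableCheckpointing(60_000);      // checkpoint every 60 s

        // ... build the Kafka-sourced pipeline here and call env.execute(...) ...
    }
}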


On Fri, Feb 24, 2017 at 2:58 PM, vinay patil <[hidden email]> wrote:
Hi,

I have attached a snapshot for reference:
As you can see, all 3 checkpoints failed; for checkpoint IDs 2 and 3 it
is stuck at the Kafka source after 50%.
(The data sent so far by Kafka source 1 is 65 GB and by source 2 is 15 GB.)

Within 10 minutes, 15M records were processed, and for the next 16 minutes the
pipeline has been stuck; I don't see any progress beyond 15M because
checkpoints keep failing consistently.

<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n11882/Checkpointing_Failed.png>
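
(Not a fix for the S3 throttling itself, but a minimal sketch of checkpoint settings that keep a stuck checkpoint from lingering indefinitely and space attempts apart; the class name and the timeout/pause values are illustrative only.)

import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointTuning {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.enableCheckpointing(60_000);            // trigger a checkpoint every 60 s
        CheckpointConfig cfg = env.getCheckpointConfig();
        cfg.setCheckpointTimeout(10 * 60_000);      // declare a checkpoint failed after 10 min
        cfg.setMinPauseBetweenCheckpoints(30_000);  // let the pipeline make progress between attempts
        cfg.setMaxConcurrentCheckpoints(1);         // never pile up concurrent checkpoints

        // ... define sources/operators/sinks and call env.execute(...) here ...
    }
}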



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Re-Checkpointing-with-RocksDB-as-statebackend-tp11752p11882.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

