Date: Fri, 24 Feb 2017 10:47:28 -0800 (PST)
From: vinay patil
To: user@flink.apache.org
Subject: Re: Checkpointing with RocksDB as statebackend

Hi Stephan,

To verify whether S3 is making the pipeline stall, I have replaced the S3 sink with HDFS and kept the minimum pause between checkpoints at 5 minutes; I still see the same issue, with checkpoints failing.

If I set the pause time to 20 seconds, all checkpoints complete, but there is a hit to overall throughput.

Regards,
Vinay Patil

On Fri, Feb 24, 2017 at 10:09 PM, Stephan Ewen [via Apache Flink User Mailing List archive.] wrote:

> Flink's state backends currently do a good number of "make sure this
> exists" operations on the file systems. Through Hadoop's S3 filesystem,
> that translates to S3 bucket list operations, for which there is a limit
> on how many operations may happen per time interval. Beyond that, S3
> blocks.
>
> It seems that operations that are totally cheap on HDFS are hellishly
> expensive (and limited) on S3. It may be that you are affected by that.
>
> We are gradually trying to improve the behavior there and be more S3
> aware.
>
> Both 1.3-SNAPSHOT and 1.2-SNAPSHOT already contain improvements there.
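
For context, a minimal sketch of the setup being tested — RocksDB as the state backend, checkpointing to HDFS rather than S3, with a minimum pause between checkpoints — might look as follows in the Flink 1.2-era Java API. This is an illustration, not code from the thread; the HDFS path and the interval values are placeholders:

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointConfigSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint to HDFS instead of S3 (path is a placeholder).
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));

        // Trigger a checkpoint every minute.
        env.enableCheckpointing(60_000);

        // Minimum pause between the end of one checkpoint and the start
        // of the next: 5 minutes here, as in the test described above
        // (vs. the 20-second pause under which checkpoints completed).
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(5 * 60_000);

        // ... sources, operators, and sinks would be defined here
        // before calling env.execute().
    }
}
```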
>
> Best,
> Stephan
>
> On Fri, Feb 24, 2017 at 4:42 PM, vinay patil <[hidden email]> wrote:
>
>> Hi Stephan,
>>
>> So do you mean that S3 is causing the stall? As I mentioned in my
>> previous mail, I could not see any progress for 16 minutes because
>> checkpoints were failing continuously.
>>
>> On Feb 24, 2017 8:30 PM, "Stephan Ewen [via Apache Flink User Mailing
>> List archive.]" <[hidden email]> wrote:
>>
>>> Hi Vinay!
>>>
>>> True, the operator state (like Kafka) is currently not asynchronously
>>> checkpointed.
>>>
>>> While it is rather small state, we have seen before that on S3 it can
>>> cause trouble, because S3 frequently stalls uploads of even data
>>> amounts as low as kilobytes due to its throttling policies.
>>>
>>> That would be a super important fix to add!
>>>
>>> Best,
>>> Stephan
>>>
>>> On Fri, Feb 24, 2017 at 2:58 PM, vinay patil <[hidden email]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have attached a snapshot for reference:
>>>> As you can see, all 3 checkpoints failed; for checkpoint IDs 2 and 3,
>>>> it is stuck at the Kafka source after 50%.
>>>> (The data sent so far by Kafka source 1 is 65 GB, and by source 2 it
>>>> is 15 GB.)
>>>>
>>>> Within 10 minutes, 15M records were processed, and for the next
>>>> 16 minutes the pipeline was stuck; I don't see any progress beyond
>>>> 15M because checkpoints keep failing consistently.
>>>>
>>>> [Attachment: Checkpointing_Failed.png]

--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Re-Checkpointing-with-RocksDB-as-statebackend-tp11752p11901.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.
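
The throttling behavior Stephan describes — a request-rate limit beyond which S3 rejects or stalls even cheap "does this exist" calls — can be illustrated with a generic retry loop using exponential backoff. This is a self-contained simulation, not Flink's or Hadoop's actual S3 code; `ThrottledStore` and its rate window are invented for the example:

```python
import time


class ThrottledStore:
    """Simulates a store that rejects requests beyond a rate limit."""

    def __init__(self, max_requests_per_window):
        self.max_requests = max_requests_per_window
        self.count = 0

    def exists(self, path):
        self.count += 1
        if self.count > self.max_requests:
            raise RuntimeError("SlowDown: request rate exceeded")
        return True


def exists_with_backoff(store, path, max_retries=5, base_delay=0.01):
    """Retry a throttled 'exists' check with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return store.exists(path)
        except RuntimeError:
            # Back off exponentially before retrying.
            time.sleep(base_delay * (2 ** attempt))
            store.count = 0  # simulate the rate window expiring
    raise RuntimeError("gave up after %d retries" % max_retries)
```

The point of the sketch: each individual call is cheap, but once the per-window limit is hit, every further call pays the backoff delay — which is how a burst of tiny metadata operations can stall a checkpoint.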