From: Stefan Richter <s.richter@data-artisans.com>
Subject: Re: Tuning RocksDB
Date: Wed, 3 May 2017 18:05:11 +0200
To: Jason Brelloch <jb.bc.flk@gmail.com>
Cc: user@flink.apache.org

Sorry, I just saw that your question was actually mainly about checkpointing, but it can still be related to my previous answer. I assume the checkpointing time is the time reported in the web interface? That is the end-to-end runtime of the checkpoint, which does not really tell you how much time is spent on writing the state itself; you can find this exact detail in the logging by grepping for lines that start with "Asynchronous RocksDB snapshot". The background is that end-to-end also includes the time the checkpoint barrier needs to travel to the operator. If there is a lot of backpressure and a lot of network buffers, this can take a while. Still, the reason for the backpressure could be in the way you access RocksDB: it seems you are de/serializing an ever-growing value every time you update it under a single key. I can see that accesses under these conditions could eventually become very slow, but remain fast on the FsStateBackend for the reason from my first answer.
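For illustration, here is a minimal sketch of what I mean, assuming your flatmap currently accumulates all events for a key in one ValueState holding a growing collection (the class name and state name below are made up for the example). With a ListState, the RocksDB backend should only need to serialize the element you append on each update, rather than rewriting the whole history:

import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.api.common.state.{ListState, ListStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.util.Collector

// Hypothetical sketch: accumulate events per key in a ListState instead of
// rewriting one ever-growing value on every event.
class EventAccumulator extends RichFlatMapFunction[String, String] {

  private var events: ListState[String] = _

  override def open(parameters: Configuration): Unit = {
    val descriptor = new ListStateDescriptor[String]("events", classOf[String])
    events = getRuntimeContext.getListState(descriptor)
  }

  override def flatMap(value: String, out: Collector[String]): Unit = {
    // Serializes only this element; with RocksDB the append is done via a merge.
    events.add(value)
  }
}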

On 03.05.2017, at 17:54, Stefan Richter <s.richter@data-artisans.com> wrote:

Hi,

typically, I would expect that the bottleneck with the RocksDB backend is not RocksDB itself, but your TypeSerializers. I suggest first attaching a profiler/sampler to the process and checking whether the problematic methods are in serialization or in the actual accesses to RocksDB. The RocksDB backend has to go through a de/serialize roundtrip on every single state access, while the FsStateBackend works directly on heap objects. For checkpoints, the RocksDB backend can write bytes directly, whereas the FsStateBackend has to use the serializers to get from objects to bytes, so how the serializers are used is kind of inverted between normal operation and checkpointing. For Flink 1.3 we will also introduce incremental checkpoints on RocksDB that piggyback on the SST files. Flink 1.2 writes checkpoints and savepoints in full and in a custom format.
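As a rough first check before attaching a profiler, you could also time the de/serialize roundtrip of your state type in isolation. A small sketch of what that could look like (MyStateValue and SerializerCheck are just placeholders, substitute your actual state type):

import java.io.{ByteArrayInputStream, ByteArrayOutputStream}

import org.apache.flink.api.common.ExecutionConfig
import org.apache.flink.api.scala._
import org.apache.flink.core.memory.{DataInputViewStreamWrapper, DataOutputViewStreamWrapper}

// Placeholder for the actual state value type.
case class MyStateValue(id: Long, payload: String)

object SerializerCheck {
  def main(args: Array[String]): Unit = {
    // Same kind of serializer the keyed state backend would use for this type.
    val serializer = createTypeInformation[MyStateValue].createSerializer(new ExecutionConfig)

    val sample = MyStateValue(42L, "x" * 1024)
    val rounds = 100000

    val start = System.nanoTime()
    for (_ <- 0 until rounds) {
      val bytes = new ByteArrayOutputStream()
      serializer.serialize(sample, new DataOutputViewStreamWrapper(bytes))
      val in = new DataInputViewStreamWrapper(new ByteArrayInputStream(bytes.toByteArray))
      serializer.deserialize(in)
    }
    val millis = (System.nanoTime() - start) / 1000000
    println(s"$rounds de/serialize roundtrips took $millis ms")
  }
}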

Best,
Stefan

On 03.05.2017, at 16:46, Jason Brelloch <jb.bc.flk@gmail.com> wrote:

Hey all,

I am looking for some advice on tuning RocksDB for better performance in Flink 1.2. I created a pretty simple job with a single Kafka source and one flatmap function that stores 50000 events in a single key of managed keyed state and then drops everything else, to test checkpoint performance. Using a basic FsStateBackend configured as:

val backend = new FsStateBackend("file:///home/jason/flink/checkpoint")
env.setStateBackend(backend)

With about 30MB of state, we see the checkpoints completing in 151 ms. Using a RocksDBStateBackend configured as:

val backend = new RocksDBStateBackend("file:///home/jason/flink/checkpoint")
backend.setDbStoragePath("file:///home/jason/flink/rocksdb")
backend.setPredefinedOptions(PredefinedOptions.FLASH_SSD_OPTIMIZED)
env.setStateBackend(backend)

Running the same test, the checkpoint takes 3 minutes 42 seconds.

I expect it to be slower, but that seems excessive. I am also a little confused as to when RocksDB and Flink decide to write to disk: watching the database, the .sst file wasn't created until significantly after the checkpoint was completed, and the state had not changed. Is there anything I can do to increase the speed of the checkpoints, or anywhere I can look to debug the issue? (Nothing seems out of the ordinary in the Flink logs or RocksDB logs.)

Thanks!

--
Jason Brelloch | Product Developer
3405 Piedmont Rd. NE, Suite 325, Atlanta, GA 30305
3D""
Subscribe to the BetterCloud Monitor - Get IT delivered to your inbox

