flink-user mailing list archives

From Biplob Biswas <revolutioni...@gmail.com>
Subject Re: state size effects latency
Date Mon, 30 Oct 2017 09:02:21 GMT
Hi Tovi,

This might seem a really naive question (and it's neither a solution nor an
answer to your question), but I am trying to understand how latency is
viewed. You said you achieved less than 5 ms latency, and that at the 99th
percentile you got 0.3 ms and 9 ms respectively. What kind of latency is
this? Is it the latency of a specific operator? I ask because the end-to-end
latency is around 50 ms and 370 ms.

I was just curious how latency is seen from a different perspective; it
would really help my understanding.

Thanks a lot,
Biplob

Thanks & Regards
Biplob Biswas

On Mon, Oct 30, 2017 at 8:53 AM, Sofer, Tovi <tovi.sofer@citi.com> wrote:

> Thank you Joshi.
>
> We are currently using FsStateBackend, not RocksDB, since in version 1.3
> it supports async snapshots.
>
>
>
> Does anyone else have feedback on this issue?
>
>
>
> *From:* Narendra Joshi [mailto:narendraj9@gmail.com]
> *Sent:* Sunday, 29 October 2017 12:13
> *To:* Sofer, Tovi [ICG-IT] <ts72414@imceu.eu.ssmb.com>
> *Cc:* user <user@flink.apache.org>
> *Subject:* Re: state size effects latency
>
>
>
> We have also faced similar issues. The only thing that happens
> synchronously when using async snapshots is taking a persistent
> point-in-time picture, which in the case of the RocksDB backend means
> creating symlinks. That cost grows linearly with the number of files to
> symlink, but it should be negligible. We could not find a satisfying reason
> for the increase in latency with state size.
>
> Best,
> Narendra
>
> Narendra Joshi
>
> On 29 Oct 2017 15:04, "Sofer, Tovi" <tovi.sofer@citi.com> wrote:
>
> Hi all,
>
>
>
> In our application we have a requirement for very low latency, preferably
> less than 5 ms.
>
> We were able to achieve this so far, but when we start increasing the
> state size, we see a distinct increase in latency.
>
> We have added MinPauseBetweenCheckpoints, and are using async snapshots.
>
> · Why does state size have such a distinct effect on latency? How can
> this effect be minimized?
>
> · Can the state snapshot be done using separate threads and resources,
> so that it has less effect on stream data handling?
>
>
>
>
>
> Details:
>
>
>
> Application configuration:
>
> env.enableCheckpointing(1000); // checkpoint interval in milliseconds
>
> env.getCheckpointConfig().setMinPauseBetweenCheckpoints(1000);
>
> // use async snapshots
> env.setStateBackend(new FsStateBackend(checkpointDirURI, true));
>
> env.setParallelism(16); // running on a machine with 40 cores
>
>
>
> Results:
>
>
>
> *A. When state size is ~20MB, latency was 0.3 ms at the 99th
> percentile*
>
>
>
> *Latency info: *(in nanos)
>
> 2017-10-26 07:26:55,030 INFO  com.citi.artemis.flink.reporters.Log4JReporter
> - [Flink-MetricRegistry-1] localhost.taskmanager.
> 6afd21aeb9b9bef41a4912b023469497.Flink Streaming
> Job.AverageE2ELatencyChecker.0.LatencyHistogram: count:10000 min:31919
> max:13481166 mean:89492.0644 stddev:265876.0259763816 p50:68140.5
> p75:82152.5 p95:146654.0499999999 p98:204671.74 p99:308958.73999999993
> p999:3844154.002999794
>
> *State\checkpoint info:*
>
>
>
> [image: cid:image001.png@01D350DC.40449520]
>
>
>
>
>
>
>
> *B. When state size is ~200MB, latency rose significantly, to 9 ms at
> the 99th percentile*
>
> *Latency info: *
>
> 2017-10-26 07:17:35,289 INFO  com.citi.artemis.flink.reporters.Log4JReporter
> - [Flink-MetricRegistry-1] localhost.taskmanager.
> 05431e7ecab1888b2792265cdc0ddf84.Flink Streaming
> Job.AverageE2ELatencyChecker.0.LatencyHistogram: count:10000 min:30186
> max:46236470 mean:322105.7072 stddev:2060373.4782505725 p50:68979.5
> p75:85780.25 p95:219882.69999999914 p98:2360171.4399999934
> p99:9251766.559999945 p999:3.956163987499886E7
>
> *State\checkpoint info:*
>
>
>
>
>
> [image: cid:image002.png@01D350DC.40449520]
>
>
>
> Thanks and regards,
>
> Tovi
>
>
>
>
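
The two reporter lines quoted above are in nanoseconds, while the discussion is in milliseconds. As a reading aid, here is a minimal Java sketch that converts the quoted p50, p99 and p999 values to milliseconds. The class and method names (LatencyUnits, nanosToMillis) are hypothetical and not part of the original application; only the numbers are copied from the log lines.

```java
import java.util.Locale;

// Hypothetical helper, not from the original application: converts the
// histogram values quoted in the thread from nanoseconds to milliseconds.
final class LatencyUnits {
    static final double NANOS_PER_MILLI = 1_000_000.0;

    static double nanosToMillis(double nanos) {
        return nanos / NANOS_PER_MILLI;
    }

    public static void main(String[] args) {
        String[] labels = {"p50", "p99", "p999"};
        // Values copied from the quoted log lines (nanoseconds).
        double[] state20Mb = {68140.5, 308958.73999999993, 3844154.002999794};
        double[] state200Mb = {68979.5, 9251766.559999945, 3.956163987499886E7};

        for (int i = 0; i < labels.length; i++) {
            System.out.println(String.format(Locale.ROOT,
                    "%-4s  ~20MB: %8.3f ms   ~200MB: %8.3f ms",
                    labels[i],
                    nanosToMillis(state20Mb[i]),
                    nanosToMillis(state200Mb[i])));
        }
    }
}
```

Read this way, the median is essentially the same in both runs (about 0.07 ms), while the p99 goes from about 0.31 ms to about 9.25 ms, roughly a 30-fold increase. The effect of the larger state therefore shows up in the tail of the distribution rather than in the typical case.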
