flink-user mailing list archives

From Piotr Nowojski <pi...@data-artisans.com>
Subject Re: Very low-latency - is it possible?
Date Thu, 31 Aug 2017 13:19:49 GMT
Achieving 1ms in any distributed system might be problematic, because even the simplest ping
messages between worker nodes take ~0.2ms.

However, since your stated throughput (40k records/s) is modest and your state is small, maybe
there is no need for a distributed system at all. You could try running a single-node Flink
instance (or a 2-node instance with parallelism set to 1, just for automatic failure recovery).

As Jörn wrote earlier, it might be simpler to write a custom standalone Java application
for that. As long as your state fits into the memory of a single node, you should easily be able
to process millions of records per second on a single machine.
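
For illustration, here is a minimal sketch of what such a standalone application might look like. The event shape (user id, group id, amount), the class name and the running-sum logic are all assumptions, since the thread does not describe the real schema or aggregations. Two in-memory maps stand in for the two keyBy stages; the Kafka consumer and any failure recovery are left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-threaded replacement for the two keyed stages.
class StandaloneSketch {

    // Assumed event shape; not taken from the original design.
    static final class Event {
        final String userId;
        final String groupId;
        final long amount;

        Event(String userId, String groupId, long amount) {
            this.userId = userId;
            this.groupId = groupId;
            this.amount = amount;
        }
    }

    // State of the first keyed stage (keyed by user).
    private final Map<String, Long> perUser = new HashMap<>();
    // State of the second keyed stage (keyed by group).
    private final Map<String, Long> perGroup = new HashMap<>();

    // Applies both keyed updates in-process and returns the new group total.
    long process(Event e) {
        perUser.merge(e.userId, e.amount, Long::sum);
        return perGroup.merge(e.groupId, e.amount, Long::sum);
    }

    long userTotal(String userId) {
        return perUser.getOrDefault(userId, 0L);
    }

    long groupTotal(String groupId) {
        return perGroup.getOrDefault(groupId, 0L);
    }

    public static void main(String[] args) {
        StandaloneSketch app = new StandaloneSketch();
        int n = 1_000_000;
        long last = 0;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            // Synthetic events: 1000 users spread over 10 groups.
            last = app.process(new Event("user-" + (i % 1000), "group-" + (i % 10), 1));
        }
        long elapsedNanos = System.nanoTime() - start;
        System.out.printf("processed %d events in %.1f ms (last group total %d)%n",
                n, elapsedNanos / 1e6, last);
    }
}
```

The timing printed by main depends on the hardware and on JVM warm-up, and it reports total elapsed time only, not per-event 99th percentile latency, so treat it as a rough sanity check and not as a benchmark.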


> On Aug 31, 2017, at 3:01 PM, Jörn Franke <jornfranke@gmail.com> wrote:
> If you really need to get that low, something else might be more suitable. Given those
> latency targets, a custom solution might be necessary. Flink is a generic, powerful
> framework, hence it does not address these latencies.
>> On 31. Aug 2017, at 14:50, Marchant, Hayden <hayden.marchant@citi.com> wrote:
>> We're about to get started on a 9-person-month PoC using Flink Streaming. Before
>> we get started, I am interested to know how low-latency I can expect for my end-to-end flow
>> for a single event (from source to sink).
>> Here is a very high-level description of our Flink design: 
>> We need at-least-once semantics, and the main flow of our application is parsing a message
>> (< 50 microseconds) from Kafka, then doing a keyBy on the parsed event (< 1 KB),
>> then updating a very small user state in the KeyedStream, then doing another keyBy
>> followed by an operator on that KeyedStream. Each of the operators is a very simple
>> operation: very little calculation and no I/O.
>> ** Our requirement is to get close to 1ms (99%) or lower for end-to-end processing
>> (timer starts once we get the message from Kafka). Is this at all realistic if our flow contains
>> 2 aggregations? If so, what optimizations might we need to get there regarding cluster configuration
>> (both Flink and hardware)? Our throughput is possibly small enough (40,000 events per second)
>> that we could run on one node, which might eliminate some network latency.
>> I did read in https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html
>> (under "Exactly Once vs. At Least Once") that a few milliseconds is considered super low-latency.
>> I am wondering if we can get lower.
>> Any advice or 'war stories' are very welcome.
>> Thanks,
>> Hayden Marchant
