flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Till Rohrmann <trohrm...@apache.org>
Subject Re: Why use Kafka after all?
Date Tue, 15 Nov 2016 10:43:08 GMT
Hi Matt,

as you've stated Flink is a stream processor and as such it needs to get
its inputs from somewhere. Flink can provide you up to exactly-once
processing guarantees. But in order to do this, it requires a re-playable
source because in case of a failure you might have to reprocess parts of
the input you had already processed prior to the failure. Kafka is such a
source and people use it because it happens to be one of the most popular
and widespread open source message queues/distributed logs.

If you don't require strong processing guarantees, then you can simply use
the WebSocket source. But, for any serious use case, you probably want to
have these guarantees because otherwise you just might calculate bogus
results. So in your case I would directly ingest my messages into Kafka and
then let Flink read from the created topic to do the processing.


On Tue, Nov 15, 2016 at 8:14 AM, Dromit <dromitlabs@gmail.com> wrote:

> Hello,
> As far as I've seen, there are a lot of projects using Flink and Kafka
> together, but I'm not seeing the point of that. Let me know what you think
> about this.
> 1. If I'm not wrong, Kafka provides basically two things: storage (records
> retention) and fault tolerance in case of failure, while Flink mostly cares
> about the transformation of such records. That means I can write a pipeline
> with Flink alone, and even distribute it on a cluster, but in case of
> failure some records may be lost, or I won't be able to reprocess the data
> if I change the code, since the records are not kept in Flink by default
> (only when sinked properly). Is that right?
> 2. In my use case the records come from a WebSocket and I create a custom
> class based on messages on that socket. Should I put those records inside a
> Kafka topic right away using a Flink custom source (SourceFunction) with a
> Kafka sink (FlinkKafkaProducer), and independently create a Kafka source
> (KafkaConsumer) for that topic and pipe the Flink transformations there? Is
> that data flow fine?
> Basically what I'm trying to understand with both question is how and why
> people are using Flink and Kafka.
> Regards,
> Matt

View raw message