flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dromit <dromitl...@gmail.com>
Subject Re: Why use Kafka after all?
Date Wed, 16 Nov 2016 05:07:01 GMT
"So in your case I would directly ingest my messages into Kafka"

I will do that through a custom SourceFunction that reads the messages from
the WebSocket, creates simple java objects (POJOs) and sink them in a Kafka
topic using a FlinkKafkaProducer, if that makes sense.

The problem now is I need a DeserializationSchema for my class. I read
Flink is able to de/serialize POJO objects by its own, but I'm still
required to provide a serializer to create the FlinkKafkaProducer (and

Any idea or example? Should I create a DeserializationSchema for each POJO
class I want to put into a Kafka stream?

On Tue, Nov 15, 2016 at 7:43 AM, Till Rohrmann <trohrmann@apache.org> wrote:

> Hi Matt,
> as you've stated Flink is a stream processor and as such it needs to get
> its inputs from somewhere. Flink can provide you up to exactly-once
> processing guarantees. But in order to do this, it requires a re-playable
> source because in case of a failure you might have to reprocess parts of
> the input you had already processed prior to the failure. Kafka is such a
> source and people use it because it happens to be one of the most popular
> and widespread open source message queues/distributed logs.
> If you don't require strong processing guarantees, then you can simply use
> the WebSocket source. But, for any serious use case, you probably want to
> have these guarantees because otherwise you just might calculate bogus
> results. So in your case I would directly ingest my messages into Kafka and
> then let Flink read from the created topic to do the processing.
> Cheers,
> Till
> On Tue, Nov 15, 2016 at 8:14 AM, Dromit <dromitlabs@gmail.com> wrote:
>> Hello,
>> As far as I've seen, there are a lot of projects using Flink and Kafka
>> together, but I'm not seeing the point of that. Let me know what you think
>> about this.
>> 1. If I'm not wrong, Kafka provides basically two things: storage
>> (records retention) and fault tolerance in case of failure, while Flink
>> mostly cares about the transformation of such records. That means I can
>> write a pipeline with Flink alone, and even distribute it on a cluster, but
>> in case of failure some records may be lost, or I won't be able to
>> reprocess the data if I change the code, since the records are not kept in
>> Flink by default (only when sinked properly). Is that right?
>> 2. In my use case the records come from a WebSocket and I create a custom
>> class based on messages on that socket. Should I put those records inside a
>> Kafka topic right away using a Flink custom source (SourceFunction) with a
>> Kafka sink (FlinkKafkaProducer), and independently create a Kafka source
>> (KafkaConsumer) for that topic and pipe the Flink transformations there? Is
>> that data flow fine?
>> Basically what I'm trying to understand with both question is how and why
>> people are using Flink and Kafka.
>> Regards,
>> Matt

View raw message