flink-user mailing list archives

From Sanne de Roever <sanne.de.roe...@gmail.com>
Subject Re: Flink Kafka producer with a topic per message
Date Wed, 07 Dec 2016 09:55:20 GMT
A first sketch

Central to this functionality is Kafka's ProducerRecord. ProducerRecord was
introduced in Kafka 0.8, which means the functionality could be added to all
of the Flink-Kafka connectors listed at
https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/connectors/kafka.html
ProducerRecord does two things:

   - It allows a single Kafka producer to send messages to different topics
   in Kafka; this can be very helpful for message routing (I can make a more
   formal example later).
   - It also allows specifying a key that determines the partition of the
   message; exposing this would give Flink a more generic interface to
   Kafka, which is a good thing.
   - A partition can be identified either by an explicit integer or by a key
   String that is hashed; see the sketch just below this list.
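
To make this concrete, here is a minimal sketch of what ProducerRecord
allows on the plain Kafka 0.10 client side. The topic names, keys and broker
address are made up for illustration; only the ProducerRecord constructors
are the point.

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerRecordRouting {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<>(props);

        // One producer, two topics: the topic is part of every record.
        producer.send(new ProducerRecord<>("orders", "order-42", "created"));
        producer.send(new ProducerRecord<>("payments", "order-42", "pending"));

        // Partition chosen by hashing the key ...
        producer.send(new ProducerRecord<>("orders", "customer-7", "updated"));
        // ... or fixed explicitly to partition 3.
        producer.send(new ProducerRecord<>("orders", 3, "customer-7", "updated"));

        producer.close();
    }
}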

The next step would be to determine the impact on the interface of a Sink.
Currently a Kafka sink has one topic, for example:

.addSink(new FlinkKafkaProducer09[String](outputTopic, new SimpleStringSchema(), producerProps))

In the new scenario one would like to pass not only the message to be sent,
but also a topic string and a partition id or key (tuple-ish?). The next
suggestion is just to start the thinking a bit; a shot in the dark. A
somewhat blunt approach would be to map all messages to a valid
ProducerRecord and then pass this ProducerRecord to the Sink, and the rest
is history. No attempt at abstraction is made; a minimal sketch follows,
with the reasoning after it.
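
The sketch below is purely illustrative and makes assumptions: the class
name ProducerRecordSink, its constructor, and the idea of accepting
ready-made ProducerRecords are mine, not part of the existing
FlinkKafkaProducer code.

import java.util.Properties;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerRecordSink extends RichSinkFunction<ProducerRecord<String, String>> {

    private final Properties producerProps;
    private transient KafkaProducer<String, String> producer;

    public ProducerRecordSink(Properties producerProps) {
        this.producerProps = producerProps;
    }

    @Override
    public void open(Configuration parameters) {
        // One Kafka producer per parallel sink instance.
        producer = new KafkaProducer<>(producerProps);
    }

    @Override
    public void invoke(ProducerRecord<String, String> record) {
        // The record already carries topic, optional partition and key.
        producer.send(record);
    }

    @Override
    public void close() {
        if (producer != null) {
            producer.close();
        }
    }
}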

Evaluating this, I see the following. The current KafkaSink abstracts the
Kafka functionality away on the Flink side. This is a good thing and will
work for most cases. Providing a tighter integration with Kafka will
probably break down that abstraction. This seems to point in the direction
of creating an advanced Kafka sink. Such a sink gives more control but less
abstraction; it is for advanced applications. As far as I can see, any
attempt at abstraction here would only reduce transparency, and the
resulting contract would be unlikely to carry over to other queuing
providers.
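
To illustrate what "more control, less abstraction" could mean in a job,
here is a hedged usage sketch built on the hypothetical ProducerRecordSink
above; the topic names and the routing rule are invented.

import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RoutingJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        DataStream<String> events = env.fromElements("error:disk", "info:started", "error:oom");

        events
            // Map every element to a ProducerRecord; the prefix decides the topic.
            .map(new MapFunction<String, ProducerRecord<String, String>>() {
                @Override
                public ProducerRecord<String, String> map(String value) {
                    String topic = value.startsWith("error") ? "alerts" : "audit";
                    return new ProducerRecord<>(topic, value);
                }
            })
            .addSink(new ProducerRecordSink(producerProps));

        env.execute("topic-per-message sketch");
    }
}

Note that Flink would treat ProducerRecord as a generic type and fall back
to Kryo serialization for it; rough edges like that are part of why such a
sink would be labelled advanced.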



On Wed, Dec 7, 2016 at 10:27 AM, Sanne de Roever <sanne.de.roever@gmail.com>
wrote:

> Good questions, I will follow up piece-wise to address the different
> questions. Could a Wiki section be an idea, before I spread the information
> across several posts?
>
> On Tue, Dec 6, 2016 at 4:50 PM, Stephan Ewen <sewen@apache.org> wrote:
>
>> You are right, it does not exist, and it would be a nice addition.
>>
>> Can you sketch some details on how to do that?
>>
>>   - Will it be a new type of producer? If yes, can as much as possible of
>> the code be shared between the current and the new producer?
>>   - Will it only be part of the Flink Kafka 0.10 producer?
>>
>> Thanks,
>> Stephan
>>
>>
>>
>> On Tue, Dec 6, 2016 at 2:23 PM, Sanne de Roever <
>> sanne.de.roever@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Kafka producer clients for 0.10 allow the following syntax:
>>>
>>> producer.send(new ProducerRecord<String, String>("my-topic",
>>> Integer.toString(i), Integer.toString(i)));
>>>
>>> The gist is that one producer can send messages to different topics; it
>>> is useful for event routing, among other things, and it makes creating
>>> generic endpoints easier. If I am right, Flink currently does not support
>>> this; would it be a useful addition?
>>>
>>> Cheers,
>>>
>>> Sanne
>>>
>>
>>
>
