ignite-issues mailing list archives

From "Denis Magda (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-2016) Update KafkaStreamer to fit new features introduced in Kafka 0.9
Date Wed, 23 Dec 2015 10:26:46 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069479#comment-15069479 ]

Denis Magda commented on IGNITE-2016:


OK, I've finally figured out the difference between the streamers that we already have
and the connectors that Kafka introduced in the 0.9 release.
I agree that a Kafka sink is an entirely different concept and mustn't be mixed with streamers.

So far I have the following high-level (design-related) review comments. Please address them
first; after that I'll start reviewing the code in detail.

1) In any case, let's put the Kafka sink implementation in the existing {{ignite-kafka}} module. There
is no need to introduce an additional module, because this way all Kafka-related code will be located in
one place.
Module structure should look like this:
- {{org.apache.ignite.stream.kafka}} package will contain {{KafkaStreamer}}. Later we can
add {{KafkaStreamerV2}} to this package that will be implemented using the new consumer API;
- {{org.apache.ignite.stream.kafka.connect}} package will contain your current Kafka Connect-based
implementation.

2) Update {{kafka.version}}, referenced from {{ignite-kafka/pom.xml}}, to the latest 0.9 version
and check that the existing streamer still works correctly (it should, according to the Kafka docs).

3) The {{IgniteSinkTask.flush()}} method delivers data to the grid using {{cache.putAll(...)}}.
Instead of this approach, I would switch to {{IgniteDataStreamer}} and use it for streaming data
into Ignite. The reason is that {{IgniteDataStreamer}} will upload data to the grid much faster
than {{cache.putAll(...)}}.

4) {{IgniteSinkTask.put(...)}} buffers data in an internal data structure. Is there any
Kafka API requirement saying that the data mustn't be flushed until the {{flush}} method is
called explicitly? Generally speaking, I would reuse {{IgniteDataStreamer}} here as well: set
{{IgniteDataStreamer.autoFlushFrequency(...)}} equal to the sink flush frequency
and forward all the data to the streamer as soon as it's delivered via {{IgniteSinkTask.put(...)}}.
The streamer will buffer the data and flush it to the grid at the specified frequency or when
the internal buffer reaches some limit.
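
To illustrate point 4, here is a minimal sketch in plain Java of the "forward immediately, let the streamer decide when to flush" idea. It deliberately does not use the Ignite or Kafka Connect APIs: {{BufferingForwarder}} is a hypothetical stand-in for {{IgniteDataStreamer}}, the {{Consumer}} stands in for the grid, and the clock value is passed in explicitly to keep the behavior deterministic. As a simplification, the time check happens only when data is added, whereas a real streamer would flush from a background timer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Small demo driver for the sketch below. */
final class SinkBufferSketch {
    public static void main(String[] args) {
        BufferingForwarder<String, String> fwd = new BufferingForwarder<>(
            3, 1000L, 0L, batch -> System.out.println("flushed " + batch));

        fwd.add("k1", "v1", 10L);
        fwd.add("k2", "v2", 20L);
        fwd.add("k3", "v3", 30L);   // buffer limit reached, batch is flushed
        fwd.add("k4", "v4", 1500L); // flush interval elapsed, batch is flushed
    }
}

/**
 * Hypothetical stand-in for a data streamer (NOT the Ignite API):
 * buffers entries and flushes them to a target when the buffer limit
 * is reached or when the flush interval has elapsed.
 */
final class BufferingForwarder<K, V> {
    private final int maxBuffered;            // assumed buffer limit
    private final long autoFlushMillis;       // assumed flush interval
    private final Consumer<Map<K, V>> target; // stand-in for the grid
    private Map<K, V> buffer = new LinkedHashMap<>();
    private long lastFlush;

    BufferingForwarder(int maxBuffered, long autoFlushMillis, long now,
                       Consumer<Map<K, V>> target) {
        this.maxBuffered = maxBuffered;
        this.autoFlushMillis = autoFlushMillis;
        this.lastFlush = now;
        this.target = target;
    }

    /** Called for every record as soon as the sink task receives it. */
    void add(K key, V value, long now) {
        buffer.put(key, value);
        if (buffer.size() >= maxBuffered || now - lastFlush >= autoFlushMillis)
            flush(now);
    }

    /** Explicit flush, e.g. when the sink framework asks for one. */
    void flush(long now) {
        if (!buffer.isEmpty()) {
            target.accept(buffer);
            buffer = new LinkedHashMap<>(); // never mutate a delivered batch
        }
        lastFlush = now;
    }
}
```

With this shape the sink task keeps no buffer of its own: {{put(...)}} only forwards, and an explicit {{flush}} pushes out whatever is still pending.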

> Update KafkaStreamer to fit new features introduced in Kafka 0.9
> ----------------------------------------------------------------
>                 Key: IGNITE-2016
>                 URL: https://issues.apache.org/jira/browse/IGNITE-2016
>             Project: Ignite
>          Issue Type: New Feature
>          Components: streaming
>            Reporter: Roman Shtykh
>            Assignee: Roman Shtykh
> Particularly,
> - new consumer
> - Kafka Connect (Copycat)
> http://www.confluent.io/blog/apache-kafka-0.9-is-released
> This can be a different integration task or a complete re-write of the current implementation,
> considering the fact that Kafka Connect is a new standard way for "large-scale, real-time
> data import and export for Kafka."

This message was sent by Atlassian JIRA
