ignite-issues mailing list archives

From "Roman Shtykh (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (IGNITE-529) Implement IgniteFlumeStreamer to stream data from Apache Flume
Date Tue, 10 Nov 2015 14:33:11 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998660#comment-14998660 ]

Roman Shtykh edited comment on IGNITE-529 at 11/10/15 2:32 PM:
---------------------------------------------------------------

Anton, please see my comments inline.

A> Could you please check this issue.
R> I might have forgotten to commit something. I apologize for that and will check it tomorrow
morning.

A> Sink will be created inside Flume and use FlumeStreamer to put data to Ignite.
R> Right.

A> In this case Extractor should be specified inside FlumeStreamer (not as a constructor
parameter; the streamer should decide which transformer to use, or a custom implementation
can have a custom transformer). I mean Flume and Ignite logic should be separated: all
Flume-related logic inside Sink and all Ignite-related logic inside FlumeStreamer.
DataStreamer also should not be provided by Sink, because it is Ignite-related logic.
R> I.e., the user should be responsible for implementing a Flume streamer, right? That
is why my first approach was to enable extending a Flume streamer base class and let a user
implement all event conversion and other logic. The streamer is then specified in Flume's
configuration and instantiated in the sink.
R> That is the best separation we can achieve. The Ignite instance and cache will be created
in the Flume streamer though, and that is what you mentioned is not recommended.
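
The first approach (a base class the user extends, named in configuration and instantiated by the sink) can be illustrated with a minimal, self-contained sketch. Every name here (`StreamerBaseSketch`, `UpperCaseStreamer`, `ReflectiveSinkSketch`) is hypothetical and not part of Ignite or Flume; a plain `List` stands in for the Ignite cache, and a `Class` object stands in for the class name that would come from Flume's configuration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical base class; a List stands in for the Ignite cache the streamer would own. */
abstract class StreamerBaseSketch {
    private final List<String> stored = new ArrayList<>();

    /** User-implemented: all event conversion logic lives here. */
    protected abstract String convert(byte[] eventBody);

    /** Called by the sink for every incoming event. */
    final void onEvent(byte[] eventBody) {
        stored.add(convert(eventBody));
    }

    List<String> stored() {
        return stored;
    }
}

/** Example user subclass (assumption: event bodies are UTF-8 text). */
class UpperCaseStreamer extends StreamerBaseSketch {
    @Override
    protected String convert(byte[] eventBody) {
        return new String(eventBody, StandardCharsets.UTF_8).toUpperCase(Locale.ROOT);
    }
}

/** Hypothetical sink: knows only the base class and instantiates the configured subclass. */
class ReflectiveSinkSketch {
    private final StreamerBaseSketch streamer;

    ReflectiveSinkSketch(Class<? extends StreamerBaseSketch> configuredClass) {
        try {
            // Requires a no-argument constructor on the user's subclass.
            this.streamer = configuredClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate configured streamer", e);
        }
    }

    void process(byte[] eventBody) {
        streamer.onEvent(eventBody);
    }

    StreamerBaseSketch streamer() {
        return streamer;
    }
}
```

In this shape the sink touches no Ignite classes, but the streamer has to create and own everything Ignite-related itself, which is the trade-off described above.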

R> In my current approach I take care of everything except implementing an extractor which
is a user's responsibility (close to what most Flume sinks have now, please see serializer
and converter parameters at https://flume.apache.org/FlumeUserGuide.html).
R> Neither approach is perfect, but we have to choose one. Do you have another idea?
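
For comparison, the extractor-based approach can be sketched as follows. This is only an illustration under stated assumptions: `EventExtractor`, `SketchSink` and `KeyValueExtractor` are hypothetical names, a `HashMap` stands in for the Ignite cache and data streamer, and the "key=value" body format is invented for the example. The only thing the user writes is the extractor.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical user-supplied hook: turns one event body into a cache entry. */
interface EventExtractor<K, V> {
    /** Returns the entry to store, or null if the event should be skipped. */
    Map.Entry<K, V> extract(byte[] eventBody);
}

/** Hypothetical stand-in for the sink; a Map replaces the Ignite cache and data streamer. */
class SketchSink<K, V> {
    private final EventExtractor<K, V> extractor;
    private final Map<K, V> cache = new HashMap<>();

    SketchSink(EventExtractor<K, V> extractor) {
        this.extractor = extractor;
    }

    /** Converts the event with the user's extractor and stores the result. */
    void process(byte[] eventBody) {
        Map.Entry<K, V> entry = extractor.extract(eventBody);
        if (entry != null) {
            cache.put(entry.getKey(), entry.getValue());
        }
    }

    Map<K, V> cache() {
        return cache;
    }
}

/** Example user extractor (assumption: bodies are UTF-8 text of the form "key=value"). */
class KeyValueExtractor implements EventExtractor<String, String> {
    @Override
    public Map.Entry<String, String> extract(byte[] eventBody) {
        String text = new String(eventBody, StandardCharsets.UTF_8);
        int sep = text.indexOf('=');
        if (sep < 0) {
            return null; // malformed event: skip it
        }
        return new SimpleEntry<>(text.substring(0, sep), text.substring(sep + 1));
    }
}
```

Here the sink owns the storage side and the user supplies only the conversion, which is less code for the user but leaves Ignite-related setup inside the sink.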

A> In case it is impossible to provide a configured instance of FlumeStreamer to Sink, FlumeStreamer
should build itself using some configuration, possibly provided via constructor.
R> Sorry, I am not sure I understand how Sink can send events to Ignite if FlumeStreamer
is not specified...

A> Sink in my view is just a channel and it should communicate with FlumeStreamer only
without usage of other apache.ignite.* classes.

R> I've seen many implementations of Flume sinks. As I mentioned, the sink takes responsibility
for instantiating the components that save incoming data. On the other hand, the Ignite streamer
(judging from the implementation we have now) also wants to instantiate components (clients) for
pulling data.
R> To achieve complete separation of sink and streamer, I can only think of running the sink and
the streamer in separate JVMs communicating over some protocol (which is already an overhead).



was (Author: roman_s):
Anton, please see my comments inline.

Could you please check this issue.
R> I might have forgotten to commit something. I apologize for that and will check it tomorrow
morning.

Sink will be created inside Flume and use FlumeStreamer to put data to Ignite.
R> Right.

In this case Extractor should be specified inside FlumeStreamer (not as a constructor parameter;
the streamer should decide which transformer to use, or a custom implementation can have a custom
transformer). I mean Flume and Ignite logic should be separated: all Flume-related logic
inside Sink and all Ignite-related logic inside FlumeStreamer.
DataStreamer also should not be provided by Sink, because it is Ignite-related logic.
R> I.e., the user should be responsible for implementing a Flume streamer, right? That
is why my first approach was to enable extending a Flume streamer base class and let a user
implement all event conversion and other logic. The streamer is then specified in Flume's
configuration and instantiated in the sink.
R> That is the best separation we can achieve. The Ignite instance and cache will be created
in the Flume streamer though, and that is what you mentioned is not recommended.

R> In my current approach I take care of everything except implementing an extractor which
is a user's responsibility (close to what most Flume sinks have now, please see serializer
and converter parameters at https://flume.apache.org/FlumeUserGuide.html).
R> Neither approach is perfect, but we have to choose one. Do you have another idea?

In case it is impossible to provide a configured instance of FlumeStreamer to Sink, FlumeStreamer
should build itself using some configuration, possibly provided via constructor.
R> Sorry, I am not sure I understand how Sink can send events to Ignite if FlumeStreamer
is not specified...

Sink in my view is just a channel and it should communicate with FlumeStreamer only without
usage of other apache.ignite.* classes.

R> I've seen many implementations of Flume sinks. As I mentioned, the sink takes responsibility
for instantiating the components that save incoming data. On the other hand, the Ignite streamer
(judging from the implementation we have now) also wants to instantiate components (clients) for
pulling data.
R> To achieve complete separation of sink and streamer, I can only think of running the sink and
the streamer in separate JVMs communicating over some protocol (which is already an overhead).


> Implement IgniteFlumeStreamer to stream data from Apache Flume
> --------------------------------------------------------------
>
>                 Key: IGNITE-529
>                 URL: https://issues.apache.org/jira/browse/IGNITE-529
>             Project: Ignite
>          Issue Type: Sub-task
>          Components: streaming
>            Reporter: Dmitriy Setrakyan
>            Assignee: Roman Shtykh
>
> We have {{IgniteDataStreamer}} which is used to load data into Ignite under high load. It was previously named {{IgniteDataLoader}}, see ticket IGNITE-394.
> See [Apache Flume|http://flume.apache.org/] for more information.
> We should create {{IgniteFlumeStreamer}} which will consume messages from Apache Flume and stream them into Ignite caches.
> More details to follow, but at the least we should be able to:
> * Convert Flume data to Ignite data using an optional pluggable converter.
> * Specify the cache name for the Ignite cache to load data into.
> * Specify other flags available on {{IgniteDataStreamer}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
