streams-dev mailing list archives

From Matt Franklin <m.ben.frank...@gmail.com>
Subject Re: [DISCUSS] What to do with streams-runtime-local and other streams-runtimes modules
Date Tue, 11 Oct 2016 20:31:18 GMT
On Tue, Sep 27, 2016 at 6:05 PM sblackmon <sblackmon@apache.org> wrote:

> All,
>
>
>
> Joey brought this up over the weekend and I think a discussion is overdue
> on the topic.
>
>
>
> Streams components were meant to be compatible with other runtime
> frameworks all along, and for the most part are implemented in a manner
> compatible with distributed execution, where coordination, message passing,
> and lifecycle are handled outside of streams libraries.  By community
> standards, any component or component configuration object that isn't
> cleanly serializable for relocation in a distributed framework is a bug.
>

Agreed, though this could be more explicit.
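
One way to make it explicit would be a round-trip serialization test that
every component and configuration object must pass.  A minimal sketch,
assuming plain Java serialization is the relocation mechanism (the helper
name is hypothetical):

    import java.io.*;

    // Hypothetical helper: any component or config object that fails this
    // round trip would be a bug by the community standard described above.
    public class SerializationRoundTrip {

      @SuppressWarnings("unchecked")
      static <T extends Serializable> T roundTrip(T original) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
          out.writeObject(original);                       // ship to a worker...
        }
        try (ObjectInputStream in = new ObjectInputStream(
            new ByteArrayInputStream(bytes.toByteArray()))) {
          return (T) in.readObject();                      // ...and rehydrate
        }
      }
    }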


>
>
>
> When the streams project got started in 2012, Storm was the only TLP
> real-time data processing framework at Apache, but now there are plenty of
> good choices, all of which are faster and better tested than our
> streams-runtime-local module.
>
>

>
> So, what should be the role of streams-runtime-local?  Should we keep it
> at all?  The tests take forever to run and my organization has stopped
> using it entirely.  The best argument for keeping it is that it is useful
> when integration testing small pipelines, but perhaps we could just agree
> to use something else for that purpose?
>
>
I think having a local runtime for testing or small streams is valuable,
but there is a ton of work that needs to go into the current runtime.


>
>
> Do we want to keep the other runtime modules around and continue adding
> more?  I’ve found that when embedding streams components in other
> frameworks (Spark and Flink most recently) I end up creating a handful of
> classes to help bind streams interfaces and instances within the pdfs /
> functions / transforms / whatever is that framework's atomic unit of
> computation, and reusing them in all my pipelines.
>
>
I think this is valuable.  A set of libraries that adapts a common
programming model to various frameworks and simplifies stream development is
inherently cool.  Write once, run anywhere.
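
To make that concrete, the binding classes Steve describes tend to look
roughly like this sketch, which adapts a streams processor to a Flink
operator.  (StreamsProcessorFlatMap is a hypothetical name; the
prepare/process/cleanup lifecycle is assumed from streams-core, and the
wrapped component must be serializable per the earlier point.)

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;
    import org.apache.streams.core.StreamsDatum;
    import org.apache.streams.core.StreamsProcessor;

    // Wraps any StreamsProcessor as a Flink flat map operator, letting the
    // host framework own coordination, message passing, and lifecycle.
    public class StreamsProcessorFlatMap
        extends RichFlatMapFunction<StreamsDatum, StreamsDatum> {

      private final StreamsProcessor processor;  // shipped via serialization

      public StreamsProcessorFlatMap(StreamsProcessor processor) {
        this.processor = processor;
      }

      @Override
      public void open(Configuration parameters) {
        processor.prepare(null);                 // lifecycle: framework start
      }

      @Override
      public void flatMap(StreamsDatum datum, Collector<StreamsDatum> out) {
        for (StreamsDatum result : processor.process(datum)) {
          out.collect(result);                   // emit zero or more datums
        }
      }

      @Override
      public void close() {
        processor.cleanup();                     // lifecycle: framework stop
      }
    }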


>
>
> How about the StreamBuilder interface?  Does anyone still believe we
> should support (and still want to work on) classes
> implementing StreamBuilder to build and run a pipeline composed solely
> of streams components on other frameworks?  Personally I prefer to write
> code using the framework APIs at the pipeline level, and embed individual
> streams components at the step level.
>
>
I think this could be valuable if done better.  For instance, binding
classes to steps in the stream pipeline, rather than instances.  This would
let the aforementioned adapter libraries configure components using the
programming model declared by streams and set up pipelines in target
systems.
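
A sketch of what class-level binding could look like (StreamStep is a
hypothetical name; the point is that only a class reference and a config
object get declared, and instantiation happens inside the target system):

    import org.apache.streams.core.StreamsProcessor;

    // Hypothetical: a pipeline step declared as (class, configuration)
    // rather than a live instance, so adapters only ship metadata.
    public class StreamStep<T extends StreamsProcessor> {

      private final Class<T> processorClass;
      private final Object configuration;

      public StreamStep(Class<T> processorClass, Object configuration) {
        this.processorClass = processorClass;
        this.configuration = configuration;
      }

      // Called by the target-framework adapter on each worker, so the
      // component itself never has to survive serialization.
      public T instantiate() throws ReflectiveOperationException {
        T processor = processorClass.getDeclaredConstructor().newInstance();
        processor.prepare(configuration);        // assumed lifecycle hook
        return processor;
      }
    }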


>
>
> Any other thoughts on the topic?
>
>
>
> Steve
>
>
>
> - What should the focus be? If you look at the code, the project really
> provides 3 things: (1) a stream processing engine and integration with data
> persistence mechanisms, (2) a reference implementation of ActivityStreams,
> AS schemas, and tools for interlinking activity objects and events, and (3)
> a uniform API for integrating with social network APIs. I don't think that
> first thing is needed anymore. Just looking at Apache projects, NiFi, Apex
> + Apex Malhar, and to some extent Flume are further along here. StreamSets
> covers some of this too, and arguably Logstash also gets used for this sort
> of work. I.e., I think the project would be much stronger if it focused on
> (2) and (3) and marrying those up to other Apache projects that fit (1).
> Minimally, it needs to be disentangled a bit.
