streams-dev mailing list archives

From Joey Frazee <joey.fra...@icloud.com>
Subject Re: [DISCUSS] Beam
Date Mon, 21 Nov 2016 23:27:15 GMT
I'm in favor of this for a few reasons:

- There are enough stream processing frameworks out there that it's hard for us to
offer much on that front. I don't think streams fills a gap for this internally so we have
more to contribute in creating something that people can use with Beam.

- It should help make the story clearer to outsiders on "how to run streams".

- While Beam may be immature, as Trevor and Suneel mention, I think they can probably do as
good a job keeping the interfaces stable as we can in maintaining runtimes and interfaces
internally. We'll give up some control but they'll do a good job too.

Now there will for sure be some drawbacks. We'll be beholden to someone else and probably
have to scramble to stay up to date sometimes. And it's naive to think it's ever going to
provide for every feature of the underlying runner, so we might find ourselves in situations
where something that should be easy is hard.

-joey

> On Nov 21, 2016, at 3:43 PM, sblackmon <sblackmon@apache.org> wrote:
> 
>  
> 
>> On November 21, 2016 at 2:19:11 PM, Suneel Marthi (suneel.marthi@gmail.com) wrote:
>> 
>> I agree too. I have been playing with Beam for a few months now without a
>> runner and the API is still immature, but nevertheless keep it on the radar
>> since it's going to be a TLP soon.
>> 
>> 
>> From a Streams perspective, how do we see the project using Beam (similar to
>> Spark/Flink now)? If so, we can start with a preliminary version of Beam support
>> with the Local Dataflow runner.
>> 
> 
> Hypothesis expanded:  
> 
> We could implement all the components in the project (providers, persisters, and processors)
> directly against Beam APIs (Source, Sink, DoFn, etc.) and support two primary execution models
> for project capabilities:
> 
> 1) Directly instantiate a single instance of a component and call the Beam equivalents of
> setup, process, and teardown yourself. This is already common throughout the project's unit
> and integration tests.
> 2) Compose a Beam Pipeline combining Streams and non-Streams components, and run it with your
> preferred Beam runner(s).
> 
> In this scenario I think streams-runtimes would either go away entirely or only contain
> helper methods (no classes with a static main).
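
The two execution models in the hypothesis above can be illustrated with a small sketch. It is plain Python with no Beam dependency: `Component`, `UppercaseProcessor`, `DropEmpty`, `run_direct`, and `run_pipeline` are hypothetical names made up for illustration, and the toy `run_pipeline` only stands in for a real Beam runner. It is not how Streams or Beam actually implement this.

```python
from typing import Any, Iterable, List


class Component:
    """Hypothetical stand-in for a Beam-style step (not Beam's actual API)."""

    def setup(self) -> None:
        pass

    def process(self, element: Any) -> Iterable[Any]:
        raise NotImplementedError

    def teardown(self) -> None:
        pass


class UppercaseProcessor(Component):
    """Hypothetical Streams-style processor: emits each string uppercased."""

    def process(self, element: str) -> Iterable[str]:
        yield element.upper()


class DropEmpty(Component):
    """Hypothetical non-Streams step: drops empty strings."""

    def process(self, element: str) -> Iterable[str]:
        if element:
            yield element


def run_direct(component: Component, elements: Iterable[Any]) -> List[Any]:
    """Model 1: drive one component's lifecycle by hand, as a unit test would."""
    component.setup()
    try:
        out: List[Any] = []
        for element in elements:
            out.extend(component.process(element))
        return out
    finally:
        component.teardown()


def run_pipeline(steps: List[Component], elements: Iterable[Any]) -> List[Any]:
    """Model 2: chain components; this loop is a toy substitute for a runner."""
    for step in steps:
        step.setup()
    try:
        current = list(elements)
        for step in steps:
            produced: List[Any] = []
            for element in current:
                produced.extend(step.process(element))
            current = produced
        return current
    finally:
        for step in steps:
            step.teardown()


if __name__ == "__main__":
    print(run_direct(UppercaseProcessor(), ["a", "b"]))
    print(run_pipeline([DropEmpty(), UppercaseProcessor()], ["a", "", "b"]))
```

The point of the sketch is that the same component class serves both models, which is why a separate runtime module with its own main classes might become unnecessary.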
> 
>> 
>> 
>> On Mon, Nov 21, 2016 at 3:14 PM, Trevor Grant wrote:
>> 
>>> IMHO, Beam is too immature and the API is too unstable at this time to
>>> integrate, however I am in favor of watching the Beam project develop and
>>> starting to think through what an integration might look like.
>>> 
>>> Just my .02, based on some fairly lackluster experiences with Apache Beam.
>>> 
>>> tg
>>> 
>>> 
>>> 
>>> 
>>> Trevor Grant
>>> Data Scientist
>>> https://github.com/rawkintrevo
>>> http://stackexchange.com/users/3002022/rawkintrevo
>>> http://trevorgrant.org
>>> 
>>> *"Fortunate is he, who is able to know the causes of things." -Virgil*
>>> 
>>> 
>>>> On Mon, Nov 21, 2016 at 11:36 AM, sblackmon wrote:
>>>> 
>>>> Beam appears to be on its way to being the de-facto standard for data
>>>> pipelines.
>>>> 
>>>> I’d like to start a real discussion about whether and how to align
>>>> streams interfaces with Beam interfaces.
>>>> 
>>>> To pose a straw-man theory for discussion:
>>>> 
>>>> Hypothesis: Streams would benefit by replacing the interfaces in
>>>> streams-core entirely with beam interfaces.
>>>> 
>>>> a) Do we agree that the flexibility and performance gains from doing so,
>>>> presuming it’s possible, would be significant?
>>>> b) Are there any inevitable flexibility, performance, complexity, or
>>>> other blockers or compromises we should discuss?
>>>> c) What arguments are there for retaining our interfaces and providing
>>>> beam compatibility in a runtime module binding (within streams) vs
>>>> deprecating our existing interfaces and switching over completely?
>>>> d) Obviously doing this would be a lot of work. What level of commitment
>>>> is there from the group to work on this?
>>>> 
>>>> Steve
>>>> On October 25, 2016 at 3:47:11 PM, sblackmon (sblackmon@apache.org) wrote:
>>>> 
>>>> Regarding Beam, there have been a number of ideas and theories floated on
>>>> the list, but nothing concrete has been proposed or discussed in depth.
>>>> 
>>>> Steve
>>>> On October 25, 2016 at 10:21:52 AM, Suneel Marthi (suneel.marthi@gmail.com) wrote:
>>>> 
>>>> Is support for Kafka Streams and Apache Beam on the roadmap?
>>>> 
>>> 
> 
