streams-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthew Hager [W2O Digital]" <mha...@w2odigital.com>
Subject Re: Proposing Changes to the StreamsProvider interface and StreamsProviderTask
Date Thu, 15 May 2014 15:37:21 GMT
Ryan,

Thank you for your comments!! I, however, must respectfully disagree. The current pattern
is very limiting. A provider should know if it is not functioning correctly, I agree.

 However, I would like to challenge how a user would re-use various providers offered as contrib
to streams and alert without changing the existing provider. Take the twitter stream for example.
In an ideal world a user would be able to use that out of the box, without any modifications.
If he is getting stale data and wants to say, use "pager duty" to alert him that no data is
being provided, he would have to subclass the provider, and read through the code to be able
to do so. Which defeats the purpose of any of the contrib libraries.

I see many benefits to the proposed pattern, and, with backwards comparability, I don't see
a downside. 

I have written many systems like this before in the home security monitoring industry and
this pattern, I have found, is the most flexible to handling the use-cases this project aims
to satisfy.

Thoughts?

Sent from my iPhone

> On May 15, 2014, at 9:45 AM, "Ryan Ebanks" <ryanebanks@gmail.com> wrote:
> 
> I think a processor should solely be responsible for processing data.  I
> think the interface describes exactly what a processor should do, get a
> piece of data and produce out put data.  Having it do more than that is
> expanding the functionality of a processor beyond its intent.
> 
> I do agree that being able to fire off events for lack of data is
> necessary, but that should be the responsibility of the provider.  The
> provider is the connection to the outside world and  should be in charge of
> producing new data and firing off events when new data is not produced or
> missing.  The provider should know exactly when the last piece of new data
> entered the system, and so it will know when to send alerts.  Everything
> downstream of the provider should be a closed flow and have guarantees of
> processing.  Which most runtimes do, the local runtime needs a lot of love
> though.
> 
> Without a doubt the runtime environments need to be improved and rigorously
> tested to make sure that once data enters a stream from a provider, that no
> data is dropped and completes the stream.  However, I do not believe that
> changing the processor is the answer that problem.
> 
> Ryan Ebanks
> 
> 
> On Wed, May 14, 2014 at 3:12 PM, Matthew Hager [W2O Digital] <
> mhager@w2odigital.com> wrote:
> 
>> Matt,
>> 
>> 
>> As always thanks for your feedback and mentorship as I work to contribute
>> to this project.
>> 
>> I feel the current pattern for processor is extremely limiting given the
>> constraint of the return statement rather than the alternative of receive
>> -> process -> write. It seems that if we revise the current interfaces we
>> could handle many more use-cases with streams that cannot currently be
>> accomplished.
>> 
>> Current "StreamsProcessor":
>> 
>> -- Contract --
>> List<StreamsDatum> process(StreamsDatum datum);
>> 
>> -- Drawbacks --
>> For the processor to access the 'writer' he must have an item to be
>> processed. If he doesn't have an item, then he can't write anything
>> downstream.
>> 
>> -- Use Cases Not Satisfied --
>> In many cases the absence of data on a 'stream' is an event in itself. For
>> instance, if I want to have a listener at various portions of the process
>> to ensure my processing data providers are working appropriately. If I
>> want to send out a message, log an event to Oracle, and alert PagerDuty,
>> for stale data. I can write a quick processor that checks for stale data,
>> then upon recognition of stale data (timer expiring), fire the proper
>> events to it's writers.
>> 
>> Other use cases involve... debugging, stream summarization metrics,
>> trending algorithms, processing  stats, and many others.
> 
> 
> 
> 
> 
> 
>> 
>> +++ Proposed Interfaces +++
>> 
>> // Analogous to Provider
>> public interface Provider {
>>        push(StreamsDatum);
>>        registerListener(Listener);
>> }
>> 
>> 
>> // Analogous to Writer
>> public interface Listener {
>>        receive(StreamsDatum);
>> }
>> 
>> 
>> // Analogous to Processor
>> public interface Processor implements Prover, Listener {
>> 
>> }
>> 
>> +++ Conclusion +++
>> I feel the event driven model is more flexible than the current model. We
>> can write an interface for the legacy 'contract' to ease implementation /
>> transition.
>> 
>> IE:
>> 
>> interface LegacyProcessor {
>>        List<StreamsDatum> process(StreamsDatum datum);
>> 
>> }
>> 
>> class LegacyProcessorImpl {
>> 
>>        listen(StreamsDatum datum) {?
>>                forEach(StreamsDatum d : this.transform(datum) {
>>                        ?this.write(datum);?
>>                }
>>        ?}
>> 
>> }
>> 
>> 
>> I really love streams, I think the concept is great, I think adding this
>> would take streams from "great" --> "OMG, the whole world should use this
>> all the time for everything ever, it is flipping amazing and better than
>> ice cream on a hot sunny day."
>> 
>> 
>> Thoughts?
>> 
>> 
>> 
>> 
>> Thanks!
>> Smashew
>> 
>> 
>> 
>>> On 5/13/14, 2:38 PM, "Matt Franklin" <m.ben.franklin@gmail.com> wrote:
>>> 
>>> On Tue, May 6, 2014 at 10:53 PM, Matthew Hager [W2O Digital] <
>>> mhager@w2odigital.com> wrote:
>>> 
>>> <snip />
>>> 
>>> 
>>>> StreamsResultSet - I actually found this to be quite useful paradigm. A
>>>> queue prevents a buffer overflow, an iterator makes it fun and easy to
>>>> read (I love iterators), and it is simple and succinct. I do, however,
>>>> feel it is best expressed as an interface instead of a class.
>>> 
>>> 
>>> I agree with an interface.  IMO, anything that is not a utility style
>>> helper should be interacted with via its interface.
>>> 
>>> 
>>>> The thing missing from this, as an interface, would be the notion of
>>>> "isRunning" which could easily
>>>> satisfy both of the aforementioned modalities.
>>> 
>>> Reasonable.
>>> 
>>> 
>>>> Event Driven - I concur with Matt 100% on this. As currently
>>>> implemented,
>>>> LocalStreamsBuilder is exceedingly inefficient from a memory perspective
>>>> and time execution perspective. To me, it seems, that we could almost
>>>> abstract out 2 common interfaces to make this happen.
>>>> 
>>>>        * Listener { receive(StreamsDatum); }
>>>>        * Producer { push(StreamsDatum); registerListener(Listener); }
>>>> 
>>>> Where the following implementations would place:
>>>> 
>>>>        * Reader implements Producer
>>>>        * Processor implements Producer, Listener
>>>>        * Writer implements Listener
>>> 
>>> Seems logical.  I would like to see the two possible operating modes
>>> represented as distinct interfaces.
>>> 
>> 
>> 

Mime
View raw message