flink-dev mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: [DISCUSS] Naming and Functionality of Stream Operators and Tasks
Date Tue, 12 May 2015 05:18:31 GMT
My proposal for the runtime classes (per my pull request) is this:

StreamTask: base class of the streaming tasks. The task is the
AbstractInvokable that runs in the TaskManager and invokes the stream
operators. OneInputStreamTask, TwoInputStreamTask and SourceStreamTask
are the subclasses responsible for the actual types of operations.

StreamOperator: interface for stream operators such as Map, Reduce and
so on. OneInputStreamOperator and TwoInputStreamOperator are the
interfaces for operators with one input and two inputs, respectively.

There is also AbstractStreamOperator, which provides basic
implementations for methods such as setup()/open()/close(), and
AbstractUdfStreamOperator, which is derived from
AbstractStreamOperator. The latter is for operators that have user
code; it deals with calling the correct functions of RichUserFunctions
(open()/close()/setRuntimeContext()).
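
A minimal Java sketch of how these pieces could fit together. Only the
class names come from the proposal above (StreamMap is the existing map
operator mentioned further down the thread). All method signatures are
assumptions for illustration: the List-based setup(), the
receiveElement*() methods and the RichFunction stand-in are hypothetical
and are not the actual Flink API. setRuntimeContext() is left out to
keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Base interface. Assumption: a plain List stands in for the real output collector.
interface StreamOperator<OUT> {
    void setup(List<OUT> output);
    void open();
    void close();
}

// Operator with one input (method name is an assumption).
interface OneInputStreamOperator<IN, OUT> extends StreamOperator<OUT> {
    void receiveElement(IN element);
}

// Operator with two inputs; the input number is encoded in the method name.
interface TwoInputStreamOperator<IN1, IN2, OUT> extends StreamOperator<OUT> {
    void receiveElement1(IN1 element);
    void receiveElement2(IN2 element);
}

// Basic lifecycle implementations shared by all operators.
abstract class AbstractStreamOperator<OUT> implements StreamOperator<OUT> {
    protected List<OUT> output;

    @Override public void setup(List<OUT> output) { this.output = output; }
    @Override public void open() {}
    @Override public void close() {}
}

// Hypothetical stand-in for a rich user function with lifecycle hooks.
interface RichFunction {
    void open();
    void close();
}

// Operators that wrap user code forward lifecycle calls to the user function.
abstract class AbstractUdfStreamOperator<OUT, F> extends AbstractStreamOperator<OUT> {
    protected final F userFunction;

    protected AbstractUdfStreamOperator(F userFunction) { this.userFunction = userFunction; }

    @Override public void open() {
        if (userFunction instanceof RichFunction) { ((RichFunction) userFunction).open(); }
    }

    @Override public void close() {
        if (userFunction instanceof RichFunction) { ((RichFunction) userFunction).close(); }
    }
}

// A concrete one-input operator: applies the user function to every element.
class StreamMap<IN, OUT> extends AbstractUdfStreamOperator<OUT, Function<IN, OUT>>
        implements OneInputStreamOperator<IN, OUT> {

    StreamMap(Function<IN, OUT> mapper) { super(mapper); }

    @Override public void receiveElement(IN element) {
        output.add(userFunction.apply(element));
    }
}

class OperatorSketch {
    // Plays the role of the task: sets up the operator and feeds it elements.
    static List<Integer> run() {
        List<Integer> out = new ArrayList<>();
        StreamMap<String, Integer> op = new StreamMap<>(String::length);
        op.setup(out);
        op.open();
        for (String s : new String[] {"a", "bb", "ccc"}) {
            op.receiveElement(s);
        }
        op.close();
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Here OperatorSketch.run() stands in for what a OneInputStreamTask would
do in its main loop: call setup() and open(), forward each input
element to the operator, then call close().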

I realised that we should probably not rename all the actual operators
and remove the Stream prefix and suffix. That would be too big a change
and is orthogonal to my current PR. Other people can do it if they want.

These are just my suggestions. Please suggest other consistent naming
schemes if you think mine are bad.

On Mon, May 11, 2015 at 9:40 PM, Stephan Ewen <sewen@apache.org> wrote:
> How about separating the discussions about runtime class renaming (there
> seems to be consensus) from the
> API class renaming (no consensus yet).
>
> To go ahead with the runtime classes, can you make a concrete suggestion
> for more memorable/describing names?
>
> For the API classes, kick off a thread, if you want, but please clearly
> mark in your discussion that this is about an API breaking change
> to a user-facing API (that is still declared beta).
>
>
> On Mon, May 11, 2015 at 10:18 AM, Aljoscha Krettek <aljoscha@apache.org>
> wrote:
>
>> Come to think of it, why do we even need SingleOutputStreamOperator?
>> It is just a subclass of DataStream that has almost no functionality
>> that couldn't be implemented in DataStream. I think it makes people
>> wonder why the result of a transformation is not a DataStream but this
>> mouthful of a class.
>>
>> And, in light of other possibilities such as MapDriver and PactDriver I
>> am quite happy with calling the things StreamOperator and StreamMap.
>> :D
>>
>> On Sat, May 9, 2015 at 5:20 PM, Márton Balassi <balassi.marton@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > I am in favor of removing the Stream (or Streaming) suffixes and
>> > prefixes. I think that Gyula was also referring to those.
>> >
>> > I think the naming of the Tasks, and user facing operators
>> > (SingleOutputStreamOperator and alike) are fine.
>> >
>> > As for the other bunch of Operators we could name them Drivers to be
>> > mostly in line with the batch naming. By the way, most of the classes
>> > do not have "Operator" in their name currently - e.g. the one
>> > encapsulating the map functionality is called StreamMap, however the
>> > base classes (StreamOperator and ChainableStreamOperator) have it in
>> > their name explicitly. I could go with MapDriver instead of StreamMap,
>> > ChainableStreamOperator will be eliminated anyway - StreamOperator
>> > needs a new name then: worst case scenario PactDriver. :)
>> >
>> > As for n-ary operators I agree with Gyula.
>> >
>> > On Sat, May 9, 2015 at 4:44 PM, Aljoscha Krettek <aljoscha@apache.org>
>> > wrote:
>> >
>> >> Which name changes are you referring to? The proposed names in my
>> >> recent PR? Or the dropping of Stream from all the classes? For the
>> >> rest I was just rambling about how I don't like the names in the batch
>> >> API. :D
>> >>
>> >> On Fri, May 8, 2015 at 12:31 PM, Gyula Fóra <gyula.fora@gmail.com>
>> >> wrote:
>> >> > Generally I am in favor of making these name changes. My only
>> >> > concern is regarding the one-input and multiple-input operators.
>> >> >
>> >> > There is a general problem with the n-ary operators regarding type
>> >> > safety, that's why we now have SingleInput and Co (two-input)
>> >> > operators. I think we should keep these.
>> >> >
>> >> > On Fri, May 8, 2015 at 11:38 AM, Aljoscha Krettek
>> >> > <aljoscha@apache.org> wrote:
>> >> >
>> >> >> Hi,
>> >> >> since I'm currently reworking the Stream operators I thought it's
>> >> >> a good time to talk about the naming of some classes. We have some
>> >> >> legacy problems with lots of Operators, OperatorBases, TwoInput,
>> >> >> OneInput, Unary, Binary, etc. And maybe we can break things in
>> >> >> streaming to have more consistent and future-proof naming.
>> >> >>
>> >> >> In streaming, there are:
>> >> >> - Tasks, these are an AbstractInvokable and contain the main loop
>> >> >> of a streaming vertex. They read from the inputs and forward data
>> >> >> to the operator implementation.
>> >> >>
>> >> >> - Operators, these are invoked by a Task and are responsible for
>> >> >> the actual logic of the operator. Think Map, Join, Reduce and so
>> >> >> on. These are responsible for calling the user-defined function.
>> >> >>
>> >> >> - Operators (again, I know), these are user facing classes (some
>> >> >> derived from DataStream, some not). There is for example
>> >> >> SingleOutputStreamOperator, for the result of a DataStream
>> >> >> transformation that has a single output. There are also
>> >> >> TemporalOperator and its derived classes StreamCrossOperator and
>> >> >> StreamJoinOperator. The actual operator inside a task (the ones I
>> >> >> mentioned before that are responsible for the user logic) that
>> >> >> executes a temporal join is called CoStreamWindow (with a
>> >> >> JoinWindowFunction).
>> >> >>
>> >> >> As I currently have it in my PR, there are two Task classes, one
>> >> >> for single input, and one for two-input operators. There are also
>> >> >> the corresponding operator interfaces for unary and binary
>> >> >> operators (see what I did there ... :D).
>> >> >>
>> >> >> What should we call all these classes (concepts)? Also I'm heavily
>> >> >> in favour of dropping all the Stream (or Streaming) prefixes and
>> >> >> suffixes from the class names. I know I'm in streaming because the
>> >> >> package is named streaming. And we should not restrain ourselves
>> >> >> because the batch API also has things called operator.
>> >> >>
>> >> >> Also, the concept of one-input, two-input tasks and operators is
>> >> >> not very scalable. Maybe we should have a single interface for
>> >> >> operators that has a receiveElement(int, element) method that tells
>> >> >> the operator from which input an element came. Then we can scale
>> >> >> this to n-ary operators. This would of course have the overhead of
>> >> >> always sending along the number of the input instead of encoding
>> >> >> the input number in the method name, such as receiveElement1() and
>> >> >> receiveElement2().
>> >> >>
>> >> >> Any thoughts? :D (I know I'm writing the long annoying emails today
>> >> >> but I think it is important we discuss these things before being
>> >> >> stuck with them.)
>> >> >>
>> >> >> Cheers,
>> >> >> Aljoscha
>> >> >>
>> >>
>>
