flink-dev mailing list archives

From "Matthias J. Sax" <mj...@informatik.hu-berlin.de>
Subject Re: [DISCUSS] Naming and Functionality of Stream Operators and Tasks
Date Tue, 12 May 2015 09:04:21 GMT
I like it. Not sure if my vote counts ;)

On 05/12/2015 07:18 AM, Aljoscha Krettek wrote:
> My proposal for the runtime classes (per my pull request) is this:
> 
> StreamTask: base of streaming tasks. The task is the AbstractInvokable
> that runs in the TaskManager and invokes stream operators.
> OneInputStreamTask, TwoInputStreamTask and SourceStreamTask are the
> subclasses responsible for the actual types of operations.
> 
> StreamOperator: interface for StreamOperators such as Map, Reduce and so
> on. OneInputOperator and TwoInputStreamOperator are the interfaces for
> operators with one input and two inputs respectively.
> 
> There is also AbstractStreamOperator, which provides basic
> implementations for methods such as setup()/open()/close(), and
> AbstractUdfStreamOperator, which is derived from
> AbstractStreamOperator. The latter is for operators that have user code;
> it deals with calling the correct functions of RichUserFunctions
> (open()/close()/setRuntimeContext()).
> 
> I realised that we should probably not rename all the actual operators
> and remove the Stream prefix and suffix; that would be too big a change
> and orthogonal to my current PR. Other people can do it if they want.
> 
> These are just my suggestions. Please suggest other consistent naming
> schemes if you think mine are bad.
> 
> On Mon, May 11, 2015 at 9:40 PM, Stephan Ewen <sewen@apache.org> wrote:
>> How about separating the discussions about runtime class renaming (there
>> seems to be consensus) from the API class renaming (no consensus yet)?
>>
>> To go ahead with the runtime classes, can you make a concrete suggestion
>> for more memorable/descriptive names?
>>
>> For the API classes, kick off a thread, if you want, but please clearly
>> mark in your discussion that this is about an API-breaking change
>> to a user-facing API (that is still declared beta).
>>
>>
>> On Mon, May 11, 2015 at 10:18 AM, Aljoscha Krettek <aljoscha@apache.org>
>> wrote:
>>
>>> Come to think of it, why do we even need SingleOutputStreamOperator?
>>> It is just a subclass of DataStream that has almost no functionality
>>> that couldn't be implemented in DataStream. I think it makes people
>>> wonder why the result of a transformation is not a DataStream but this
>>> mouthful of a class.
>>>
>>> And, in light of other possibilities such as MapDriver and PactDriver I
>>> am quite happy with calling the things StreamOperator and StreamMap.
>>> :D
>>>
>>> On Sat, May 9, 2015 at 5:20 PM, Márton Balassi <balassi.marton@gmail.com>
>>> wrote:
>>>> Hi,
>>>>
>>>> I am in favor of removing the Stream (or Streaming) suffixes and
>>>> prefixes. I think that Gyula was also referring to those.
>>>>
>>>> I think the naming of the Tasks and the user-facing operators
>>>> (SingleOutputStreamOperator and the like) is fine.
>>>>
>>>> As for the other bunch of Operators, we could name them Drivers to be
>>>> mostly in line with the batch naming. By the way, most of the classes
>>>> do not have "Operator" in their name currently, e.g. the one
>>>> encapsulating the map functionality is called StreamMap; however, the
>>>> base classes (StreamOperator and ChainableStreamOperator) have it in
>>>> their name explicitly. I could go with MapDriver instead of StreamMap.
>>>> ChainableStreamOperator will be eliminated anyway. StreamOperator
>>>> needs a new name then: worst case scenario PactDriver. :)
>>>>
>>>> As for n-ary operators I agree with Gyula.
>>>>
>>>> On Sat, May 9, 2015 at 4:44 PM, Aljoscha Krettek <aljoscha@apache.org>
>>>> wrote:
>>>>
>>>>> Which name changes are you referring to? The proposed names in my
>>>>> recent PR? Or the dropping of Stream from all the classes? For the
>>>>> rest I was just rambling about how I don't like the names in the batch
>>>>> API. :D
>>>>>
>>>>> On Fri, May 8, 2015 at 12:31 PM, Gyula Fóra <gyula.fora@gmail.com>
>>>>> wrote:
>>>>>> Generally I am in favor of making these name changes. My only concern
>>>>>> is regarding the one-input and multiple-input operators.
>>>>>>
>>>>>> There is a general problem with the n-ary operators regarding type
>>>>>> safety; that's why we now have SingleInput and Co (two-input)
>>>>>> operators. I think we should keep these.
>>>>>>
>>>>>> On Fri, May 8, 2015 at 11:38 AM, Aljoscha Krettek <aljoscha@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> since I'm currently reworking the Stream operators I thought it's a
>>>>>>> good time to talk about the naming of some classes. We have some
>>>>>>> legacy problems with lots of Operators, OperatorBases, TwoInput,
>>>>>>> OneInput, Unary, Binary, etc. And maybe we can break things in
>>>>>>> streaming to have more consistent and future-proof naming.
>>>>>>>
>>>>>>> In streaming, there are:
>>>>>>> - Tasks, these are an AbstractInvokable and contain the main loop of a
>>>>>>> streaming vertex. They read from the inputs and forward data to the
>>>>>>> operator implementation.
>>>>>>>
>>>>>>> - Operators, these are invoked by a Task and are responsible for the
>>>>>>> actual logic of the operator. Think Map, Join, Reduce and so on. These
>>>>>>> are responsible for calling the user-defined function.
>>>>>>>
>>>>>>> - Operators (again, I know), these are user-facing classes (some
>>>>>>> derived from DataStream, some not). There is for example
>>>>>>> SingleOutputStreamOperator, for the result of a DataStream
>>>>>>> transformation that has a single output. There are also
>>>>>>> TemporalOperator and its derived classes StreamCrossOperator and
>>>>>>> StreamJoinOperator. The actual operator inside a task (the ones I
>>>>>>> mentioned before that are responsible for the user logic) that
>>>>>>> executes a temporal join is called CoStreamWindow (with a
>>>>>>> JoinWindowFunction).
>>>>>>>
>>>>>>> As I currently have it in my PR, there are two Task classes, one for
>>>>>>> single-input and one for two-input operators. There are also the
>>>>>>> corresponding operator interfaces for unary and binary operators (see
>>>>>>> what I did there ... :D).
>>>>>>>
>>>>>>> What should we call all these classes (concepts)? Also, I'm heavily in
>>>>>>> favour of dropping all the Stream (or Streaming) prefixes and suffixes
>>>>>>> from the class names. I know I'm in streaming because the package is
>>>>>>> named streaming. And we should not restrain ourselves because the
>>>>>>> batch API also has things called operator.
>>>>>>>
>>>>>>> Also, the concept of one-input, two-input tasks and operators is not
>>>>>>> very scalable. Maybe we should have a single interface for operators
>>>>>>> that has a receiveElement(int, element) method that tells the operator
>>>>>>> from which input an element came. Then we can scale this to n-ary
>>>>>>> operators. This would of course have the overhead of always sending
>>>>>>> along the number of the input instead of encoding the input number in
>>>>>>> the method name, such as receiveElement1() and receiveElement2().
>>>>>>>
>>>>>>> Any thoughts? :D (I know I'm writing the long annoying emails today
>>>>>>> but I think it is important we discuss these things before being stuck
>>>>>>> with them.)
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Aljoscha
>>>>>>>
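
For illustration only, the two interface styles weighed in the original message (a single receiveElement(int, element) method versus receiveElement1()/receiveElement2()) might be sketched in Java as below. Apart from those method names, everything here is an assumption made for the sketch: the interface names NaryOperator and TwoInputOperator, the LabelingOperator example and the asNary adapter are hypothetical and are not taken from the PR or from Flink.

```java
import java.util.ArrayList;
import java.util.List;

public class NaryOperatorSketch {

    /** Two-input style: the input number is encoded in the method name. */
    public interface TwoInputOperator<IN1, IN2> {
        void receiveElement1(IN1 element);
        void receiveElement2(IN2 element);
    }

    /** N-ary style: one method, the input number travels with every element. */
    public interface NaryOperator {
        void receiveElement(int inputIndex, Object element);
    }

    /** Hypothetical operator with a String input and an Integer input. */
    public static class LabelingOperator implements TwoInputOperator<String, Integer> {
        public final List<String> output = new ArrayList<>();

        @Override
        public void receiveElement1(String element) {
            output.add("word:" + element);
        }

        @Override
        public void receiveElement2(Integer element) {
            output.add("count:" + (element + 1));
        }
    }

    /**
     * Adapts a typed two-input operator to the n-ary interface.
     * The casts are unchecked: a wrongly typed element is only
     * detected at run time, not by the compiler.
     */
    @SuppressWarnings("unchecked")
    public static <A, B> NaryOperator asNary(TwoInputOperator<A, B> op) {
        return (inputIndex, element) -> {
            switch (inputIndex) {
                case 0:
                    op.receiveElement1((A) element);
                    break;
                case 1:
                    op.receiveElement2((B) element);
                    break;
                default:
                    throw new IllegalArgumentException("unknown input " + inputIndex);
            }
        };
    }

    public static void main(String[] args) {
        LabelingOperator op = new LabelingOperator();
        NaryOperator nary = asNary(op);
        nary.receiveElement(0, "flink");
        nary.receiveElement(1, 41);
        System.out.println(op.output); // [word:flink, count:42]
    }
}
```

The single-method form extends to any number of inputs, but it gives up the per-input static types that the two-method form keeps, which seems to be the type-safety concern raised in the reply about n-ary operators.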
>>>>>
>>>
> 
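
As a rough sketch of the runtime layering proposed in the quoted message of 12 May (a task drives an operator, AbstractStreamOperator supplies default setup()/open()/close(), and AbstractUdfStreamOperator forwards lifecycle calls to a rich user function), something like the following Java could serve. The class names follow the message; every signature, the String runtime context, the List used as output, the LoggingDoubler function and the runOneInputTask helper are simplifying assumptions for this sketch and are not Flink's actual API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OperatorHierarchySketch {

    /** Hypothetical stand-in for a rich user function with lifecycle hooks. */
    public interface RichFunction {
        void setRuntimeContext(String context);
        void open();
        void close();
    }

    /** Hypothetical user function type for a map operation. */
    public interface MapFunction<IN, OUT> {
        OUT map(IN value);
    }

    /** Base operator interface; only the lifecycle is modelled here. */
    public interface StreamOperator<OUT> {
        void setup(List<OUT> output);
        void open();
        void close();
    }

    /** Operator with exactly one input. */
    public interface OneInputOperator<IN, OUT> extends StreamOperator<OUT> {
        void processElement(IN element);
    }

    /** Basic implementations of setup()/open()/close(). */
    public abstract static class AbstractStreamOperator<OUT> implements StreamOperator<OUT> {
        protected List<OUT> output;
        @Override public void setup(List<OUT> output) { this.output = output; }
        @Override public void open() { }
        @Override public void close() { }
    }

    /** Forwards lifecycle calls to the user function if it is a RichFunction. */
    public abstract static class AbstractUdfStreamOperator<OUT, F> extends AbstractStreamOperator<OUT> {
        protected final F userFunction;

        protected AbstractUdfStreamOperator(F userFunction) { this.userFunction = userFunction; }

        @Override public void setup(List<OUT> output) {
            super.setup(output);
            if (userFunction instanceof RichFunction) {
                ((RichFunction) userFunction).setRuntimeContext("sketch-context");
            }
        }

        @Override public void open() {
            if (userFunction instanceof RichFunction) { ((RichFunction) userFunction).open(); }
        }

        @Override public void close() {
            if (userFunction instanceof RichFunction) { ((RichFunction) userFunction).close(); }
        }
    }

    /** The operator that encapsulates the map functionality. */
    public static class StreamMap<IN, OUT> extends AbstractUdfStreamOperator<OUT, MapFunction<IN, OUT>>
            implements OneInputOperator<IN, OUT> {
        public StreamMap(MapFunction<IN, OUT> userFunction) { super(userFunction); }
        @Override public void processElement(IN element) { output.add(userFunction.map(element)); }
    }

    /** Hypothetical rich map function that records its lifecycle calls. */
    public static class LoggingDoubler implements MapFunction<Integer, Integer>, RichFunction {
        public final List<String> events = new ArrayList<>();
        @Override public void setRuntimeContext(String context) { events.add("setRuntimeContext"); }
        @Override public void open() { events.add("open"); }
        @Override public void close() { events.add("close"); }
        @Override public Integer map(Integer value) { return value * 2; }
    }

    /** Plays the role of a one-input task: the main loop that feeds the operator. */
    public static <IN, OUT> List<OUT> runOneInputTask(OneInputOperator<IN, OUT> operator, List<IN> input) {
        List<OUT> out = new ArrayList<>();
        operator.setup(out);
        operator.open();
        try {
            for (IN element : input) { operator.processElement(element); }
        } finally {
            operator.close();
        }
        return out;
    }

    public static void main(String[] args) {
        LoggingDoubler fn = new LoggingDoubler();
        List<Integer> out = runOneInputTask(new StreamMap<Integer, Integer>(fn), Arrays.asList(1, 2, 3));
        System.out.println(out);       // [2, 4, 6]
        System.out.println(fn.events); // [setRuntimeContext, open, close]
    }
}
```

In this sketch the task owns the loop and the lifecycle ordering, while the operator only reacts to calls, which mirrors the split between Tasks and Operators described in the thread.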

