flink-dev mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: [DISCUSS] Naming and Functionality of Stream Operators and Tasks
Date Tue, 12 May 2015 09:21:04 GMT
Every vote counts. :D

On Tue, May 12, 2015 at 11:04 AM, Matthias J. Sax
<mjsax@informatik.hu-berlin.de> wrote:
> I like it. Not sure if my vote counts ;)
>
> On 05/12/2015 07:18 AM, Aljoscha Krettek wrote:
>> My proposal for the runtime classes (per my pull request) is this:
>>
>> StreamTask: base of streaming tasks, the task is the AbstractInvokable
>> that runs in the TaskManager and invokes stream operators
>> OneInputStreamTask, TwoInputStreamTask and SourceStreamTask are the
>> subclasses responsible for the actual types of operations.
>>
>> StreamOperator: interface for stream operators such as Map, Reduce and so on.
>> OneInputStreamOperator and TwoInputStreamOperator are the interfaces for
>> operators with one input and two inputs, respectively.
>>
>> There is also AbstractStreamOperator, which provides basic
>> implementations for methods such as setup()/open()/close(), and
>> AbstractUdfStreamOperator, which is derived from
>> AbstractStreamOperator. The latter is for operators that have user code; it
>> deals with calling the correct methods of RichUserFunctions
>> (open()/close()/setRuntimeContext()).
>>
>> I realised that we should probably not rename all the actual operators
>> and remove the Stream prefix and suffix; that would be too big a change
>> and orthogonal to my current PR. Other people can do it if they want.
>>
>> These are just my suggestions. Please suggest other consistent naming
>> schemes if you think mine are bad.
>>
>> On Mon, May 11, 2015 at 9:40 PM, Stephan Ewen <sewen@apache.org> wrote:
>>> How about separating the discussion about runtime class renaming (there
>>> seems to be consensus) from the discussion about API class renaming (no
>>> consensus yet)?
>>>
>>> To go ahead with the runtime classes, can you make a concrete suggestion
>>> for more memorable/descriptive names?
>>>
>>> For the API classes, kick off a thread if you want, but please clearly
>>> mark in your discussion that this is about an API-breaking change
>>> to a user-facing API (that is still declared beta).
>>>
>>>
>>> On Mon, May 11, 2015 at 10:18 AM, Aljoscha Krettek <aljoscha@apache.org>
>>> wrote:
>>>
>>>> Come to think of it, why do we even need SingleOutputStreamOperator?
>>>> It is just a subclass of DataStream that has almost no functionality
>>>> that couldn't be implemented in DataStream. I think it makes people
>>>> wonder why the result of a transformation is not a DataStream but this
>>>> mouthful of a class.
>>>>
>>>> And, in light of other possibilities such as MapDriver and PactDriver, I
>>>> am quite happy with calling the things StreamOperator and StreamMap.
>>>> :D
>>>>
>>>> On Sat, May 9, 2015 at 5:20 PM, Márton Balassi <balassi.marton@gmail.com>
>>>> wrote:
>>>>> Hi,
>>>>>
>>>>> I am in favor of removing the Stream (or Streaming) suffixes and
>>>>> prefixes. I think that Gyula was also referring to those.
>>>>>
>>>>> I think the naming of the Tasks and the user-facing operators
>>>>> (SingleOutputStreamOperator and the like) is fine.
>>>>>
>>>>> As for the other bunch of Operators, we could name them Drivers to be
>>>>> mostly in line with the batch naming. By the way, most of the classes do
>>>>> not have "Operator" in their name currently; e.g. the one encapsulating
>>>>> the map functionality is called StreamMap. However, the base classes
>>>>> (StreamOperator and ChainableStreamOperator) have it in their name
>>>>> explicitly. I could go with MapDriver instead of StreamMap.
>>>>> ChainableStreamOperator will be eliminated anyway, so StreamOperator
>>>>> needs a new name then: worst case scenario, PactDriver. :)
>>>>>
>>>>> As for n-ary operators I agree with Gyula.
>>>>>
>>>>> On Sat, May 9, 2015 at 4:44 PM, Aljoscha Krettek <aljoscha@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Which name changes are you referring to? The proposed names in my
>>>>>> recent PR? Or the dropping of Stream from all the classes? For the
>>>>>> rest I was just rambling about how I don't like the names in the
>>>>>> batch API. :D
>>>>>>
>>>>>> On Fri, May 8, 2015 at 12:31 PM, Gyula Fóra <gyula.fora@gmail.com> wrote:
>>>>>>> Generally I am in favor of making these name changes. My only concern
>>>>>>> is regarding the one-input and multiple-input operators.
>>>>>>>
>>>>>>> There is a general problem with the n-ary operators regarding type
>>>>>>> safety; that's why we now have SingleInput and Co (two-input) operators.
>>>>>>> I think we should keep these.
>>>>>>>
>>>>>>> On Fri, May 8, 2015 at 11:38 AM, Aljoscha Krettek <aljoscha@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> since I'm currently reworking the Stream operators I thought it's a
>>>>>>>> good time to talk about the naming of some classes. We have some
>>>>>>>> legacy problems with lots of Operators, OperatorBases, TwoInput,
>>>>>>>> OneInput, Unary, Binary, etc. And maybe we can break things in
>>>>>>>> streaming to have more consistent and future-proof naming.
>>>>>>>>
>>>>>>>> In streaming, there are:
>>>>>>>> - Tasks, these are an AbstractInvokable and contain the main loop of a
>>>>>>>> streaming vertex. They read from the inputs and forward data to the
>>>>>>>> operator implementation.
>>>>>>>>
>>>>>>>> - Operators, these are invoked by a Task and are responsible for the
>>>>>>>> actual logic of the operator. Think Map, Join, Reduce and so on. These
>>>>>>>> are responsible for calling the user-defined function.
>>>>>>>>
>>>>>>>> - Operators (again, I know), these are user-facing classes (some
>>>>>>>> derived from DataStream, some not). There is for example
>>>>>>>> SingleOutputStreamOperator, for the result of a DataStream
>>>>>>>> transformation that has a single output. There are also
>>>>>>>> TemporalOperator and its derived classes StreamCrossOperator and
>>>>>>>> StreamJoinOperator. The actual operator inside a task (the ones I
>>>>>>>> mentioned before that are responsible for the user logic) that
>>>>>>>> executes a temporal join is called CoStreamWindow (with a
>>>>>>>> JoinWindowFunction).
>>>>>>>>
>>>>>>>> As I currently have it in my PR, there are two Task classes, one for
>>>>>>>> single-input and one for two-input operators. There are also the
>>>>>>>> corresponding operator interfaces for unary and binary operators (see
>>>>>>>> what I did there ... :D).
>>>>>>>>
>>>>>>>> What should we call all these classes (concepts)? Also, I'm heavily in
>>>>>>>> favour of dropping all the Stream (or Streaming) prefixes and suffixes
>>>>>>>> from the class names. I know I'm in streaming because the package is
>>>>>>>> named streaming. And we should not restrain ourselves because the
>>>>>>>> batch API also has things called operator.
>>>>>>>>
>>>>>>>> Also, the concept of one-input, two-input tasks and operators is not
>>>>>>>> very scalable. Maybe we should have a single interface for operators
>>>>>>>> that has a receiveElement(int, element) method that tells the operator
>>>>>>>> from which input an element came. Then we can scale this to n-ary
>>>>>>>> operators. This would of course have the overhead of always sending
>>>>>>>> along the number of the input instead of encoding the input number in
>>>>>>>> the method name, such as receiveElement1() and receiveElement2().
>>>>>>>>
>>>>>>>> Any thoughts? :D (I know I'm writing the long annoying emails today
>>>>>>>> but I think it is important we discuss these things before being stuck
>>>>>>>> with them.)
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Aljoscha
>>>>>>>>
>>>>>>
>>>>
>>
>
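
For reference, a minimal sketch of how the runtime naming discussed in this thread could fit together. Only the class names come from the thread (the one-input interface is written as OneInputStreamOperator, by analogy with TwoInputStreamOperator). The method names processElement(), processElement1()/processElement2() and collected(), and the in-memory list used as output, are assumptions made for illustration and are not taken from the pull request. setup() and setRuntimeContext() are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class RuntimeNamingSketch {

    /** Base interface for all operators; lifecycle hooks only (signatures assumed). */
    interface StreamOperator<OUT> {
        void open();
        void close();
        List<OUT> collected(); // hypothetical stand-in for a real output collector
    }

    /** Operator with a single input (method name is an assumption). */
    interface OneInputStreamOperator<IN, OUT> extends StreamOperator<OUT> {
        void processElement(IN element);
    }

    /** Operator with two typed inputs; the input number is encoded in the method name. */
    interface TwoInputStreamOperator<IN1, IN2, OUT> extends StreamOperator<OUT> {
        void processElement1(IN1 element);
        void processElement2(IN2 element);
    }

    /** Provides default (empty) lifecycle methods so concrete operators stay small. */
    abstract static class AbstractStreamOperator<OUT> implements StreamOperator<OUT> {
        protected final List<OUT> output = new ArrayList<>();
        @Override public void open() {}
        @Override public void close() {}
        @Override public List<OUT> collected() { return output; }
    }

    /** The operator that holds the actual logic: apply a user function per element. */
    static class StreamMap<IN, OUT> extends AbstractStreamOperator<OUT>
            implements OneInputStreamOperator<IN, OUT> {
        private final Function<IN, OUT> mapper;
        StreamMap(Function<IN, OUT> mapper) { this.mapper = mapper; }
        @Override public void processElement(IN element) {
            output.add(mapper.apply(element));
        }
    }

    /** The task owns the main loop: read the input and forward it to the operator. */
    static class OneInputStreamTask<IN, OUT> {
        private final OneInputStreamOperator<IN, OUT> operator;
        OneInputStreamTask(OneInputStreamOperator<IN, OUT> operator) {
            this.operator = operator;
        }
        List<OUT> invoke(Iterable<IN> input) {
            operator.open();
            for (IN element : input) {
                operator.processElement(element);
            }
            operator.close();
            return operator.collected();
        }
    }

    /** Runs a doubling map over three elements through a one-input task. */
    static List<Integer> demo() {
        StreamMap<Integer, Integer> doubler = new StreamMap<>(x -> x * 2);
        return new OneInputStreamTask<Integer, Integer>(doubler)
                .invoke(Arrays.asList(1, 2, 3));
    }
}
```

The point of the split is that the task knows nothing about Map or Reduce, and the operator knows nothing about reading inputs.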

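The trade-off between a single receiveElement(int, element) method and numbered methods such as receiveElement1()/receiveElement2(), raised at the start of the thread, can be sketched as follows. This is an illustration under assumptions: the interface names NaryOperator and TwoInputOperator and the output() accessor are hypothetical, and only the receiveElement* method names come from the thread.

```java
import java.util.ArrayList;
import java.util.List;

class NaryOperatorSketch {

    /** Hypothetical n-ary interface: the input index travels with every element. */
    interface NaryOperator<OUT> {
        // The element type has to be Object because inputs may differ in type;
        // this is the type-safety concern mentioned in the thread.
        void receiveElement(int inputIndex, Object element);
        List<OUT> output();
    }

    /** Hypothetical two-input interface: input number is encoded in the method name. */
    interface TwoInputOperator<IN1, IN2, OUT> {
        void receiveElement1(IN1 element);
        void receiveElement2(IN2 element);
        List<OUT> output();
    }

    /** Tags each element with the index of the input it arrived on. */
    static class TaggingNaryOperator implements NaryOperator<String> {
        private final List<String> out = new ArrayList<>();
        @Override public void receiveElement(int inputIndex, Object element) {
            out.add(inputIndex + ":" + element);
        }
        @Override public List<String> output() { return out; }
    }

    /** Same behaviour, but each input keeps its static type. */
    static class TaggingTwoInputOperator
            implements TwoInputOperator<Integer, String, String> {
        private final List<String> out = new ArrayList<>();
        @Override public void receiveElement1(Integer element) { out.add("0:" + element); }
        @Override public void receiveElement2(String element) { out.add("1:" + element); }
        @Override public List<String> output() { return out; }
    }

    /** Feeds three inputs of different types into the n-ary operator. */
    static List<String> runNary() {
        TaggingNaryOperator op = new TaggingNaryOperator();
        op.receiveElement(0, 42);
        op.receiveElement(1, "a");
        op.receiveElement(2, 3.5); // a third input needs no new interface
        return op.output();
    }

    /** Feeds the two typed inputs into the two-input operator. */
    static List<String> runTwoInput() {
        TaggingTwoInputOperator op = new TaggingTwoInputOperator();
        op.receiveElement1(42);
        op.receiveElement2("a");
        return op.output();
    }
}
```

The n-ary variant scales to any number of inputs but passes the index on every call and loses the compile-time element types. The numbered variant keeps the types but needs a new interface for each arity.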