flink-dev mailing list archives

From Kostas Kloudas <k.klou...@data-artisans.com>
Subject Re: [DISCUSS] Adding a dispose() method in the RichFunction.
Date Thu, 10 Nov 2016 11:17:16 GMT
I suppose that, since the DataSet API makes no semantic distinction between
the two, we just have to make sure that the new dispose() is called whenever
close() is called.


> On Nov 10, 2016, at 11:47 AM, Fabian Hueske <fhueske@gmail.com> wrote:
> RichFunctions are used in the DataStream and DataSet APIs.
> How would that change affect the DataSet API?
> Best, Fabian
> 2016-11-10 11:37 GMT+01:00 Kostas Kloudas <k.kloudas@data-artisans.com>:
>> Hello,
>> I would like to propose the addition of a dispose() method, in addition to the
>> already existing close(), in the RichFunction interface. This will align the
>> lifecycle of a RichFunction, with that of an Operator. After this, the code
>> paths followed when finishing successfully and when cancelling, will be
>> totally distinct.
>> Semantically, close() will be responsible for guaranteeing semantic
>> correctness of the job when the job terminates successfully, while dispose()
>> will be responsible for taking care of system clean up both when terminating
>> gracefully and when cancelling, e.g. freeing resources like db connections.
>> Currently, most functions use close() with the semantics of dispose() as the
>> only thing they do is freeing up resources. A nice example where this leads to
>> confusion is the case of the BucketingSink/RollingSink where at close(), data
>> that is not committed is marked as "pending" (semantic correctness of the
>> job). In this case, and given that there is no distinction between close() and
>> dispose(), this method is called by the AbstractUdfStreamOperator both when
>> successfully finishing a job and when something went wrong during execution.
>> This is essentially a compromise, as the close() should mark this data as
>> "committed" when successfully terminating, but it cannot as it can also be
>> called when an exception was thrown.
>> Let me know what you think,
>> Kostas
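
The lifecycle split discussed in this thread could look roughly like the sketch below. It is only an illustration under stated assumptions: SketchFunction, RecordingSink and LifecycleSketch are hypothetical names, and nothing here is Flink's actual RichFunction or AbstractUdfStreamOperator code. The point is that close() runs only on the successful path, while dispose() runs in a finally block on every path. For the DataSet case mentioned in the reply, the same shape simply means dispose() always follows close().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface modelling the proposed lifecycle (not Flink's RichFunction).
interface SketchFunction {
    void open() throws Exception;
    void close() throws Exception;    // semantic finalization, success path only
    void dispose() throws Exception;  // resource cleanup, success and failure
}

// Hypothetical function that only records which lifecycle methods were called.
class RecordingSink implements SketchFunction {
    final List<String> calls = new ArrayList<>();
    public void open() { calls.add("open"); }
    public void close() { calls.add("close"); }
    public void dispose() { calls.add("dispose"); }
}

class LifecycleSketch {
    // Runs the body between open() and close(); returns true on success.
    // A RuntimeException from the body skips close() but still reaches dispose().
    static boolean run(SketchFunction fn, Runnable body) throws Exception {
        try {
            fn.open();
            body.run();
            fn.close();
            return true;
        } catch (RuntimeException e) {
            return false; // failure or cancellation: no semantic finalization
        } finally {
            fn.dispose(); // always release resources
        }
    }

    public static void main(String[] args) throws Exception {
        RecordingSink ok = new RecordingSink();
        run(ok, () -> {});
        System.out.println(ok.calls);     // [open, close, dispose]

        RecordingSink failed = new RecordingSink();
        run(failed, () -> { throw new IllegalStateException("boom"); });
        System.out.println(failed.calls); // [open, dispose]
    }
}
```

With this separation, a sink in the style of the BucketingSink example could mark data as "committed" in close() without worrying that the same method is also invoked after a failure.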
