flink-dev mailing list archives

From Kostas Kloudas <k.klou...@data-artisans.com>
Subject [DISCUSS] Adding a dispose() method in the RichFunction.
Date Thu, 10 Nov 2016 10:37:36 GMT

I would like to propose adding a dispose() method to the RichFunction interface, 
alongside the already existing close(). This will align the lifecycle of a 
RichFunction with that of an Operator. After this, the code paths followed when 
finishing successfully and when cancelling will be totally distinct.

close() will be responsible for guaranteeing the semantic correctness of the job 
when the job terminates successfully, while dispose() will be responsible for 
system cleanup, e.g. freeing resources like database connections, both when 
terminating gracefully and when cancelling.
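
To make the proposed split concrete, here is a minimal sketch of the call order I have in mind. It does not use Flink classes: LifecycleFunction, RecordingFunction and LifecycleRunner are hypothetical names made up for this illustration, and the real change would live in RichFunction and AbstractUdfStreamOperator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the proposed lifecycle; NOT Flink's RichFunction.
interface LifecycleFunction {
    void open();
    void process(String record);
    void close();    // called only when the job finishes successfully
    void dispose();  // called on every exit path, success or failure
}

// Records the order of lifecycle calls so the two code paths can be compared.
class RecordingFunction implements LifecycleFunction {
    final List<String> calls = new ArrayList<>();

    public void open() { calls.add("open"); }

    public void process(String record) {
        // A null record simulates a failure during execution.
        if (record == null) {
            throw new IllegalStateException("bad record");
        }
        calls.add("process");
    }

    public void close() { calls.add("close"); }

    public void dispose() { calls.add("dispose"); }
}

// Hypothetical driver playing the role of the operator that owns the function.
class LifecycleRunner {
    static void run(LifecycleFunction fn, List<String> records) {
        try {
            fn.open();
            for (String r : records) {
                fn.process(r);
            }
            fn.close();      // reached only if nothing above threw
        } finally {
            fn.dispose();    // always reached, also after an exception
        }
    }
}
```

On a successful run the function sees open, process, close, dispose. If process() throws, close() is skipped and only dispose() runs, which is the distinction described above.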

Currently, most functions use close() with the semantics of dispose(), as the only 
thing they do is free up resources. A good example of where this leads to confusion 
is the BucketingSink/RollingSink, where at close() data that is not yet committed 
is marked as "pending" (semantic correctness of the job). Because there is no 
distinction between close() and dispose(), this method is called by the 
AbstractUdfStreamOperator both when a job finishes successfully and when something 
went wrong during execution. This is essentially a compromise: close() should mark 
this data as "committed" when terminating successfully, but it cannot, because it 
may also be called after an exception was thrown.
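
A rough sketch of the sink case, under assumptions: SketchSink, FileState and closeToday() are hypothetical names for illustration and do not mirror the actual BucketingSink code. What should happen to in-progress data on the failure path is left open here; the sketch simply leaves it untouched.

```java
// Hypothetical sink sketch; names and states are illustrative only.
class SketchSink {
    enum FileState { IN_PROGRESS, PENDING, COMMITTED }

    FileState state = FileState.IN_PROGRESS;
    boolean streamOpen = true;

    // Current situation: a single close() cannot tell success from failure,
    // so the most it can safely do is mark the data as pending.
    void closeToday() {
        state = FileState.PENDING;
        streamOpen = false;
    }

    // Proposed: close() runs only on successful termination,
    // so it is free to mark the data as committed.
    void close() {
        state = FileState.COMMITTED;
    }

    // Proposed: dispose() runs on every exit path and only frees resources.
    void dispose() {
        streamOpen = false;
    }
}
```

With the split, a successful run calls close() and then dispose(), ending in COMMITTED with the stream released. A failed or cancelled run calls only dispose(), so the stream is released without the data ever being marked as committed.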

Let me know what you think,