flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: state inside functions
Date Fri, 04 Aug 2017 19:33:43 GMT
Hi Peter,

Function objects (such as an instance of a class that extends MapFunction)
that are used to construct a plan are serialized using Java serialization
and shipped to the workers for execution.
Therefore, function classes must be Serializable. In general it is
recommended to configure function objects via the constructor. However, if
you have a member property that does not implement Serializable, you should
use a RichFunction, make the property transient, and initialize it in
open().
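
A minimal sketch of that pattern in plain Java, so it runs without Flink on the classpath. PrefixMapper, Formatter and ship() are hypothetical names used only for illustration. ship() imitates the Java serialization round trip that happens when a function object is sent to a worker, and open() here is an ordinary method standing in for RichFunction's open(); in a real job the class would extend a rich function such as RichMapFunction instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class TransientOpenSketch {

    /** Hypothetical helper that is NOT Serializable (think: a client or a parser). */
    static final class Formatter {
        private final String prefix;
        Formatter(String prefix) { this.prefix = prefix; }
        String format(String value) { return prefix + value; }
    }

    /** Stand-in for a rich function: serializable config plus a transient helper. */
    static final class PrefixMapper implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String prefix;           // set via the constructor, shipped with the object
        private transient Formatter formatter; // skipped by Java serialization

        PrefixMapper(String prefix) { this.prefix = prefix; }

        /** Plays the role of open(): called after shipping, before any record is processed. */
        void open() { formatter = new Formatter(prefix); }

        String map(String value) { return formatter.format(value); }

        boolean isOpened() { return formatter != null; }
    }

    /** Serializes and deserializes an object, imitating shipping it to a worker. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T ship(T object) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        PrefixMapper shipped = ship(new PrefixMapper("id-"));
        System.out.println(shipped.isOpened()); // the transient field did not travel
        shipped.open();                         // rebuild it on the "worker" side
        System.out.println(shipped.map("42"));
    }
}
```

The constructor argument travels with the serialized object, while the non-serializable helper is rebuilt from it on the receiving side.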

Alternatively, you can also override Java's serialization/deserialization
methods and implement custom de/serialization logic.
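
A rough sketch of this alternative, again in plain Java with hypothetical names (PrefixMapper, Formatter, ship()). It relies on the private writeObject/readObject hooks of Java serialization: the non-serializable member stays transient, and only the data needed to rebuild it is written by hand.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class CustomSerializationSketch {

    /** Hypothetical helper that is NOT Serializable. */
    static final class Formatter {
        private final String prefix;
        Formatter(String prefix) { this.prefix = prefix; }
        String prefix() { return prefix; }
        String format(String value) { return prefix + value; }
    }

    /** Function-like class that writes and restores its helper manually. */
    static final class PrefixMapper implements Serializable {
        private static final long serialVersionUID = 1L;

        private transient Formatter formatter; // not handled by default serialization

        PrefixMapper(String prefix) { this.formatter = new Formatter(prefix); }

        // Called by Java serialization instead of the default write logic.
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();         // writes any non-transient fields
            out.writeUTF(formatter.prefix()); // writes what is needed to rebuild the helper
        }

        // Called by Java serialization instead of the default read logic.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            formatter = new Formatter(in.readUTF());
        }

        String map(String value) { return formatter.format(value); }
    }

    /** Serializes and deserializes an object, imitating shipping it to a worker. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T ship(T object) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        PrefixMapper shipped = ship(new PrefixMapper("id-"));
        System.out.println(shipped.map("42")); // usable right away, no open() step needed
    }
}
```

Compared with the transient-plus-open() approach, this keeps the object usable immediately after deserialization, at the cost of maintaining the hand-written read and write methods.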

Best, Fabian



2017-08-03 16:00 GMT+02:00 Nico Kruber <nico@data-artisans.com>:

> Hi Peter,
> there's no need to worry about transient members as the operator itself is
> not serialized - only the state itself, depending on the state back-end.
>
> If you want your state to be recovered by checkpoints you should implement
> the open() method and initialise your state there as in your point (2) and
> as described in [1].
>
> If you want to re-scale your job, you have to take a savepoint and may
> resume from there with a different parallelism [2] but be sure to set a
> maximum parallelism (per job / or operator) and set UUIDs for operators as
> described in [3].
>
>
> Nico
>
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/state.html
> [2] https://ci.apache.org/projects/flink/flink-docs-release-1.4/setup/savepoints.html
> [3] https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/production_ready.html
>
> On Thursday, 3 August 2017 12:11:14 CEST Peter Ertl wrote:
> > Hi,
> >
> > can someone elaborate on when I should set properties transient /
> > non-transient within operators (e.g. map / flatMap / reduce) ?
> >
> > I see these two possibilities:
> >
> > (1) initialize a non-transient property from the constructor
> > (2) initialize a transient property inside a Rich???Function when
> > open(ConfigurationParameters) is invoked
> >
> > on what criteria should I choose (1) or (2) ?
> >
> > how is this related to checkpointing / rebalancing?
> >
> > Thanks in advance
> > Peter
>
>
