flink-user mailing list archives

From M Singh <mans2si...@yahoo.com>
Subject Re: Apache Flink - Difference between operator and function
Date Sun, 31 Dec 2017 21:23:06 GMT
Thanks Gordon for your explanation.  

    On Wednesday, December 20, 2017 2:16 PM, Tzu-Li (Gordon) Tai <tzulitai@apache.org> wrote:

Hi Mans,

What's the difference between an operator and a function?

An operator in Flink needs to handle the processing of watermarks and records, as well as
checkpointing of the operator state. To implement one, you need to extend the
AbstractStreamOperator base class. It is considered a very low-level API that normal users
would not use unless they have very specific needs. To add an operator to your pipeline, you
would use DataStream::transform(…).

Functions are UDFs such as FlatMapFunction, MapFunction, WindowFunction, etc., and are the
typical way Flink users define transformations on DataStreams / DataSets. They can be added
to your pipeline using the specific transform method for each kind of function, e.g.
DataStream::flatMap(…) corresponds to the FlatMapFunction. User functions are executed by an
underlying operator (specifically, the AbstractUdfStreamOperator). UDFs only expose the
abstraction of per-record processing and producing outputs, so you don’t have to worry about
other complications such as handling watermarks and checkpointing state. Any state registered
in UDFs is managed state, and will be checkpointed by the underlying operator.
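
To make that division of labor concrete, here is a toy model in plain Java. It is only a sketch: ToyUdfOperator and everything inside it are hypothetical stand-ins, not Flink classes. The method names are loosely borrowed from the operator concepts above, but the signatures are simplified assumptions. The point is that the user function only maps one record to one output, while the surrounding operator tracks watermarks and takes the snapshot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy model only: these are NOT Flink classes, just hypothetical stand-ins.
class OperatorVsFunctionSketch {

    // Plays the role of the operator that hosts a user function.
    static class ToyUdfOperator<I, O> {
        final Function<I, O> userFunction;         // the "UDF": per-record logic only
        final List<O> output = new ArrayList<>();  // records emitted downstream
        long currentWatermark = Long.MIN_VALUE;    // event-time progress, tracked by the operator
        long recordsSeen = 0;                      // state the operator snapshots on its own

        ToyUdfOperator(Function<I, O> userFunction) {
            this.userFunction = userFunction;
        }

        // The operator receives every record and delegates the business logic to the UDF.
        void processElement(I record) {
            recordsSeen++;
            output.add(userFunction.apply(record));
        }

        // Watermarks never reach the user function; the operator handles them.
        void processWatermark(long timestamp) {
            currentWatermark = Math.max(currentWatermark, timestamp);
        }

        // Checkpointing is also the operator's job, not the user function's.
        long snapshotState() {
            return recordsSeen;
        }
    }

    public static void main(String[] args) {
        // The user only supplies the per-record function (here: string length).
        ToyUdfOperator<String, Integer> op = new ToyUdfOperator<>(String::length);
        op.processElement("flink");
        op.processElement("udf");
        op.processWatermark(42L);
        System.out.println(op.output + " watermark=" + op.currentWatermark
                + " snapshot=" + op.snapshotState());
    }
}
```

In real Flink code you would normally write only the equivalent of the lambda and hand it to a method such as DataStream::flatMap(…); the wrapping operator is supplied for you.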

What are the raw state interfaces? Are they checkpoint related interfaces?

The raw state interfaces refer to StateInitializationContext and StateSnapshotContext, which
are only visible when you directly extend AbstractStreamOperator. Through those interfaces,
you have additional access to raw operator and keyed state input / output streams in the
initializeState and snapshotState methods, which let you read / write state as a stream of
raw bytes.
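
As a rough idea of what "state as a stream of raw bytes" means in practice, here is a small sketch that uses only java.io and no Flink APIs. The class RawStateSketch and its snapshot / restore methods are hypothetical names chosen for illustration. The operator decides the byte layout itself, so whoever stores the bytes cannot interpret them, which is one reason managed state is the recommended option.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical illustration only: plain java.io, no Flink classes involved.
class RawStateSketch {
    long count;   // number of values seen so far
    double sum;   // running sum of those values

    void add(double value) {
        count++;
        sum += value;
    }

    // Similar in spirit to writing raw state during a snapshot:
    // the layout (one long, then one double) is known only to this class.
    byte[] snapshot() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(count);
            out.writeDouble(sum);
        }
        return bytes.toByteArray();
    }

    // Similar in spirit to reading raw state back on restore:
    // fields must be read in exactly the order they were written.
    static RawStateSketch restore(byte[] raw) throws IOException {
        RawStateSketch state = new RawStateSketch();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            state.count = in.readLong();
            state.sum = in.readDouble();
        }
        return state;
    }

    public static void main(String[] args) throws IOException {
        RawStateSketch state = new RawStateSketch();
        state.add(1.5);
        state.add(2.5);
        RawStateSketch restored = restore(state.snapshot());
        System.out.println(restored.count + " values, sum " + restored.sum);
    }
}
```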
Hope this helps!
On 20 December 2017 at 10:06:34 AM, M Singh (mans2singh@yahoo.com) wrote: 

I am reading the documentation on working with state
(https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/state/state.html)
and it states that:
All datastream functions can use managed state, but the raw state interfaces can only be used
when implementing operators. Using managed state (rather than raw state) is recommended, since
with managed state Flink is able to automatically redistribute state when the parallelism is
changed, and also do better memory management.

I wanted to find out:
   - What's the difference between an operator and a function?
   - What are the raw state interfaces? Are they checkpoint related interfaces?

