flink-dev mailing list archives

From "xiaogang.sxg" <xiaogang....@alibaba-inc.com>
Subject Re: Some thoughts about the lower-level Flink APIs
Date Thu, 18 Aug 2016 03:00:36 GMT
I think it’s better to provide lower-level APIs. Though high-level APIs are preferred by
many users, lower-level APIs are still needed to enhance expressiveness and ease the
development of complex applications.

Flink APIs now provide good abstractions for many operations, but higher abstraction always
means poorer expressiveness. Blink (derived from Flink) is now heavily used at Alibaba for
real-time computing, and we have come across many applications that cannot easily be expressed
with the Flink APIs.

Take a concrete example. In an advertising application, we need to perform a set of aggregations
on windows. Each record fires a window that contains all the records that arrived before it,
and the window size is determined by the record's value. Since it’s impossible to know in
advance which windows a record belongs to, we are unable to implement the operation with the
Flink window abstractions (Assigner-Trigger-Evictor). In the end, we implemented the application
with FlatMap and customized states.
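
To make the FlatMap-plus-state approach concrete, here is a minimal sketch of the core logic outside of Flink. The class and member names are made up for illustration (this is not our actual Blink code); in the real job the list would be a checkpointed ListState inside a RichFlatMapFunction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: every incoming record "fires" a window over the
// records that arrived before it, with the window length taken from the
// record's own value. In the real job the list below would be a Flink
// ListState inside a RichFlatMapFunction, so it is covered by checkpoints,
// and records older than the largest possible window would be pruned.
public class ValueSizedWindows {

    public static final class Record {
        final long timestamp;   // event time of the record
        final long windowSize;  // window length carried in the record's value
        public Record(long timestamp, long windowSize) {
            this.timestamp = timestamp;
            this.windowSize = windowSize;
        }
    }

    // Stand-in for checkpointed ListState of previously seen records.
    private final List<Record> seen = new ArrayList<>();

    /** Called once per record, like FlatMapFunction#flatMap. Returns the
     *  aggregate (here simply a count) over all previously seen records
     *  whose timestamp falls in [r.timestamp - r.windowSize, r.timestamp]. */
    public long onRecord(Record r) {
        long count = seen.stream()
                .filter(p -> p.timestamp >= r.timestamp - r.windowSize
                        && p.timestamp <= r.timestamp)
                .count();
        seen.add(r);
        return count;
    }
}
```

Nothing in the window abstractions expresses this directly, but with a FlatMap and managed state it is a few lines.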

Besides the advertising application, many other applications are implemented in this way. If
lower-level Flink APIs are provided, we can facilitate the development of these applications
and improve their performance as well.

I think Microsoft’s Naiad does a good job in its abstraction of lower-level APIs. Unlike the
low-level APIs in Storm or S4, Naiad also provides APIs to track progress (NotifyAt and
OnNotify). With knowledge of the job’s topology, we can easily track the progress of the
execution.

Checkpointing can then be viewed as a special kind of progress tracking and can be implemented
with these two methods. That way, we can implement the customized fault-tolerance mechanisms
that our users frequently demand.
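
As a rough sketch of what the NotifyAt/OnNotify contract looks like: the names come from the Naiad paper, but ProgressTracker and advanceFrontierTo are invented here for illustration, and none of this is existing Flink API.

```java
import java.util.TreeSet;

// Illustrative sketch of the Naiad-style progress-tracking contract. An
// operator registers interest in a logical time, and the runtime delivers
// the notification once the input frontier has passed that time, i.e. no
// record with a smaller or equal timestamp can arrive anymore.
public class ProgressTracker {

    public interface Notifiable {
        void onNotify(long time); // called once `time` is complete
    }

    private final TreeSet<Long> pending = new TreeSet<>(); // requested times
    private final Notifiable target;

    public ProgressTracker(Notifiable target) {
        this.target = target;
    }

    /** Operator-side call: "notify me when `time` is complete". */
    public void notifyAt(long time) {
        pending.add(time);
    }

    /** Runtime-side call as the frontier advances: fire every requested
     *  notification whose time the frontier has now passed, in time order. */
    public void advanceFrontierTo(long frontier) {
        while (!pending.isEmpty() && pending.first() <= frontier) {
            target.onNotify(pending.pollFirst());
        }
    }
}
```

A checkpoint could then be requested as a notification at the barrier's logical time: when onNotify fires, the operator knows all earlier input has been processed and can snapshot its state.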

The ability to track progress can also be used to optimize machine learning jobs. It has
been shown that many ML jobs can be optimized with bounded asynchronous iterations. By tracking
the progress of the producers, the consumers can proceed to the next iteration once a
sufficient number of producers have completed their work.
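
A minimal sketch of that scheme (the class name and the quorum rule are assumptions for this example, not a Flink or Blink API): a consumer may start iteration n once at least a quorum of producers has reported finishing iteration n - 1.

```java
// Illustrative sketch of bounded asynchronous iteration. A consumer may
// begin iteration n as soon as at least `quorum` of the producers report
// having finished iteration n - 1, instead of waiting for all of them as a
// fully synchronous barrier would.
public class BoundedAsyncBarrier {

    private final long[] producerIteration; // last finished iteration per producer
    private final int quorum;               // producers required before advancing

    public BoundedAsyncBarrier(int numProducers, int quorum) {
        // All producers implicitly start with iteration 0 finished.
        this.producerIteration = new long[numProducers];
        this.quorum = quorum;
    }

    /** Progress report from one producer, e.g. derived from its output. */
    public void producerFinished(int producer, long iteration) {
        producerIteration[producer] =
                Math.max(producerIteration[producer], iteration);
    }

    /** True if a consumer may start working on iteration n. */
    public boolean mayStart(long n) {
        int done = 0;
        for (long it : producerIteration) {
            if (it >= n - 1) {
                done++;
            }
        }
        return done >= quorum;
    }
}
```

With progress tracking in the runtime, the producerFinished reports would come from the system rather than from user code.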


> On Aug 18, 2016, at 12:35 AM, Stephan Ewen <sewen@apache.org> wrote:
> I do actually think that all levels of abstraction have their value.
> If you wish, we have (top to bottom):
> (1) SQL
> (2) Table API with simplified Stream/Table duality
> (3) DataStream API / windows
> (4) DataStream API with custom windows and triggers
> (5) Custom Operators
> The Data Scientist may not like (5), but there sure is a bunch of people
> that just want the basic fabric (streams and fault tolerant state) to
> stitch together whatever they want.
> I think it would be great to present this layering and simply say "pick
> your entry point!"
> The abstraction of (5) is actually similar to what the operator-centric
> (non-fluent) API of Kafka Streams is. It is only slightly more involved in
> Flink, because it exposes too many internals. With some simple
> wrapper/template, this can be a full-fledged API function, or a separate
> API in itself.
> Stephan
> On Wed, Aug 17, 2016 at 6:16 PM, Martin Neumann <mneumann@sics.se> wrote:
>> I agree with Vasia that for data scientists it's likely easier to learn the
>> high-level api. I like the material from
>> http://dataartisans.github.io/flink-training/ but all of them focus on the
>> high level api.
>> Maybe we could have a guide (blog post, lecture, whatever) on how to get
>> into Flink as a Storm guy. That would allow people with that background to
>> directly dive into the lower level api working with models similar to what
>> they were used to. I would volunteer but I'm not familiar with Storm.
>> I for my part, always had to use some lower level api in all of my
>> application, most of the time pestering Aljoscha about the details. So
>> either I'm the exception or there is a need for more complex examples
>> showcasing the lower level api methods.
>> One of the things I have been using in several pipelines so far is
>> extracting the start and end timestamp from a window adding it to the
>> window aggregate. Maybe something like this could be a useful example to
>> include into the training.
>> Side question:
>> I assume there are recurring design patterns in stream applications user
>> develop. Is there any chance we will be able to identify or create some
>> design patterns (similar to java design pattern). That would make it easier
>> to use the lower level api and might help people to avoid pitfalls like the
>> one Aljoscha mentioned.
>> cheers Martin
>> PS: I hope it's fine for me to butt into the discussion like this.
>> On Tue, Aug 16, 2016 at 4:34 PM, Wright, Eron <ewright@live.com> wrote:
>>> Jamie,
>>> I think you raise a valid concern but I would hesitate to accept the
>>> suggestion that the low-level API be promoted to app developers.
>>> Higher-level abstractions tend to be more constrained and more optimized,
>>> whereas lower-level abstractions tend to be more powerful, be more
>>> laborious to use and provide the system with less knowledge.   It is a
>>> classic tradeoff.
>>> I think it important to consider, what are the important/distinguishing
>>> characteristics of the Flink framework.    Exactly-once guarantees,
>>> event-time support, support for job upgrade without data loss, fault
>>> tolerance, etc.    I’m speculating that the high-level abstraction
>> provided
>>> to app developers is probably needed to retain those characteristics.
>>> I think Vasia makes a good point that SQL might be a good alternative way
>>> to ease into Flink.
>>> Finally, I believe the low-level API is primarily intended for extension
>>> purposes (connectors, operations, etc) not app development.    It could
>> use
>>> better documentation to ensure that third-party extensions support those
>>> key characteristics.
>>> -Eron
>>>> On Aug 16, 2016, at 3:12 AM, Vasiliki Kalavri <
>> vasilikikalavri@gmail.com>
>>> wrote:
>>>> Hi Jamie,
>>>> thanks for sharing your thoughts on this! You're raising some
>> interesting
>>>> points.
>>>> Whether users find the lower-level primitives more intuitive depends on
>>>> their background I believe. From what I've seen, if users are coming
>> from
>>>> the S4/Storm world and are used to the "compositional" way of
>> streaming,
>>>> then indeed it's easier for them to think and operate on that level.
>>> These
>>>> are usually people who have seen/built streaming things before trying
>> out
>>>> Flink.
>>>> But if we're talking about analysts and people coming from the "batch"
>>> way
>>>> of thinking or people used to working with SQL/python, then the
>>>> higher-level declarative API is probably easier to understand.
>>>> I do think that we should make the lower-level API more visible and
>>>> document it properly, but I'm not sure if we should teach Flink on this
>>>> level first. I think that presenting it as a set of "advanced" features
>>>> makes more sense actually.
>>>> Cheers,
>>>> -Vasia.
>>>> On 16 August 2016 at 04:24, Jamie Grier <jamie@data-artisans.com>
>> wrote:
>>>>> You lost me at lattice, Aljoscha ;)
>>>>> I do think something like the more powerful N-way FlatMap w/ Timers
>>>>> Aljoscha is describing here would probably solve most of the problem.
>>>>> Often Flink's higher level primitives work well for people and that's
>>>>> great.  It's just that I also spend a fair amount of time discussing
>>> with
>>>>> people how to map what they know they want to do onto operations that
>>>>> aren't a perfect fit and it sometimes liberates them when they realize
>>> they
>>>>> can just implement it the way they want by dropping down a level.
>> They
>>>>> usually don't go there themselves, though.
>>>>> I mention teaching this "first" and then the higher layers I guess
>>> because
>>>>> that's just a matter of teaching philosophy.  I think it's good to see
>>>>> the basic operations that are available first and then understand that
>>> the
>>>>> other abstractions are built on top of that.  That way you're not
>>> afraid to
>>>>> drop-down to basics when you know what you want to get done.
>>>>> -Jamie
>>>>> On Mon, Aug 15, 2016 at 2:11 AM, Aljoscha Krettek <
>> aljoscha@apache.org>
>>>>> wrote:
>>>>>> Hi All,
>>>>>> I also thought about this recently. A good thing would be to add a good
>>>>>> user facing operator that behaves more or less like an enhanced FlatMap
>>>>>> with multiple inputs, multiple outputs, state access and keyed timers. I'm
>>>>>> a bit hesitant, though, since users rarely think about the implications
>>>>>> that come with state updating and out-of-order events. If you don't
>>>>>> implement a stateful operator correctly you have pretty much arbitrary
>>>>>> results.
>>>>>> The problem with out-of-order event arrival and state update is that
>>> the
>>>>>> state basically has to monotonically transition "upwards" through
>>>>> lattice
>>>>>> for the computation to make sense. I know this sounds rather
>>> theoretical
>>>>> so
>>>>>> I'll try to explain with an example. Say you have an operator that
>>> waits
>>>>>> for timestamped elements A, B, C to arrive in timestamp order and
>> then
>>>>> does
>>>>>> some processing. The naive approach would be to have a small state
>>>>> machine
>>>>>> that tracks what elements you have seen so far. The state machine has
>>>>>> four states: NOTHING, SEEN_A, SEEN_B and SEEN_C. The state machine is
>>> supposed
>>>>>> to traverse these states linearly as the elements arrive. This
>> doesn't
>>>>>> work, however, when elements arrive in an order that does not match
>>> their
>>>>>> timestamp order. What the user should do is to have a "Set" state
>> that
>>>>>> keeps track of the elements that it has seen. Once it has seen {A, B, C}
>>>>>> the operator must check the timestamps and then do the processing
>>>>>> required. The set of possible combinations of A, B, and C forms a
>>> lattice
>>>>>> when combined with the "subset" operation. And traversal through
>> these
>>>>> sets
>>>>>> is monotonically "upwards" so it works regardless of the order that
>> the
>>>>>> elements arrive in. (I recently pointed this out on the Beam mailing
>>> list
>>>>>> and Kenneth Knowles rightly pointed out that what I was describing
>> was
>>> in
>>>>>> fact a lattice.)
>>>>>> I know this is a bit off-topic but I think it's very easy for users
>> to
>>>>>> write wrong operations when they are dealing with state. We should
>>> still
>>>>>> have a good API for it, though. Just wanted to make people aware
>>> this.
>>>>>> Cheers,
>>>>>> Aljoscha
>>>>>> On Mon, 15 Aug 2016 at 08:18 Matthias J. Sax <mjsax@apache.org>
>> wrote:
>>>>>>> It really depends on the skill level of the developer. Using the
>>>>>>> low-level API requires thinking about many details (eg. state handling)
>>>>>>> that could be done wrong.
>>>>>>> As Flink gets a broader community, more people will use it who do not
>>>>>>> have the required skill level to deal with the low-level API. For
>>>>>>> trained users, it is of course a powerful tool!
>>>>>>> I guess it boils down to the question of what type of developer Flink
>>>>>>> targets, and whether the low-level API should be offensively advertised
>>>>>>> or not. Also keep in mind that many people criticized Storm's low-level
>>>>>>> API as hard to program etc.
>>>>>>> -Matthias
>>>>>>> On 08/15/2016 07:46 AM, Gyula Fóra wrote:
>>>>>>>> Hi Jamie,
>>>>>>>> I agree that it is often much easier to work on the lower APIs if you
>>>>>>>> know what you are doing.
>>>>>>>> I think it would be nice to have very clean abstractions on that level
>>>>>>>> so we could teach this to the users first but currently I think it's not
>>>>>>>> easy enough to be a good starting point.
>>>>>>>> The user needs to understand a lot about the system if they don't want
>>>>>>>> to hurt other parts of the pipeline. For instance working with
>>>>>>>> streamrecords, propagating watermarks, working with state.
>>>>>>>> This all might be overwhelming at the first glance. But maybe we can
>>>>>>>> slim some abstractions down to the point where this becomes kind of an
>>>>>>>> extension of the RichFunctions.
>>>>>>>> Cheers,
>>>>>>>> Gyula
>>>>>>>> On Sat, Aug 13, 2016, 17:48 Jamie Grier <jamie@data-artisans.com>
>>>>>> wrote:
>>>>>>>>> Hey all,
>>>>>>>>> I've noticed a few times now when trying to help users do particular
>>>>>>>>> things in the Flink API that it can be complicated to map what they
>>>>>>>>> know they are trying to do onto higher-level Flink concepts such as
>>>>>>>>> windowing or Connect/CoFlatMap/ValueState, etc.
>>>>>>>>> At some point it just becomes easier to think about writing a Flink
>>>>>>>>> operator yourself that is integrated into the pipeline with a
>>>>>>>>> transform() call.
>>>>>>>>> It can just be easier to think at a more basic level. For example I
>>>>>>>>> can write an operator that can consume one or two input streams (should
>>>>>>>>> probably be N), update state which is managed for me fault tolerantly,
>>>>>>>>> and output elements or setup timers/triggers that give me callbacks
>>>>>>>>> from which I can also update state or emit elements.
>>>>>>>>> When you think at this level you realize you can program just about
>>>>>>>>> anything you want.  You can create whatever fault-tolerant data
>>>>>>>>> structures you want, and easily execute robust stateful computation
>>>>>>>>> over data streams at scale.  This is the real technology and power of
>>>>>>>>> Flink.
>>>>>>>>> Also, at this level I don't have to think about the complexities of
>>>>>>>>> windowing semantics, learn as much API, etc.  I can easily have some
>>>>>>>>> inputs that are broadcast, others that are keyed, manage my own state
>>>>>>>>> in whatever data structure makes sense, etc.  If I know exactly what I
>>>>>>>>> actually want to do I can just do it with the full power of my chosen
>>>>>>>>> data structures, etc.  I'm not "restricted" to trying to map my problem
>>>>>>>>> onto higher-level Flink constructs which is sometimes actually
>>>>>>>>> complicated.
>>>>>>>>> Programming at this level is actually fairly easy to do but people
>>>>>>>>> seem a bit afraid of this level of the API.  They think of it as
>>>>>>>>> low-level or custom hacking..
>>>>>>>>> Anyway, I guess my thought is this..  Should we explain Flink to
>>>>>>>>> people at this level *first*?  Show that you have nearly unlimited
>>>>>>>>> power and flexibility to build what you want *and only then* explain
>>>>>>>>> the higher level APIs they can use *if* those match their use cases
>>>>>>>>> well.
>>>>>>>>> Would this better demonstrate to people the power of Flink and maybe
>>>>>>>>> *liberate* them a bit from feeling they have to map their problem onto
>>>>>>>>> a more complex set of higher level primitives?  I see people trying to
>>>>>>>>> shoe-horn what they are really trying to do, which is simple to explain
>>>>>>>>> in english, onto windows, triggers, CoFlatMaps, etc, and this gets
>>>>>>>>> complicated sometimes.  It's like an impedance mismatch.  They could
>>>>>>>>> just solve the problem very easily in straight code.
>>>>>>>>> Anyway, it's very easy to drop down a level in the API and program
>>>>>>>>> whatever you want but users don't seem to *perceive* it that way.
>>>>>>>>> Just some thoughts...  Any feedback?  Have any of you had similar
>>>>>>>>> experiences when working with newer Flink users or as a newer Flink
>>>>>>>>> user yourself?  Can/should we do anything to make the *lower* level
>>>>>>>>> API more accessible/visible to users?
>>>>>>>>> -Jamie
>>>>> --
>>>>> Jamie Grier
>>>>> data Artisans, Director of Applications Engineering
>>>>> @jamiegrier <https://twitter.com/jamiegrier>
>>>>> jamie@data-artisans.com
