flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gyula Fóra <gyula.f...@gmail.com>
Subject Re: [DISCUSS] Streaming Sources (again)
Date Sun, 31 May 2015 12:52:12 GMT
Alright, let's do the locking then :)

Let's keep only one interface for the release.

On Sun, May 31, 2015 at 12:58 PM, Márton Balassi <balassi.marton@gmail.com>
wrote:

> I am also for having only one source interface. It seems that
> interruptability is to much of a burden on the sources, locking version
> should be still acceptable from the user point of view. We are dealing with
> inherently concurrent tasks, I suppose our users are familiar with locking
> - especially the ones in need for exactly once processing.
>
> On Sat, May 30, 2015 at 2:44 AM, Aljoscha Krettek <aljoscha@apache.org>
> wrote:
>
> > I would also prefer having only one source. The PR still has both
> > variants so that people can check them out.
> >
> > In my opinion the assumptions about interruptibility are easier to
> > break than the requirement of locking. Even if we get the kafka source
> > to work with the interruptions (which I doubt, because this fails
> > somewhere in their code) this would not guarantee that this will
> > always work in future versions. With the locking you either have the
> > locking, then it is correct (even for feature versions) or you don't,
> > then it is immediately incorrect.
> >
> > On Fri, May 29, 2015 at 10:56 PM, Gyula Fóra <gyula.fora@gmail.com>
> wrote:
> > > Hey,
> > >
> > > It seems like both interfaces are pretty much capable of doing the same
> > > thing but work on slightly different assumptions.
> > >
> > > Isn't there a way that the kafka source can work with the
> interruptions?
> > I
> > > think the reachedEnd/next interface is slightly easier to grasp than
> the
> > > run() with the locks. But in any case I would slightly prefer having
> only
> > > one of them if they can technically do the same thing.
> > >
> > > Also adding a new interface means we add a new streamtask
> implementation
> > > which is also getting slightly too much.
> > >
> > > What is you opinion on this?
> > >
> > > Gyula
> > >
> > >
> > >
> > > On Fri, May 29, 2015 at 6:51 PM, Aljoscha Krettek <aljoscha@apache.org
> >
> > > wrote:
> > >
> > >> Hi All,
> > >> after finishing my pull request that should fix the problems with the
> > >> synchronisation of checkpoints and element emission (the reason for
> > >> the faulty results of the exactly-once tests) I discovered that the
> > >> Kafka source does not deal well with being interrupted. We recently
> > >> changed the SourceFunction to the reachedEnd()/next() interface, with
> > >> the contract that the source must be interruptible to be able to
> > >> perform checkpoints. Now this doesn't seem to work with Kafka. I added
> > >> another Source interface in my PR
> > >> (https://github.com/apache/flink/pull/742). This is similar to the
> old
> > >> interface of run()/cancel(), with the addition that the source must
> > >> acquire a lock before updating state and emitting elements. The update
> > >> of state and the emission of elements must happen in the same
> > >> synchronized block to ensure consistency. This seems to solve the
> > >> problem but now we have two source interfaces.
> > >>
> > >> The question is now. What do you think about the two interfaces?
> > >> Should we keep both? Remove one?
> > >>
> > >> Cheers,
> > >> Aljoscha
> > >>
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message