From: Stephan Ewen <ewenstephan@gmail.com>
To: dev@flink.apache.org
Date: Thu, 30 Apr 2015 22:17:40 +0200
Subject: Re: Making state in streaming more explicit

I think your assumption (and the current Kafka source implementation) is that there is one state object that you update/mutate all the time. If you draw a snapshot of that state object at the time of the checkpoint, the source can continue, and that particular offset is remembered as the state of this checkpoint and can be committed to Kafka/ZooKeeper later.
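To make that pattern concrete, here is a minimal, illustrative Java sketch (the class and method names are hypothetical and not part of Flink's API): the source keeps mutating one offset, hands out an immutable copy keyed by checkpoint id when a checkpoint is drawn, and commits exactly that copy once the checkpoint completes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the actual Flink API: a source with one mutable
// state object (the current offset) that is snapshotted per checkpoint.
public class OffsetTrackingSource {

    // the live, continuously mutated state
    private long currentOffset;

    // immutable snapshots, keyed by the checkpoint that drew them
    private final Map<Long, Long> pendingCommits = new HashMap<>();

    public void recordProcessed() {
        currentOffset++;                          // mutate the running state
    }

    public void onCheckpointBarrier(long checkpointId) {
        // draw a snapshot of the state; the source keeps consuming afterwards
        pendingCommits.put(checkpointId, currentOffset);
    }

    public void onCheckpointComplete(long checkpointId) {
        // only the offset that belonged to this checkpoint is committed
        Long offset = pendingCommits.remove(checkpointId);
        if (offset != null) {
            commitOffsetToZooKeeper(offset);
        }
    }

    private void commitOffsetToZooKeeper(long offset) {
        // placeholder for the actual Kafka/ZooKeeper offset commit
    }
}
```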
On Thu, Apr 30, 2015 at 10:09 PM, Gyula Fóra wrote:

> Regarding the commits (for instance the Kafka offset):
>
> I don't exactly get how you mean to do this. If the source continues
> processing after the checkpoint and before the commit, it will not know
> what state has been committed exactly, so it would need to know the time
> of the checkpoint and store a local copy.
>
> Gyula
>
> On Thu, Apr 30, 2015 at 10:04 PM, Stephan Ewen wrote:
>
> > Thanks for the comments!
> >
> > Concerning acknowledging the checkpoint:
> >
> > The sinks definitely need to acknowledge it.
> > If we asynchronously write the state of an operator (and emit downstream
> > barriers before that is complete), then I think that we also need those
> > operators to acknowledge the checkpoint.
> >
> > For the commit messages:
> >
> > My first thought was to send commit messages simply as actor messages
> > from the JobManager to the vertices that require these messages. That
> > way, they are not stuck in the data flow with its possible latency.
> > Also, in the data flow, messages get duplicated (at all-to-all
> > connections).
> >
> > For iterative flows:
> >
> > Does the JobManager need to be aware of this, or can the IterationHead
> > handle that transparently for the JobManager?
> > From our last conversation, I recall:
> > - receive barriers, push out barriers
> > - snapshot its state
> > - wait for the barriers to come back through the backchannel
> > - write the state snapshot plus the backchannel buffers
> > - only then acknowledge the checkpoint
> >
> > My first impression is that this way the JobManager would not handle the
> > IterationHead any differently from all other stateful operators.
> >
> > Greetings,
> > Stephan
> >
> > On Thu, Apr 30, 2015 at 9:27 PM, Paris Carbone wrote:
> >
> > > I agree with all suggestions, thanks for summing it up Stephan.
> > >
> > > A few more points I have in mind at the moment:
> > >
> > > - Regarding the acknowledgements, indeed we don't need to make all
> > > operators commit back; we just have to make sure that all sinks have
> > > acknowledged a checkpoint to consider it complete back at the
> > > coordinator.
> > >
> > > - Do you think we should broadcast commit responses to sources that
> > > need them after every successful checkpoint? The checkpoint interval
> > > does not always match the frequency at which we want to initiate, for
> > > example, a compaction on Kafka. One alternative would be to make
> > > sources request a successful checkpoint id via a future, on demand.
> > >
> > > - We have to update the current checkpointing approach to cover
> > > iterative streams. We need to make sure we don't send checkpoint
> > > requests to iteration heads, and handle downstream backup for records
> > > in transit during checkpoints accordingly.
> > >
> > > cheers
> > > Paris
> > >
> > > > On 30 Apr 2015, at 20:47, Stephan Ewen wrote:
> > > >
> > > > I was looking into the handling of state in streaming operators, and
> > > > it is a bit hidden from the system.
> > > >
> > > > Right now, functions can (if they want) put some state into their
> > > > context. At runtime, state may occur or not. Before runtime, the
> > > > system cannot tell which operators are going to be stateful, and
> > > > which are going to be stateless.
> > > >
> > > > I think it is a good idea to expose that.
> > > > We can use that for optimizations, and we know which operators need
> > > > to checkpoint state and acknowledge the asynchronous checkpoint.
> > > >
> > > > At this point, we need to assume that all operators need to send a
> > > > confirmation message, which is unnecessary.
> > > >
> > > > Also, I think we should expose which operations want a "commit"
> > > > notification after the checkpoint has completed. Good examples are:
> > > >
> > > > - the KafkaConsumer source, which can then commit the offset that is
> > > > safe to ZooKeeper
> > > >
> > > > - a transactional KafkaProducer sink, which can commit a batch of
> > > > messages to the Kafka partition once the checkpoint is done (to get
> > > > exactly-once guarantees that include the sink)
> > > >
> > > > Comments welcome!
> > > >
> > > > Greetings,
> > > > Stephan
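For illustration, a minimal Java sketch of the "commit" notification idea discussed in the quoted mail above; the interface and class names are made up for this example and are not existing Flink interfaces. An operator that wants to be told when a checkpoint is globally complete implements the notification hook; a transactional sink, for instance, buffers records per checkpoint and only commits a batch once it learns that the corresponding checkpoint has completed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not an existing Flink interface: operators that want a
// "commit" notification after a checkpoint completes implement this hook, and
// the JobManager (or coordinator) calls it once the checkpoint is complete.
interface CheckpointCommitNotification {
    void notifyCheckpointComplete(long checkpointId) throws Exception;
}

// Sketch of a transactional sink: records are buffered per checkpoint, and a
// batch is committed only after the checkpoint it belongs to has completed,
// which is what gives exactly-once guarantees that include the sink.
public class TransactionalSinkSketch implements CheckpointCommitNotification {

    private List<String> pending = new ArrayList<>();
    private final Map<Long, List<String>> uncommitted = new HashMap<>();

    public void invoke(String record) {
        pending.add(record);                     // buffer until the next barrier
    }

    public void onCheckpointBarrier(long checkpointId) {
        // everything received up to this barrier belongs to this checkpoint
        uncommitted.put(checkpointId, pending);
        pending = new ArrayList<>();
    }

    @Override
    public void notifyCheckpointComplete(long checkpointId) {
        List<String> batch = uncommitted.remove(checkpointId);
        if (batch != null) {
            commitBatch(batch);                  // safe to make visible only now
        }
    }

    private void commitBatch(List<String> batch) {
        // placeholder for the actual transactional write to Kafka
    }
}
```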