flink-user mailing list archives

From "Federico D'Ambrosio" <federico.dambro...@smartlab.ws>
Subject Question about checkpointing with stateful operators and state recovery
Date Thu, 28 Sep 2017 09:46:11 GMT
Hi, I've got a couple of questions concerning the topics in the subject:

    1. If an operator is applied to a keyed stream, do I still have to
implement the CheckpointedFunction trait and define the snapshotState and
initializeState methods in order to recover its state after a job
failure?

    2. When using a FlinkKafkaConsumer, enabling checkpointing allows
exactly-once semantics end to end, provided that the sink can guarantee
the same. Do I also have to call setCommitOffsetsOnCheckpoints(true)?
And how would someone implement exactly-once semantics in a sink?

    3. What are the advantages of externalized checkpoints, and in which
cases would I want to use them?

    4. Let's suppose the following scenario: checkpointing is enabled every
10 seconds, and I have a Kafka consumer set to start from the latest
records, a sink providing at-least-once semantics, and a stateful keyed
operator in between the consumer and the sink. Is it correct that, in case
of a task failure, the following happens?
        - the Kafka consumer gets reverted to the latest checkpointed offset
(does this happen even if I don't call
setCommitOffsetsOnCheckpoints(true)?)
        - the operator state gets reverted to the latest checkpoint
        - the sink is stateless, so it doesn't really care about what
happened
        - the stream restarts, and probably some of the events arriving at
the sink have already been processed
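
To make the scenario in question 4 concrete, here is a toy model of the
behaviour I am expecting. It is plain Python, not Flink code: the names
run_job, checkpoint_every and fail_after are made up for illustration, and
I am assuming that a checkpoint stores the source offset together with the
keyed state, and that checkpoints are triggered by event count rather than
every 10 seconds.

```python
# Toy model of the recovery scenario; NOT the Flink API.
# All names (run_job, checkpoint_every, fail_after) are hypothetical.

def run_job(events, checkpoint_every, fail_after):
    """Process (key, value) events, fail once after `fail_after` events,
    then restore from the last checkpoint and replay from there."""
    sink = []             # stateless at-least-once sink: it only appends
    state = {}            # keyed operator state: running count per key
    checkpoint = (0, {})  # assumed contents: (source offset, keyed state)
    offset = 0
    failed = False

    while offset < len(events):
        key, value = events[offset]
        state[key] = state.get(key, 0) + 1
        sink.append((key, value, state[key]))
        offset += 1

        # Simplification: checkpoint every N events instead of every 10 s.
        if offset % checkpoint_every == 0:
            checkpoint = (offset, dict(state))

        # Simulate a single task failure, then restore offset and state.
        if not failed and offset == fail_after:
            failed = True
            offset, saved_state = checkpoint
            state = dict(saved_state)

    return sink, state


events = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)]
sink, state = run_job(events, checkpoint_every=2, fail_after=3)
print(sink)
print(state)
```

In this model the keyed state ends up as if each event had been counted
once, while the sink receives the event processed between the last
checkpoint and the failure a second time. Is that what happens in Flink?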

Thank you for your attention,
Kind regards,
Federico
