beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "suganya (JIRA)" <j...@apache.org>
Subject [jira] [Created] (BEAM-3494) Snapshot state of aggregated data of apache beam project is not maintained in flink's checkpointing
Date Thu, 18 Jan 2018 05:56:00 GMT
suganya created BEAM-3494:
-----------------------------

             Summary: Snapshot state of aggregated data of apache beam project is not maintained
in flink's checkpointing 
                 Key: BEAM-3494
                 URL: https://issues.apache.org/jira/browse/BEAM-3494
             Project: Beam
          Issue Type: Bug
          Components: sdk-java-core
            Reporter: suganya
            Assignee: Kenneth Knowles


We have a beam project which consumes events from kafka,does a groupby in a time window(5
mins),after window elapses it pushes the events to downstream for merge.This project is deployed
using flink ,we have enabled checkpointing to recover from failed state.

(windowsize: 5mins , checkpointingInterval: 5mins,state.backend: filesystem)

Offsets from kafka get checkpointed every 5 mins(checkpointingInterval).Before finishing the
entire DAG(groupBy and merge) , events offsets are getting checkpointed.So incase of any restart
from task-manager ,new task gets started from last successful checkpoint ,but we could'nt
able to get the aggregated snapshot data(data from groupBy task) from the persisted checkpoint.

Able to retrieve the last successful checkpointed offset from kafka ,but couldnt able to get
last aggregated data till checkpointing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message