flink-user mailing list archives

From Jared Stehler <jared.steh...@intellifylearning.com>
Subject seeding a stream job
Date Wed, 18 Jan 2017 17:35:27 GMT
I have a use case where I need to start a stream job replaying historical data and then have it
continue processing from a live Kafka source, and I'm looking for guidance / best practices on
how to implement this.

Basically, I want to start up a new “version” of the stream job, have it process each
element from a specified range of historical data, and then continue processing records from
the live source. I'd then let the job catch up to current time, atomically switch to it
as the “live/current” job, and shut down the previously running one. The issue I’m
struggling with is the switch-over from the historical source to the live source. Is there
a way to build a composite stream source which would emit records from a bounded data set
before consuming from a Kafka topic, for example? Or would I be better off stopping the job
once it’s read through the historical set, switching its source to the live topic, and
restarting it? Some of our jobs rely on rolling fold state, so I think I need to resume from
the savepoint of the historical processing.
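To make the composite-source question concrete, here is a minimal sketch of the switch-over logic outside of Flink: drain a bounded historical collection first, then hand off to a live feed. The class name `CompositeSource` and the use of an in-memory queue as a stand-in for the Kafka consumer are illustrative assumptions; in a real Flink job this logic would live inside a source's run loop rather than a plain `next()` method.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;

// Sketch only: drain a bounded historical data set, then switch to a
// live feed. The Queue stands in for a Kafka consumer; a real Flink
// source would poll Kafka in the live phase instead.
public class CompositeSource {
    private final Iterator<String> historical; // bounded replay data
    private final Queue<String> live;          // stand-in for the live Kafka topic

    public CompositeSource(List<String> historicalData, Queue<String> liveFeed) {
        this.historical = historicalData.iterator();
        this.live = liveFeed;
    }

    // Emit the next record: all historical records first, then live ones.
    // Returns null once both phases are exhausted.
    public String next() {
        if (historical.hasNext()) {
            return historical.next(); // replay phase
        }
        return live.poll();           // live phase
    }

    public static void main(String[] args) {
        Queue<String> liveFeed = new ArrayDeque<>(List.of("live-1", "live-2"));
        CompositeSource source =
                new CompositeSource(List.of("hist-1", "hist-2"), liveFeed);

        List<String> emitted = new ArrayList<>();
        String record;
        while ((record = source.next()) != null) {
            emitted.add(record);
        }
        // Historical records are always emitted before live records.
        System.out.println(emitted);
    }
}
```

The key property this illustrates is that the switch-over is just an ordering decision inside one source, so downstream operators (and their rolling fold state) never see a restart.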



