flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Josh <jof...@gmail.com>
Subject Exception when restoring state from RocksDB - how to recover?
Date Tue, 11 Oct 2016 10:40:02 GMT
Hi all,


I just have a couple of questions about checkpointing and restoring
state from RocksDB.


1) In some cases, I find that it is impossible to restore a job from a
checkpoint, due to an exception such as the one pasted below[*]. In
this case, it appears that the last checkpoint is somehow corrupt.
Does anyone know why this might happen?


2) When the above happens, I have no choice but to cancel the job, as
it repeatedly attempts to restart and keeps getting the same
exception. Given that no savepoint was taken recently, is it possible
for me to restore the job from an older checkpoint (e.g. the
second-last checkpoint)?


The version of Flink I'm using Flink-1.1-SNAPSHOT, from mid-June.


Thanks,

Josh


[*]The exception when restoring state:

java.lang.Exception: Could not restore checkpointed state to operators
and functions
	at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreState(StreamTask.java:480)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:219)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:588)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Error while restoring RocksDB
state from /mnt/yarn/usercache/hadoop/appcache/application_1476181294189_0001/flink-io-09ad1cb1-8dff-4f9a-9f61-6cae27ee6f1d/d236820a793043bd63360df6f175cae9/StreamFlatMap_9_8/dummy_state/dc5beab1-68fb-48b3-b3d6-272497d15a09/chk-1
	at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.restoreFromSemiAsyncSnapshot(RocksDBStateBackend.java:537)
	at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.injectKeyValueStateSnapshots(RocksDBStateBackend.java:489)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.restoreState(AbstractStreamOperator.java:204)
	at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreState(AbstractUdfStreamOperator.java:154)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreState(StreamTask.java:472)
	... 3 more
Caused by: org.rocksdb.RocksDBException: NotFound: Backup not found
	at org.rocksdb.BackupEngine.restoreDbFromLatestBackup(Native Method)
	at org.rocksdb.BackupEngine.restoreDbFromLatestBackup(BackupEngine.java:177)
	at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.restoreFromSemiAsyncSnapshot(RocksDBStateBackend.java:535)
	... 7 more

Mime
View raw message