flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ufuk Celebi <...@apache.org>
Subject Re: Stateful Stream Processing with RocksDB causing Job failure
Date Wed, 21 Dec 2016 09:12:54 GMT
Hey Abiy!

- Do all the task managers run on a single host? Only then using the
local file system will work.

- What does every now and then mean? Every time when the job tries to
take a snapshot? After restarts?

The JobManager logs will also help if we can't figure this out like this.

Best,

Ufuk

On Tue, Dec 20, 2016 at 6:05 PM, Abiy Legesse Hailemichael
<abiybirtukan@gmail.com> wrote:
> I am running a standalone flink cluster (1.1.2) and I have a stateful
> streaming job that uses RocksDB as a state manager. I have two stateful
> operators that are using ValueState<> and ListState<>. Every now and then
my
> job fails with the following exception
>
> java.lang.Exception: Could not restore checkpointed state to operators and
> functions
> 	at
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreState(StreamTask.java:552)
> 	at
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:250)
> 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException: File
> file:/data/flink/checkpoints/226c84df02e47d1b9c036ba894503145/StreamMap_12_5/dummy_state/chk-83
> does not exist
> 	at
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
> 	at
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
> 	at
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599)
> 	at
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
> 	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
> 	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
> 	at
> org.apache.hadoop.fs.LocalFileSystem.copyToLocalFile(LocalFileSystem.java:88)
> 	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1975)
> 	at
> org.apache.flink.streaming.util.HDFSCopyToLocal$1.run(HDFSCopyToLocal.java:48)
>
>
> Can someone help me with this, Is this  a known issue ?
>
> Thanks
>
> Abiy Hailemichael
> Software Engineer
> Email: abiybirtukan@gmail.com
>

Mime
View raw message