flink-user mailing list archives

From Till Rohrmann <trohrm...@apache.org>
Subject Re: Error restoring from checkpoint on Flink 1.8
Date Wed, 24 Apr 2019 09:04:27 GMT
For future reference, here is a cross-link to the referenced ML thread
discussion [1].

[1]
http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3Cm2ef5tpfwy.wl-ningshi2@gmail.com%3E

Cheers,
Till

On Wed, Apr 24, 2019 at 4:00 AM Ning Shi <ningshi2@gmail.com> wrote:

> Hi Congxian,
>
> I think I have figured out the issue. It's related to the checkpoint
> directory collision issue you responded to in the other thread. We
> reproduced this bug on 1.6.1 after unchaining the operators.
>
> There are two stateful operators in the chain, one is a
> CoBroadcastWithKeyedOperator, the other one is a StreamMapper. The
> CoBroadcastWithKeyedOperator creates timer states in RocksDB, the latter
> doesn’t. Because of the checkpoint directory collision bug, we always end
> up saving the states for CoBroadcastWithKeyedOperator.
>
> After breaking these two operators apart, they try to restore from the
> same set of saved states. When the StreamMapper opens the RocksDB files,
> it doesn’t care about any of the column families in there, including the
> timer states. Hence the error.
>
> --
> Ning
>
