flink-user mailing list archives

From Ning Shi <nings...@gmail.com>
Subject Re: Missing state in RocksDB checkpoints
Date Wed, 24 Apr 2019 12:09:30 GMT
Till,

Thank you for escalating this to a blocker. I agree that data loss is always a serious issue.

For reference, the workaround is to unchain the stateful operators. To let the new job
recover from the previous checkpoint, we also had to change the UID of the operator that
was missing state and restore with the allow-non-restored-state argument
(--allowNonRestoredState). Otherwise, the restore would fail with RocksDB errors.
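
To illustrate why unchaining helps, here is a toy model of the directory collision
Congxian describes in the quoted thread. It is not Flink code: the class, the operator
UIDs, the vertex IDs, and the "job_vertex_" directory naming are all hypothetical. It
only assumes that the state directory is named after the job vertex, so operators
chained into one vertex end up in the same directory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only, not Flink code. All names and IDs here are hypothetical. */
public class ChainingDirectoryModel {

    /** An operator and the ID of the job vertex (chain) it was placed in. */
    static final class Operator {
        final String uid;
        final String vertexId;

        Operator(String uid, String vertexId) {
            this.uid = uid;
            this.vertexId = vertexId;
        }
    }

    /**
     * Groups operator UIDs by the directory they would write to,
     * assuming the directory name depends on the vertex ID only.
     */
    static Map<String, List<String>> operatorsByDirectory(List<Operator> ops) {
        Map<String, List<String>> byDir = new LinkedHashMap<>();
        for (Operator op : ops) {
            String dir = "job_vertex_" + op.vertexId; // assumed naming scheme
            byDir.computeIfAbsent(dir, k -> new ArrayList<>()).add(op.uid);
        }
        return byDir;
    }

    /** Returns true if any directory is used by more than one operator. */
    static boolean hasSharedDirectory(List<Operator> ops) {
        for (List<String> uids : operatorsByDirectory(ops).values()) {
            if (uids.size() > 1) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two stateful operators chained into the same vertex.
        List<Operator> chained = Arrays.asList(
                new Operator("dedup", "v1"), new Operator("aggregate", "v1"));
        // The same operators after unchaining: one vertex each.
        List<Operator> unchained = Arrays.asList(
                new Operator("dedup", "v1"), new Operator("aggregate", "v2"));

        System.out.println("chained shares a directory:   " + hasSharedDirectory(chained));
        System.out.println("unchained shares a directory: " + hasSharedDirectory(unchained));
    }
}
```

In a real DataStream job, the unchaining and UID change would be done with
disableChaining() (or startNewChain()) and uid(...) on the stateful operator; the
model above only shows why giving each operator its own vertex avoids the shared
directory.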

—
Ning

> On Apr 24, 2019, at 5:02 AM, Till Rohrmann <trohrmann@apache.org> wrote:
> 
> Thanks for reporting this issue, Ning. I think this is actually a blocker for the next
> release and should be fixed right away. For future reference, here is the issue [1].
> 
> I've also pulled in Stefan who knows these components very well.
> 
> [1] https://issues.apache.org/jira/browse/FLINK-12296
> 
> Cheers,
> Till
> 
>> On Tue, Apr 23, 2019 at 5:24 PM Ning Shi <ningshi2@gmail.com> wrote:
>> On Tue, 23 Apr 2019 10:53:52 -0400,
>> Congxian Qiu wrote:
>> > Sorry for the confusion. In the previous email, I just wanted to say that the problem
>> > is not caused by the UUID generation; it is caused by different operators sharing the
>> > same directory (because Flink currently uses the JobVertex as the directory).
>> 
>> Ah, thank you for the clarification, Congxian. That makes sense.
>> 
>> Ning