flink-issues mailing list archives

From "Sihua Zhou (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (FLINK-9290) The job is unable to recover from a checkpoint
Date Wed, 02 May 2018 15:22:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461153#comment-16461153 ]

Sihua Zhou edited comment on FLINK-9290 at 5/2/18 3:21 PM:
-----------------------------------------------------------

This looks like the same problem that was fixed in [FLINK-9263|https://issues.apache.org/jira/browse/FLINK-9263].
It looks like the serializer is not thread-safe in this case. So even though the checkpoint
completed successfully, the serializer information written into it is incorrect because of the
concurrency problem, and recovery from that checkpoint then fails. I'm not entirely sure
about that, though. [~srichter] what do you think? Please correct me if I'm wrong.
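
To make the hypothesis above concrete, here is a toy model of how a shared, stateful serializer could write a record that is accepted at checkpoint time but cannot be read back. It is a sketch under stated assumptions, not Kryo's or Flink's actual code: ToyRefSerializer, SharedSerializerDemo, and the "new:"/"ref:" token format are hypothetical names invented for illustration, and the interleaving of two writers is simulated deterministically in one thread instead of with real threads.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy reference-tracking serializer (hypothetical; not Kryo's real implementation). */
final class ToyRefSerializer {
    // Mutable per-instance state: values already written in the current record.
    private final List<String> seen = new ArrayList<>();

    /** Starts a new record by clearing the reference table. */
    void begin() {
        seen.clear();
    }

    /** Writes a field either as a new value or as a back-reference to an earlier one. */
    void writeField(String value, List<String> out) {
        int id = seen.indexOf(value);
        if (id >= 0) {
            out.add("ref:" + id);
        } else {
            seen.add(value);
            out.add("new:" + value);
        }
    }

    /** Reads a record, resolving back-references against a fresh table. */
    static List<String> read(List<String> in) {
        List<String> table = new ArrayList<>();
        List<String> result = new ArrayList<>();
        for (String token : in) {
            if (token.startsWith("ref:")) {
                // Throws IndexOutOfBoundsException if the writer's table was polluted.
                result.add(table.get(Integer.parseInt(token.substring(4))));
            } else {
                String value = token.substring(4);
                table.add(value);
                result.add(value);
            }
        }
        return result;
    }

    /** Returns an independent instance with its own reference table. */
    ToyRefSerializer duplicate() {
        return new ToyRefSerializer();
    }
}

final class SharedSerializerDemo {
    /** Simulates writer B running in the middle of writer A's record; returns A's record. */
    static List<String> writeInterleaved(ToyRefSerializer forA, ToyRefSerializer forB) {
        List<String> outA = new ArrayList<>();
        List<String> outB = new ArrayList<>();
        forA.begin();
        forA.writeField("topic-a", outA);
        // Writer B starts and finishes its own record here.
        forB.begin();
        forB.writeField("x", outB);
        forB.writeField("y", outB);
        forB.writeField("topic-a", outB);
        // Writer A resumes and repeats a value it has already written.
        forA.writeField("topic-a", outA);
        return outA;
    }

    /** Reports whether a written record can be read back without error. */
    static boolean readable(List<String> record) {
        try {
            ToyRefSerializer.read(record);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ToyRefSerializer shared = new ToyRefSerializer();
        System.out.println("shared instance readable: "
                + readable(writeInterleaved(shared, shared)));
        System.out.println("duplicated instance readable: "
                + readable(writeInterleaved(shared, shared.duplicate())));
    }
}
```

In this model the write never fails. With one shared instance, writer A's second field is emitted as a reference to index 2 of a table that the reader rebuilds with only one entry, so the error only appears on read. Giving each writer its own instance, similar in spirit to what TypeSerializer#duplicate() is meant for in Flink, avoids the dangling reference. Whether this is what actually happened in this issue is still an assumption.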


was (Author: sihuazhou):
This looks like the same problem that was fixed in [FLINK-9263|https://issues.apache.org/jira/browse/FLINK-9263],
but I'm not so sure. [~srichter] what do you think?

> The job is unable to recover from a checkpoint
> ----------------------------------------------
>
>                 Key: FLINK-9290
>                 URL: https://issues.apache.org/jira/browse/FLINK-9290
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.4.2
>            Reporter: Narayanan Arunachalam
>            Priority: Blocker
>
> Using the RocksDB state backend.
> The job runs fine for more than 24 hours and then attempts recovery because of an error from the sink. Recovery keeps failing with the following error. The workaround is to cancel the job and start it again.
> java.lang.IllegalStateException: Could not initialize operator state backend.
> 	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initOperatorState(AbstractStreamOperator.java:302)
> 	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:249)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:692)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:679)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253)
> 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index: 2, Size: 1
> Serialization trace:
> topic (org.apache.flink.streaming.connectors.kafka.internals.KafkaTopicPartition)
> 	at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
> 	at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
> 	at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
> 	at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:249)
> 	at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:136)
> 	at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:30)
> 	at org.apache.flink.runtime.state.DefaultOperatorStateBackend.deserializeStateValues(DefaultOperatorStateBackend.java:584)
> 	at org.apache.flink.runtime.state.DefaultOperatorStateBackend.restore(DefaultOperatorStateBackend.java:399)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.createOperatorStateBackend(StreamTask.java:733)
> 	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initOperatorState(AbstractStreamOperator.java:300)
> 	... 6 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 1
> 	at java.util.ArrayList.rangeCheck(ArrayList.java:657)
> 	at java.util.ArrayList.get(ArrayList.java:433)
> 	at com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:42)
> 	at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:805)
> 	at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:728)
> 	at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:113)
>  



