flink-user mailing list archives

From Ufuk Celebi <...@apache.org>
Subject Re: Restart Flink in Yarn
Date Mon, 09 May 2016 08:27:29 GMT
Hey Dominique!

Are you running the job in HA mode?

– Ufuk

On Thu, May 5, 2016 at 1:49 PM, Robert Metzger <rmetzger@apache.org> wrote:
> Hi Dominique,
> I'm sorry that you ran into this issue.
> What do you mean by "flink streaming routes"?
>
> Regarding the second question: "Now I want to restart these routes to
> continue their work from the last checkpoint. What can i do?"
> I think the feature you are looking for is savepoints:
> https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/savepoints.html
> However, this was added in Flink 1.0, so it's not available in your
> 0.10 release.
>
>
> I have to admit that I haven't seen the "Cannot find required BLOB at ..."
> exceptions before. Is there any chance that the files were deleted from
> the /tmp directory by an external service (like a periodic cleanup script),
> or that the /tmp dir was mounted to another disk in the meantime?
>
>
>
> On Wed, May 4, 2016 at 6:27 PM, Dominique Rondé
> <dominique.ronde@allsecur.de> wrote:
>>
>> Hi @all,
>>
>> I have a YARN cluster with 5 nodes and a running Flink (0.10.2) instance.
>> Today we shut down one of the YARN hosts for maintenance. After the
>> restart, some Flink streaming routes are stuck in a restarting status (see
>> stacktrace below). Now I want to restart these routes so that they continue
>> their work from the last checkpoint. What can I do?
>>
>> Greets
>> Dominique
>>
>> Stacktrace
>>
>> ===================================================================================
>>
>> java.io.IOException: Cannot get library with hash 8f15fe4a8137ca2f9fb348ec634f3703f4fd7317
>> 	at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager.registerReferenceToBlobKeyAndGetURL(BlobLibraryCacheManager.java:254)
>> 	at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager.registerTask(BlobLibraryCacheManager.java:114)
>> 	at org.apache.flink.runtime.taskmanager.Task.createUserCodeClassloader(Task.java:710)
>> 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:471)
>> 	at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.io.IOException: Failed to fetch BLOB 8f15fe4a8137ca2f9fb348ec634f3703f4fd7317 from /10.24.20.14:60485 and store it under /tmp/blobStore-efdeddf9-d096-440f-a4cb-9c79334ff92c/cache/blob_8f15fe4a8137ca2f9fb348ec634f3703f4fd7317
>> 	at org.apache.flink.runtime.blob.BlobCache.getURL(BlobCache.java:177)
>> 	at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager.registerReferenceToBlobKeyAndGetURL(BlobLibraryCacheManager.java:245)
>> 	... 4 more
>> Caused by: java.io.IOException: GET operation failed: Server side error: Cannot find required BLOB at /tmp/blobStore-0f9a63e3-5700-4d47-aea7-310506c1496c/cache/blob_8f15fe4a8137ca2f9fb348ec634f3703f4fd7317
>> 	at org.apache.flink.runtime.blob.BlobClient.get(BlobClient.java:165)
>> 	at org.apache.flink.runtime.blob.BlobCache.getURL(BlobCache.java:125)
>> 	... 5 more
>> Caused by: java.io.IOException: Server side error: Cannot find required BLOB at /tmp/blobStore-0f9a63e3-5700-4d47-aea7-310506c1496c/cache/blob_8f15fe4a8137ca2f9fb348ec634f3703f4fd7317
>> 	at org.apache.flink.runtime.blob.BlobClient.receiveAndCheckResponse(BlobClient.java:213)
>> 	at org.apache.flink.runtime.blob.BlobClient.get(BlobClient.java:159)
>> 	... 6 more
>> Caused by: java.io.IOException: Cannot find required BLOB at /tmp/blobStore-0f9a63e3-5700-4d47-aea7-310506c1496c/cache/blob_8f15fe4a8137ca2f9fb348ec634f3703f4fd7317
>> 	at org.apache.flink.runtime.blob.BlobServerConnection.get(BlobServerConnection.java:202)
>> 	at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:112)
>>
>>
>
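
One way to follow up on the question about /tmp is to look on the affected host for the file named in the trace. The following is only a minimal sketch, assuming the on-disk layout that the stack trace shows (/tmp/blobStore-<id>/cache/blob_<hash>). The class BlobStoreCheck and the method findBlob are hypothetical helpers written for this check and are not part of Flink.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not a Flink class.
public class BlobStoreCheck {

    // Returns every <baseDir>/blobStore-<id>/cache/blob_<hash> file that exists.
    // The directory layout is an assumption taken from the paths in the stack trace.
    static List<Path> findBlob(Path baseDir, String hash) throws IOException {
        List<Path> hits = new ArrayList<>();
        if (!Files.isDirectory(baseDir)) {
            return hits; // base directory is missing, e.g. remounted or wiped
        }
        try (DirectoryStream<Path> stores =
                 Files.newDirectoryStream(baseDir, "blobStore-*")) {
            for (Path store : stores) {
                Path blob = store.resolve("cache").resolve("blob_" + hash);
                if (Files.isRegularFile(blob)) {
                    hits.add(blob);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Defaults are the directory and hash that appear in the stack trace.
        Path baseDir = Paths.get(args.length > 0 ? args[0] : "/tmp");
        String hash = args.length > 1
            ? args[1]
            : "8f15fe4a8137ca2f9fb348ec634f3703f4fd7317";

        List<Path> hits = findBlob(baseDir, hash);
        if (hits.isEmpty()) {
            System.out.println("No blob_" + hash + " found under " + baseDir);
        } else {
            for (Path p : hits) {
                System.out.println("Found " + p);
            }
        }
    }
}
```

In the trace the lookup fails on the serving side (the fetch goes to 10.24.20.14, and the missing path is under blobStore-0f9a63e3-...), so that host is the one to check first. If nothing is found there, a cleanup job or a remounted /tmp becomes a more plausible explanation.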
