flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-4150) Problem with Blobstore in Yarn HA setting on recovery after cluster shutdown
Date Wed, 20 Jul 2016 11:43:20 GMT

    [ https://issues.apache.org/jira/browse/FLINK-4150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385713#comment-15385713 ]

ASF GitHub Bot commented on FLINK-4150:

Github user tillrohrmann commented on a diff in the pull request:

    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java
    @@ -77,7 +77,7 @@ public BlobLibraryCacheManager(BlobService blobService, long cleanupInterval)
     		// Initializing the clean up task
     		this.cleanupTimer = new Timer(true);
    -		this.cleanupTimer.schedule(this, cleanupInterval);
    +		this.cleanupTimer.schedule(this, cleanupInterval, cleanupInterval);
    --- End diff --
    Good catch :+1:
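
For readers unfamiliar with the two `java.util.Timer.schedule` overloads in the diff above: the two-argument form runs the task once after the delay, while the three-argument form repeats it at the given period. The following is a minimal, standalone sketch of that difference. The class `TimerScheduleDemo`, the method `countRuns`, and the interval values are hypothetical and are not taken from the Flink code base.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of Flink.
class TimerScheduleDemo {

    /**
     * Schedules a counting task, waits for waitMillis, cancels the timer,
     * and returns how many times the task ran.
     */
    static int countRuns(boolean periodic, long intervalMillis, long waitMillis) {
        Timer timer = new Timer(true); // daemon timer thread, as in the diff
        AtomicInteger runs = new AtomicInteger();
        TimerTask task = new TimerTask() {
            @Override
            public void run() {
                runs.incrementAndGet();
            }
        };

        if (periodic) {
            // schedule(task, delay, period): first run after the delay, then repeated
            timer.schedule(task, intervalMillis, intervalMillis);
        } else {
            // schedule(task, delay): a single run after the delay
            timer.schedule(task, intervalMillis);
        }

        try {
            Thread.sleep(waitMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        timer.cancel();
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("one-shot runs: " + countRuns(false, 50, 500));
        System.out.println("periodic runs: " + countRuns(true, 50, 500));
    }
}
```

With the one-shot overload, a cleanup task would fire a single time and never again, which is presumably why the change was considered a good catch.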

> Problem with Blobstore in Yarn HA setting on recovery after cluster shutdown
> ----------------------------------------------------------------------------
>                 Key: FLINK-4150
>                 URL: https://issues.apache.org/jira/browse/FLINK-4150
>             Project: Flink
>          Issue Type: Bug
>          Components: Job-Submission
>            Reporter: Stefan Richter
>            Assignee: Ufuk Celebi
>            Priority: Blocker
>             Fix For: 1.1.0
> Submitting a job in Yarn with HA can lead to the following exception:
> {code}
> org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot load user class:
> ClassLoader info: URL ClassLoader:
>     file: '/tmp/blobStore-ccec0f4a-3e07-455f-945b-4fcd08f5bac1/cache/blob_7fafffe9595cd06aff213b81b5da7b1682e1d6b0' (invalid JAR: zip file is empty)
> Class not resolvable through given classloader.
> 	at org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperator(StreamConfig.java:207)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:222)
> 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:588)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}
> Some job information, including the Blob ids, is stored in Zookeeper. The actual Blobs
> are stored in a dedicated BlobStore if the recovery mode is set to Zookeeper. This BlobStore
> is typically located in a file system such as HDFS. When the cluster is shut down, the path
> for the BlobStore is deleted. When the cluster is then restarted, recovering jobs cannot be
> restored because their Blob ids stored in Zookeeper now point to deleted files.
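
The failure mode described above can be reproduced in miniature without Flink, Zookeeper, or HDFS. The sketch below is an illustration only: an in-memory map stands in for the blob ids kept in Zookeeper, a temporary local directory stands in for the BlobStore path, and the names `DanglingBlobDemo`, `findDanglingBlobs`, and `simulateShutdownAndRecovery` are hypothetical. It does not use or mirror any Flink API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of Flink.
class DanglingBlobDemo {

    /** Returns the blob ids whose backing files no longer exist. */
    static List<String> findDanglingBlobs(Map<String, Path> blobIndex) {
        List<String> dangling = new ArrayList<>();
        for (Map.Entry<String, Path> entry : blobIndex.entrySet()) {
            if (!Files.exists(entry.getValue())) {
                dangling.add(entry.getKey());
            }
        }
        return dangling;
    }

    /** Stores a blob, records its id, deletes the storage path, then checks the index. */
    static List<String> simulateShutdownAndRecovery() {
        try {
            // Stand-in for the BlobStore directory on a shared file system.
            Path blobStoreDir = Files.createTempDirectory("blobStore-demo");
            Path blobFile = Files.write(blobStoreDir.resolve("blob_1"), new byte[] {1, 2, 3});

            // Stand-in for the job metadata (blob ids) kept in Zookeeper.
            Map<String, Path> blobIndex = new LinkedHashMap<>();
            blobIndex.put("blob_1", blobFile);

            // "Cluster shutdown": the storage path is deleted, the index survives.
            Files.delete(blobFile);
            Files.delete(blobStoreDir);

            // "Recovery": the surviving index now points to a file that is gone.
            return findDanglingBlobs(blobIndex);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("dangling blob ids: " + simulateShutdownAndRecovery());
    }
}
```

The point of the sketch is that the metadata and the blob files have different lifetimes: deleting one without the other leaves references that cannot be resolved on restart.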

This message was sent by Atlassian JIRA