flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-9276) Improve error message when TaskManager fails
Date Mon, 07 May 2018 09:44:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16465686#comment-16465686 ]

ASF GitHub Bot commented on FLINK-9276:
---------------------------------------

Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5954#discussion_r186375841
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/SlotPool.java ---
    @@ -1050,11 +1050,12 @@ else if (availableSlots.tryRemove(allocationID)) {
     	 * when we find some TaskManager becomes "dead" or "abnormal", and we decide to not using slots from it anymore.
     	 *
     	 * @param resourceId The id of the TaskManager
    +	 * @param cause for the release the TaskManager
     	 */
     	@Override
    -	public CompletableFuture<Acknowledge> releaseTaskManager(final ResourceID resourceId) {
    +	public CompletableFuture<Acknowledge> releaseTaskManager(final ResourceID resourceId, final Exception cause) {
    --- End diff --
    
    I would use `Throwable` in the signatures. It may always be that some Error is the cause (class not found, etc.)
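
    The point about `Throwable` can be illustrated with a minimal, self-contained sketch. It is not Flink code: the class `ReleaseCauseSketch`, the `String` return type, and the resource IDs are hypothetical stand-ins for the real `SlotPool` signature. It only shows that a `java.lang.Error` such as `NoClassDefFoundError` is not an `Exception`, so a parameter typed as `Exception` could not carry it without wrapping.

```java
import java.util.concurrent.CompletableFuture;

public class ReleaseCauseSketch {

    // Hypothetical stand-in for the reviewed method; NOT Flink's actual SlotPool API.
    // Typing the cause as Throwable lets callers pass Errors as well as Exceptions.
    static CompletableFuture<String> releaseTaskManager(String resourceId, Throwable cause) {
        String message = "TaskManager " + resourceId + " released because of: " + cause;
        return CompletableFuture.completedFuture(message);
    }

    public static void main(String[] args) {
        // Illustrative causes only; the messages are made up for this sketch.
        Throwable fromException = new IllegalStateException("heartbeat timed out");
        Throwable fromError = new NoClassDefFoundError("com/example/MissingClass");

        System.out.println(releaseTaskManager("tm-1", fromException).join());
        System.out.println(releaseTaskManager("tm-2", fromError).join());

        // An Error is a Throwable but not an Exception, so an Exception-typed
        // parameter would reject it at compile time.
        System.out.println(fromError instanceof Exception); // prints "false"
    }
}
```

    With an `Exception`-typed parameter, the second call would require wrapping the `Error` in some exception first, which is the extra step the review comment suggests avoiding.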


> Improve error message when TaskManager fails
> --------------------------------------------
>
>                 Key: FLINK-9276
>                 URL: https://issues.apache.org/jira/browse/FLINK-9276
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Coordination
>    Affects Versions: 1.5.0
>            Reporter: Stephan Ewen
>            Assignee: vinoyang
>            Priority: Critical
>
> When a TaskManager fails, we frequently get a message
> {code}
> org.apache.flink.util.FlinkException: Releasing TaskManager container_1524853016208_0001_01_000102
> {code}
> This message is misleading in that it sounds like an intended operation, when it really is a failure of a container that the {{ResourceManager}} reports to the {{JobManager}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
