flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-8900) YARN FinalStatus always shows as KILLED with Flip-6
Date Wed, 21 Mar 2018 21:30:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16408607#comment-16408607 ]

ASF GitHub Bot commented on FLINK-8900:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/5741

    [FLINK-8900] [yarn] Properly unregister application from Yarn RM

    ## What is the purpose of the change
    
    Unregisters the Flink application from Yarn when the application is shut down. This is required
    to properly show the state and final status in the Yarn web UI.
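
A minimal sketch of the idea, not the code in this pull request: before shutting down, the application master reports a final status to the Yarn ResourceManager, so the web UI can show something other than the default. The enums and the `RmClient` interface below are local, hypothetical stand-ins rather than Flink or Hadoop classes. The method shape loosely follows Hadoop's `AMRMClientAsync#unregisterApplicationMaster(FinalApplicationStatus, String, String)`. Mapping `CANCELED` to `KILLED` is an assumption of this sketch.

```java
/**
 * Sketch only. The enums and RmClient are local stand-ins,
 * not the actual Flink or Hadoop types.
 */
final class YarnFinalStatusSketch {

    /** Stand-in for the status the cluster entrypoint knows at shutdown. */
    enum ApplicationStatus { SUCCEEDED, FAILED, CANCELED, UNKNOWN }

    /** Stand-in for the final status values shown in the Yarn web UI. */
    enum FinalApplicationStatus { UNDEFINED, SUCCEEDED, FAILED, KILLED }

    /** Hypothetical client exposing only the unregister call. */
    interface RmClient {
        void unregisterApplicationMaster(
                FinalApplicationStatus status, String diagnostics, String trackingUrl)
                throws Exception;
    }

    /** Translates the application's own status into a Yarn final status (assumed mapping). */
    static FinalApplicationStatus toYarnStatus(ApplicationStatus status) {
        if (status == null) {
            return FinalApplicationStatus.UNDEFINED;
        }
        switch (status) {
            case SUCCEEDED:
                return FinalApplicationStatus.SUCCEEDED;
            case FAILED:
                return FinalApplicationStatus.FAILED;
            case CANCELED:
                return FinalApplicationStatus.KILLED;
            default:
                return FinalApplicationStatus.UNDEFINED;
        }
    }

    /** Reports the final status to the resource manager before the process exits. */
    static FinalApplicationStatus unregister(
            RmClient client, ApplicationStatus status, String diagnostics) throws Exception {
        FinalApplicationStatus yarnStatus = toYarnStatus(status);
        // Empty tracking URL: this sketch has no history server to point to.
        client.unregisterApplicationMaster(yarnStatus, diagnostics, "");
        return yarnStatus;
    }

    public static void main(String[] args) throws Exception {
        // A client that only prints what it would send to the resource manager.
        RmClient printing =
                (status, diagnostics, url) ->
                        System.out.println("unregister: " + status + " (" + diagnostics + ")");
        unregister(printing, ApplicationStatus.SUCCEEDED, "job finished");
    }
}
```

The point of the sketch is the ordering: the unregister call has to complete before the process terminates, otherwise the ResourceManager never receives a final status from the application.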
    
    ## Verifying this change
    
    - Manually tested
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (no)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
      - The serializers: (no)
      - The runtime per-record code paths (performance sensitive): (no)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes)
      - The S3 file system connector: (no)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (no)
      - If yes, how is the feature documented? (not applicable)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink yarnApplicationStatus

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5741.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5741
    
----
commit 71efa75973d066268bbb3533f29da05270ef24b2
Author: Till Rohrmann <trohrmann@...>
Date:   2018-03-21T20:48:19Z

    [hotfix] Log final status and exit code under lock

commit c10e100cbf09e602415ff72043b857a1e29daf66
Author: Till Rohrmann <trohrmann@...>
Date:   2018-03-21T21:14:58Z

    [hotfix] Add FutureUtils#composeAfterwards

commit 2072210eddbb13add2b3228fd08c8550075cdfc1
Author: Till Rohrmann <trohrmann@...>
Date:   2018-03-21T21:19:28Z

    [FLINK-8900] [yarn] Properly unregister application from Yarn RM

----


> YARN FinalStatus always shows as KILLED with Flip-6
> ---------------------------------------------------
>
>                 Key: FLINK-8900
>                 URL: https://issues.apache.org/jira/browse/FLINK-8900
>             Project: Flink
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.5.0, 1.6.0
>            Reporter: Nico Kruber
>            Priority: Blocker
>              Labels: flip-6
>             Fix For: 1.5.0
>
>
> Whenever I run a simple word count like this one on YARN with Flip-6 enabled,
> {code}
> ./bin/flink run -m yarn-cluster -yjm 768 -ytm 3072 -ys 2 -p 20 -c org.apache.flink.streaming.examples.wordcount.WordCount ./examples/streaming/WordCount.jar --input /usr/share/doc/rsync-3.0.6/COPYING
> {code}
> it will show up as {{KILLED}} in the {{State}} and {{FinalStatus}} columns even though the program ran successfully, as in this log (irrespective of whether FLINK-8899 occurs):
> {code}
> 2018-03-08 16:48:39,049 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Streaming WordCount (11a794d2f5dc2955d8015625ec300c20) switched from state RUNNING to FINISHED.
> 2018-03-08 16:48:39,050 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Stopping checkpoint coordinator for job 11a794d2f5dc2955d8015625ec300c20
> 2018-03-08 16:48:39,050 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore - Shutting down
> 2018-03-08 16:48:39,078 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Job 11a794d2f5dc2955d8015625ec300c20 reached globally terminal state FINISHED.
> 2018-03-08 16:48:39,151 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Register TaskManager e58efd886429e8f080815ea74ddfa734 at the SlotManager.
> 2018-03-08 16:48:39,221 INFO  org.apache.flink.runtime.jobmaster.JobMaster - Stopping the JobMaster for job Streaming WordCount(11a794d2f5dc2955d8015625ec300c20).
> 2018-03-08 16:48:39,270 INFO  org.apache.flink.runtime.jobmaster.JobMaster - Close ResourceManager connection 43f725adaee14987d3ff99380701f52f: JobManager is shutting down..
> 2018-03-08 16:48:39,270 INFO  org.apache.flink.yarn.YarnResourceManager - Disconnect job manager 00000000000000000000000000000000@akka.tcp://flink@ip-172-31-7-0.eu-west-1.compute.internal:34281/user/jobmanager_0 for job 11a794d2f5dc2955d8015625ec300c20 from the resource manager.
> 2018-03-08 16:48:39,349 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Suspending SlotPool.
> 2018-03-08 16:48:39,349 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Stopping SlotPool.
> 2018-03-08 16:48:39,349 INFO  org.apache.flink.runtime.jobmaster.JobManagerRunner - JobManagerRunner already shutdown.
> 2018-03-08 16:48:39,775 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Register TaskManager 4e1fb6c8f95685e24b6a4cb4b71ffb92 at the SlotManager.
> 2018-03-08 16:48:39,846 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Register TaskManager b5bce0bdfa7fbb0f4a0905cc3ee1c233 at the SlotManager.
> 2018-03-08 16:48:39,876 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
> 2018-03-08 16:48:39,910 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Register TaskManager a35b0690fdc6ec38bbcbe18a965000fd at the SlotManager.
> 2018-03-08 16:48:39,942 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Register TaskManager 5175cabe428bea19230ac056ff2a17bb at the SlotManager.
> 2018-03-08 16:48:39,974 INFO  org.apache.flink.runtime.blob.BlobServer - Stopped BLOB server at 0.0.0.0:46511
> 2018-03-08 16:48:39,975 INFO  org.apache.flink.runtime.blob.TransientBlobCache - Shutting down BLOB cache
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
