hadoop-yarn-issues mailing list archives

From "Xun Liu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-8876) [Submarine] After training, the monitoring program need auto close PS service
Date Mon, 15 Oct 2018 08:41:00 GMT

     [ https://issues.apache.org/jira/browse/YARN-8876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xun Liu updated YARN-8876:
--------------------------
    Description: 
h1. Job monitor of {submarine}

Submarine needs to provide a long-running resident service that monitors each job.

This monitoring service can handle training jobs differently depending on the type of
deep learning framework in use.

For example, when TensorFlow performs distributed training, the PS (parameter server)
service does not stop automatically once training completes. The monitoring service
therefore needs to stop the PS actively.

  was:After a TensorFlow model finishes training, TensorFlow cannot automatically shut down
the PS service, so the service status monitoring program needs to shut it down actively.


> [Submarine] After training, the monitoring program need auto close PS service
> -----------------------------------------------------------------------------
>
>                 Key: YARN-8876
>                 URL: https://issues.apache.org/jira/browse/YARN-8876
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Xun Liu
>            Assignee: Xun Liu
>            Priority: Major
>
> h1. Job monitor of {submarine}
> Submarine needs to provide a long-running resident service that monitors each job.
> This monitoring service can handle training jobs differently depending on the type of
> deep learning framework in use.
> For example, when TensorFlow performs distributed training, the PS (parameter server)
> service does not stop automatically once training completes. The monitoring service
> therefore needs to stop the PS actively.
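
The behaviour requested in the description can be illustrated with a minimal sketch. It is not Submarine's implementation: the Component type, the State values, the role names "worker" and "ps", and the stop callback are all hypothetical and stand in for whatever the real service API exposes. The sketch only shows the decision rule: once no worker is still running, every PS that is still running should be stopped.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class State(Enum):
    # Hypothetical component states, for illustration only.
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    STOPPED = "stopped"


@dataclass
class Component:
    name: str    # hypothetical instance name, e.g. "worker-0" or "ps-0"
    role: str    # "worker" or "ps" (assumed role names)
    state: State


def ps_to_stop(components: List[Component]) -> List[str]:
    """Return the names of PS instances that should be stopped now."""
    workers = [c for c in components if c.role == "worker"]
    # Leave the PS alone if there are no workers or any worker is still running.
    if not workers or any(c.state is State.RUNNING for c in workers):
        return []
    return [
        c.name
        for c in components
        if c.role == "ps" and c.state is State.RUNNING
    ]


def monitor_once(components: List[Component],
                 stop: Callable[[str], None]) -> List[str]:
    """Run one monitoring pass, calling stop(name) for each PS to shut down."""
    names = ps_to_stop(components)
    for name in names:
        stop(name)
    return names
```

A resident monitor would call monitor_once periodically for each job, passing a stop callback that asks the cluster to shut down the named PS instance. Treating a failed worker the same as a finished one is an assumption of this sketch, not something the issue specifies.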



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

