ambari-dev mailing list archives

From "Dmitry Lysnichenko (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AMBARI-4324) Server should rely on command reports when considering tasks timed out
Date Wed, 19 Feb 2014 21:02:19 GMT

     [ https://issues.apache.org/jira/browse/AMBARI-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Lysnichenko updated AMBARI-4324:
---------------------------------------

    Description: 
h1. AMBARI-4323

-As of now, the task timeout at the server and the timeout at the agent are two separate
mechanisms that work independently and duplicate each other.

Such behaviour leads to a strange scenario:
- cluster installation is started
- execution of some command exceeds the timeout
- the server considers this command and *all subsequent* commands in the request timed out.
This state is shown in the UI as well.
- at the same time, the agent considers the currently executing command timed out and kills it.
After that, the agent starts executing the next command in the queue. If the next commands do
not fail, the agent sends COMPLETE status reports.
- the server receives the COMPLETE status reports and updates the component status.
- if the user clicks "Retry installation", tasks are created only for components that are not installed.
- as a result, the UI shows fewer tasks than the user expects

Changes in scope of this jira:
add a TIMEDOUT command status report type at the agent. On the server side, the HostRoleStatus
enum already has this status type. Modify server behaviour: the server considers a task timed out
when it receives the corresponding command report from the agent. This consolidates all task
time-tracking logic at the agent and will simplify timeout handling for CustomCommands
and CustomActions.

Some issues may occur when the agent host goes down and therefore does not send any command
reports. The server should handle this case.-
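
The proposed flow can be sketched roughly as follows. This is a minimal illustration, not Ambari code: the function and class names, the report dictionary layout, the FAILED and IN_PROGRESS status names, and the heartbeat-based fallback for an agent host that goes down are all assumptions made for the example. Only the TIMEDOUT and COMPLETE statuses come from the description above.

```python
import subprocess

# TIMEDOUT and COMPLETE are named in the issue; the other two are assumptions.
IN_PROGRESS, COMPLETE, FAILED, TIMEDOUT = "IN_PROGRESS", "COMPLETE", "FAILED", "TIMEDOUT"


def run_command_on_agent(task_id, argv, timeout_seconds):
    """Agent side (hypothetical helper): run one command and build its report.

    The agent is the only place that measures how long the command runs.
    """
    try:
        result = subprocess.run(argv, timeout=timeout_seconds)
        status = COMPLETE if result.returncode == 0 else FAILED
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired.
        status = TIMEDOUT
    return {"taskId": task_id, "status": status}


class Server:
    """Server side (hypothetical): task status changes only via command reports."""

    def __init__(self, heartbeat_lost_after):
        self.task_status = {}
        self.task_host = {}
        self.last_heartbeat = {}
        self.heartbeat_lost_after = heartbeat_lost_after

    def schedule(self, task_id, host, now):
        self.task_status[task_id] = IN_PROGRESS
        self.task_host[task_id] = host
        self.last_heartbeat.setdefault(host, now)

    def on_heartbeat(self, host, reports, now):
        # No server-side task timer: the status is copied from the agent's report.
        self.last_heartbeat[host] = now
        for report in reports:
            self.task_status[report["taskId"]] = report["status"]

    def expire_tasks_of_lost_hosts(self, now):
        """One possible handling (an assumption) for an agent host that went down:
        tasks on a host that has been silent for too long are marked TIMEDOUT."""
        for task_id, host in self.task_host.items():
            silent_for = now - self.last_heartbeat[host]
            still_running = self.task_status[task_id] == IN_PROGRESS
            if still_running and silent_for > self.heartbeat_lost_after:
                self.task_status[task_id] = TIMEDOUT
```

In this sketch the server never marks later commands in a request as timed out on its own, so the agent's COMPLETE reports for those commands cannot contradict the state already shown to the user.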



  was:
h1. AMBARI-4323

As of now, the task timeout at the server and the timeout at the agent are two separate
mechanisms that work independently and duplicate each other.

Such behaviour leads to a strange scenario:
- cluster installation is started
- execution of some command exceeds the timeout
- the server considers this command and *all subsequent* commands in the request timed out.
This state is shown in the UI as well.
- at the same time, the agent considers the currently executing command timed out and kills it.
After that, the agent starts executing the next command in the queue. If the next commands do
not fail, the agent sends COMPLETE status reports.
- the server receives the COMPLETE status reports and updates the component status.
- if the user clicks "Retry installation", tasks are created only for components that are not installed.
- as a result, the UI shows fewer tasks than the user expects

Changes in scope of this jira:
add a TIMEDOUT command status report type at the agent. On the server side, the HostRoleStatus
enum already has this status type. Modify server behaviour: the server considers a task timed out
when it receives the corresponding command report from the agent. This consolidates all task
time-tracking logic at the agent and will simplify timeout handling for CustomCommands
and CustomActions.

Some issues may occur when the agent host goes down and therefore does not send any command
reports. The server should handle this case.




> Server should rely on command reports when considering tasks timed out
> ----------------------------------------------------------------------
>
>                 Key: AMBARI-4324
>                 URL: https://issues.apache.org/jira/browse/AMBARI-4324
>             Project: Ambari
>          Issue Type: Improvement
>          Components: agent, controller
>    Affects Versions: 1.5.0
>            Reporter: Dmitry Lysnichenko
>            Assignee: Dmitry Lysnichenko
>             Fix For: 1.5.0



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
