cloudstack-issues mailing list archives

From "ASF subversion and git services (JIRA)" <>
Subject [jira] [Commented] (CLOUDSTACK-7415) Host remains in Alert after vCenter restart
Date Mon, 25 Aug 2014 08:11:58 GMT


ASF subversion and git services commented on CLOUDSTACK-7415:

Commit 8ce6eba549bcd3fa007aaf10a29c3a2fef9ffaaa in cloudstack's branch refs/heads/master from
[;h=8ce6eba ]

CLOUDSTACK-7415. Host remains in Alert after vCenter restart.
Management server PingTask should update PingMap entry for an agent only if it is already
present in the Management Server's PingMap.
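The guarded update described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the class `PingMapSketch`, its method names, and the choice of a `ConcurrentMap` keyed by agent id are all assumptions made for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the management server's ping bookkeeping.
// Names and types are illustrative, not CloudStack's actual code.
class PingMapSketch {
    // agent id -> time of the last successful ping, in seconds
    private final ConcurrentMap<Long, Long> pingMap = new ConcurrentHashMap<>();

    // Called when an agent connects: the only place an entry is created.
    void onConnect(long agentId, long nowSeconds) {
        pingMap.put(agentId, nowSeconds);
    }

    // Called when a disconnect is processed: the entry is cleared.
    void onDisconnect(long agentId) {
        pingMap.remove(agentId);
    }

    // Called after a successful ping. replace() is atomic and only writes
    // when the key is already mapped, so a late ping cannot re-create an
    // entry that a disconnect has already removed.
    boolean onPing(long agentId, long nowSeconds) {
        return pingMap.replace(agentId, nowSeconds) != null;
    }

    boolean isTracked(long agentId) {
        return pingMap.containsKey(agentId);
    }
}
```

With an unconditional `put` in `onPing`, a ping that lands after the disconnect would leave a stale entry behind; `replace` makes that ping a no-op instead.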

> Host remains in Alert after vCenter restart
> -------------------------------------------
>                 Key: CLOUDSTACK-7415
>                 URL:
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: Management Server
>    Affects Versions: 4.0.0
>            Reporter: Likitha Shetty
>            Assignee: Likitha Shetty
>            Priority: Critical
>             Fix For: 4.5.0
> In a clustered management server environment, after a vCenter restart some hosts repeatedly
> go back into the Alert state even after vCenter comes back up.
> The root cause is the following race condition:
> A scheduled PingTask runs for every host, at an interval set by the global config
> ping.interval. When vCenter is restarted, PingTask cannot get the host status, so it
> schedules another task to handle the disconnect for the host agent.
> This disconnect task determines the host status by sending CheckHealthCommand to the agent.
> When the command returns an answer saying the resource is not alive, CS investigates further,
> and in this case the VMware investigator confirms that the host is disconnected. The
> disconnect is then processed, which involves the following:
> 1. Cancel all scheduled tasks for that agent, including PingTask.
> 2. Send disconnect to all listeners, including AgentMonitor, which clears the agent from
> the MS's PingMap.
> If the disconnect takes a while to get scheduled and spills over into the next PingTask
> interval, the next PingTask runs. If vCenter is up and the host is connected by then,
> the ping succeeds and an entry for the agent is made in the PingMap.
> Once an entry is made in the PingMap after a disconnect, the AgentMonitor task, which runs
> every minute, finds the agent behind on ping, disconnects the host agent without
> investigation because the attache is no longer connected, and puts the host back into the
> Alert state.
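To see why a stale PingMap entry keeps sending the host back to Alert, here is a rough sketch of a "behind on ping" scan like the one the description attributes to AgentMonitor. The class `BehindOnPingSketch`, the method `findBehindOnPing`, and the `timeoutSeconds` parameter are hypothetical. The description says only that the check runs every minute, not how the threshold is computed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a periodic "behind on ping" scan.
class BehindOnPingSketch {
    // Returns the ids of agents whose last recorded ping is older than
    // timeoutSeconds. pingMap maps agent id -> last ping time in seconds.
    static List<Long> findBehindOnPing(Map<Long, Long> pingMap,
                                       long nowSeconds,
                                       long timeoutSeconds) {
        List<Long> behind = new ArrayList<>();
        for (Map.Entry<Long, Long> entry : pingMap.entrySet()) {
            if (nowSeconds - entry.getValue() > timeoutSeconds) {
                behind.add(entry.getKey());
            }
        }
        return behind;
    }
}
```

Because the disconnect cancelled the agent's PingTask, an entry written by one late ping is never refreshed. It ages past any timeout, so a scan of this kind will flag the agent.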

This message was sent by Atlassian JIRA
