cloudstack-dev mailing list archives

From resmo <...@git.apache.org>
Subject [GitHub] cloudstack pull request: CLOUDSTACK-8848: extra state to handle; n...
Date Thu, 24 Sep 2015 10:18:25 GMT
Github user resmo commented on the pull request:

    https://github.com/apache/cloudstack/pull/829#issuecomment-142882541
  
    @anshul1886 @koushik-das 
    @DaanHoogland and I had a debug session last Friday, and since he is off for the next
couple of days, I can give you more details about what we analysed. 
    
    The powerReportMissing is not the problem, it is only the trigger. The grace period
is the problem. Its calculation relies (see https://github.com/apache/cloudstack/blob/4.5.2/engine/orchestration/src/com/cloud/vm/VirtualMachinePowerStateSyncImpl.java#L114)
on the field `update_time` in table `vm_instance`. But when I look at the value, it does not
seem to get updated, so the grace period has most likely always passed already. 
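
    To illustrate why a stale `update_time` would defeat the grace period, here is a minimal
sketch of such a check. It is not the CloudStack code at the link above: the class name,
the method `hasGracePeriodPassed`, and the 2 minute grace period are all assumptions made
for illustration.

```java
import java.util.concurrent.TimeUnit;

final class GracePeriodSketch {
    // Hypothetical grace period; the real value is derived from CloudStack configuration.
    static final long GRACE_PERIOD_MS = TimeUnit.MINUTES.toMillis(2);

    // True once more than gracePeriodMs has elapsed since the last recorded update.
    static boolean hasGracePeriodPassed(long nowMs, long lastUpdateMs, long gracePeriodMs) {
        return nowMs - lastUpdateMs > gracePeriodMs;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // A timestamp that was never refreshed (assumed 30 days old here).
        long staleUpdate = now - TimeUnit.DAYS.toMillis(30);
        // A timestamp refreshed 5 seconds ago.
        long freshUpdate = now - TimeUnit.SECONDS.toMillis(5);

        System.out.println("stale: " + hasGracePeriodPassed(now, staleUpdate, GRACE_PERIOD_MS)); // true
        System.out.println("fresh: " + hasGracePeriodPassed(now, freshUpdate, GRACE_PERIOD_MS)); // false
    }
}
```

    With a timestamp that is never refreshed, a check of this shape reports the grace period
as passed on every missing-VM report, which would match the behaviour described above.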
    
    As a workaround, I ran an SQL update every 5 seconds that refreshed `update_time` for
my router r-342 while I was migrating it between the ESX cluster nodes:
    ~~~
     mysql -e 'update cloud.vm_instance set update_time=NOW() where id=342;'
    ~~~
    And the router didn't get rebooted:
    ~~~
    2015-09-24 11:47:07,685 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-218:ctx-5849bd19)
VM state report. host: 25, vm id: 342, power state: PowerOn
    2015-09-24 11:47:07,696 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-218:ctx-5849bd19)
VM state report is updated. host: 25, vm id: 342, power state: PowerOn
    2015-09-24 11:48:06,462 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-55:ctx-84cd4323)
VM state report. host: 19, vm id: 342, power state: PowerOn
    2015-09-24 11:48:06,471 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-55:ctx-84cd4323)
VM state report is updated. host: 19, vm id: 342, power state: PowerOn
    2015-09-24 11:48:06,493 WARN  [o.a.c.alerts] (DirectAgentCronJob-55:ctx-84cd4323)  alertType::
9 // dataCenterId:: 1 // podId:: 1 // clusterId:: null // message:: Router has been migrated
out of band: r-342-VM
    2015-09-24 11:49:06,539 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-29:ctx-2a57d676)
Detected missing VM. host: 19, vm id: 342, power state: PowerReportMissing, last state update:
1443095344000
    2015-09-24 11:49:06,539 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-29:ctx-2a57d676)
vm id: 342 - time since last state update(-7197461ms) has not passed graceful period yet
    2015-09-24 11:49:07,719 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-444:ctx-fdd4c055)
VM state report. host: 20, vm id: 342, power state: PowerOn
    ~~~
    
    This means the patch does not fix the root cause. To me, the root cause is that `update_time`
is not updated, or that the gracePeriod calculation is wrong.
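
    As a rough check of the numbers in the log above, the two values it reports can be
converted back to timestamps. The sketch below only does that arithmetic; the class and
method names are made up for illustration, and the interpretation that follows is an
assumption, not something the log proves.

```java
import java.time.Instant;

final class LogArithmetic {
    // "last state update: 1443095344000" from the 11:49:06,539 log line.
    static final long LAST_STATE_UPDATE_MS = 1443095344000L;
    // "time since last state update(-7197461ms)" from the same log entry.
    static final long REPORTED_ELAPSED_MS = -7197461L;

    // The "current time" the check must have used: last update + elapsed.
    static long impliedNowMillis() {
        return LAST_STATE_UPDATE_MS + REPORTED_ELAPSED_MS;
    }

    public static void main(String[] args) {
        System.out.println(Instant.ofEpochMilli(LAST_STATE_UPDATE_MS)); // 2015-09-24T11:49:04Z
        System.out.println(Instant.ofEpochMilli(impliedNowMillis()));   // 2015-09-24T09:49:06.539Z
    }
}
```

    Read as UTC, the stored `update_time` is 11:49:04, which matches the wall-clock time of
the log line (11:49:06), while the implied current time is 09:49:06.539 UTC, about two hours
earlier. One possible reading, stated here as an assumption, is that `NOW()` in the
workaround wrote local time while the comparison uses GMT, which would make the elapsed
time negative regardless of the 5 second refresh interval.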
    
    Any thoughts?


