cloudstack-issues mailing list archives

From "prashant kumar mishra (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CLOUDSTACK-5055) host went in Error in maintenance mode ;unable to migrate vms
Date Wed, 06 Nov 2013 09:59:20 GMT

     [ https://issues.apache.org/jira/browse/CLOUDSTACK-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

prashant kumar mishra updated CLOUDSTACK-5055:
----------------------------------------------

    Description: 
Steps to reproduce
-------------------------
1- Prepare a CS setup with one KVM (RHEL 6.2) host, say host1
2- Set execute.in.sequence.hypervisor.commands and execute.in.sequence.network.element.commands to false
3- Deploy 32 VMs
4- Add one more host, say host2, to the cluster
5- Try to put host1 in maintenance mode

Expected
---------------
Host1 should go into maintenance mode

Actual
---------
Host1 got stuck in the "Error In Maintenance" state, and only a few VMs were migrated to host2

My observations
---------------------
1- I tried the same with 3 VMs (user VMs and system VMs); enabling maintenance worked properly.
2- I saw this issue only when a host has a large number (32+) of VMs.

Logs
--------
2013-11-06 09:53:27,424 DEBUG [agent.manager.AgentAttache] (AgentManager-Handler-8:null) Seq 4-2144927817: Unable to find listener.
2013-11-06 09:53:27,426 DEBUG [vm.dao.VMInstanceDaoImpl] (HA-Worker-4:work-34) Unable to update VM[User|f66d29c2-2cd2-4715-ae31-5e43cea707bf]: DB Data={Host=1; State=Running; updated=7; time=Wed Nov 06 09:53:27 EST 2013} New Data: {Host=1; State=Stopping; updated=6; time=Wed Nov 06 09:53:27 EST 2013} Stale Data: {Host=1; State=Running; updated=5; time=Wed Nov 06 09:53:25 EST 2013}
2013-11-06 09:53:27,435 DEBUG [cloud.vm.VirtualMachineManagerImpl] (HA-Worker-4:work-34) Unable to stop VM due to VM is being operated on.
2013-11-06 09:53:27,435 WARN  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-4:work-34) Unable to migrate vm from 1
2013-11-06 09:53:27,432 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (HA-Worker-3:work-38) DeploymentPlanner allocation algorithm: com.cloud.deploy.FirstFitPlanner_EnhancerByCloudStack_e995abc3@d603051
2013-11-06 09:53:27,435 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (HA-Worker-3:work-38) Trying to allocate a host and storage pools from dc:1, pod:1,cluster:1, requested cpu: 200, requested ram: 134217728
2013-11-06 09:53:27,435 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (HA-Worker-3:work-38) Is ROOT volume READY (pool already allocated)?: No
2013-11-06 09:53:27,435 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (HA-Worker-3:work-38) This VM has last host_id specified, trying to choose the same host: 1
2013-11-06 09:53:27,437 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (HA-Worker-3:work-38) The last host of this VM is in avoid set
2013-11-06 09:53:27,437 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (HA-Worker-3:work-38) Cannot choose the last host to deploy this VM
2013-11-06 09:53:27,437 DEBUG [cloud.deploy.FirstFitPlanner] (HA-Worker-3:work-38) Searching resources only under specified Cluster: 1
2013-11-06 09:53:27,440 DEBUG [cloud.resource.ResourceManagerImpl] (HA-Worker-4:work-34) No next resource state for host 1 while current state is ErrorInMaintenance with event UnableToMigrate
com.cloud.utils.fsm.NoTransitionException: No next resource state found for current state =ErrorInMaintenance event =UnableToMigrate
        at com.cloud.resource.ResourceManagerImpl.resourceStateTransitTo(ResourceManagerImpl.java:1178)
        at com.cloud.resource.ResourceManagerImpl.maintenanceFailed(ResourceManagerImpl.java:2313)
        at com.cloud.ha.HighAvailabilityManagerImpl.migrate(HighAvailabilityManagerImpl.java:602)
        at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:858)
2013-11-06 09:53:27,451 DEBUG [agent.transport.Request] (AgentManager-Handler-10:null) Seq 1-1113784382: Processing:  { Ans: , MgmtId: 6959054979131, via: 1, Ver: v1, Flags: 110, [{"com.cloud.agent.api.MigrateAnswer":{"result":false,"details":"Cannot recv data: Connection reset by peer","wait":0}}] }
2013-11-06 09:53:27,451 DEBUG [agent.manager.AgentAttache] (AgentManager-Handler-10:null) Seq 1-1113784382: No more commands found
2013-11-06 09:53:27,451 DEBUG [agent.transport.Request] (HA-Worker-0:work-35) Seq 1-1113784382: Received:  { Ans: , MgmtId: 6959054979131, via: 1, Ver: v1, Flags: 110, { MigrateAnswer } }
2013-11-06 09:53:27,451 ERROR [cloud.vm.VirtualMachineManagerImpl] (HA-Worker-0:work-35) Unable to migrate due to Cannot recv data: Connection reset by peer
2013-11-06 09:53:27,452 INFO  [cloud.vm.VirtualMachineManagerImpl] (HA-Worker-0:work-35) Migration was unsuccessful.  Cleaning up: VM[User|b363903f-992c-412a-ab8d-a9bb15e23a51]
2013-11-06 09:53:27,449 DEBUG [agent.transport.Request] (AgentManager-Handler-9:null) Seq 4-2144927816: Processing:  { Ans: , MgmtId: 6959054979131, via: 4, Ver: v1, Flags: 110, [{"com.cloud.agent.api.PrepareForMigrationAnswer":{"result":true,"wait":0}}] }
2013-11-06 09:53:27,452 DEBUG [agent.manager.AgentAttache] (AgentManager-Handler-9:null) Seq 4-2144927816: No more commands found
2013-11-06 09:53:27,452 DEBUG [agent.transport.Request] (HA-Worker-2:work-36) Seq 4-2144927816: Received:  { Ans: , MgmtId: 6959054979131, via: 4, Ver: v1, Flags: 110, { PrepareForMigrationAnswer } }
2013-11-06 09:53:27,458 INFO  [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-4:work-34) Completed HAWork[34-Migration-27-Running-Migrating]
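
Note on the NoTransitionException above: the trace shows an UnableToMigrate event arriving while host 1 is already in ErrorInMaintenance. One possible reading is that the first failed migration moved the host into ErrorInMaintenance and a later failure from another HA worker then found no transition defined. The sketch below only illustrates that pattern with a toy transition table. It is not CloudStack code: the class name ResourceStateSketch, the helper methods, and every state or event name other than ErrorInMaintenance and UnableToMigrate are assumptions made for illustration.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;

// Toy state machine, NOT the real com.cloud.resource implementation.
class ResourceStateSketch {
    // Only ErrorInMaintenance appears in the log; the other states are assumed.
    enum State { Enabled, PrepareForMaintenance, ErrorInMaintenance, Maintenance }
    // Only UnableToMigrate appears in the log; the other events are assumed.
    enum Event { AdminAskMaintenance, UnableToMigrate, InternalEnterMaintenance }

    private static final Map<State, Map<Event, State>> TRANSITIONS =
            new EnumMap<State, Map<Event, State>>(State.class);

    // Register one allowed transition in the table.
    static void add(State from, Event event, State to) {
        TRANSITIONS.computeIfAbsent(from, k -> new EnumMap<Event, State>(Event.class))
                   .put(event, to);
    }

    static {
        add(State.Enabled, Event.AdminAskMaintenance, State.PrepareForMaintenance);
        add(State.PrepareForMaintenance, Event.UnableToMigrate, State.ErrorInMaintenance);
        add(State.PrepareForMaintenance, Event.InternalEnterMaintenance, State.Maintenance);
        // Deliberately no entry for (ErrorInMaintenance, UnableToMigrate).
    }

    // Returns the next state, or empty when the table has no such transition.
    static Optional<State> next(State current, Event event) {
        Map<Event, State> byEvent = TRANSITIONS.get(current);
        return Optional.ofNullable(byEvent == null ? null : byEvent.get(event));
    }

    public static void main(String[] args) {
        State host = State.PrepareForMaintenance;
        // Two workers report a failed migration one after the other.
        for (int worker = 0; worker < 2; worker++) {
            Optional<State> target = next(host, Event.UnableToMigrate);
            if (target.isPresent()) {
                host = target.get();
                System.out.println("worker " + worker + ": host moved to " + host);
            } else {
                System.out.println("worker " + worker + ": no transition from " + host);
            }
        }
    }
}
```

In this toy model the first failure moves the host to ErrorInMaintenance and the second finds no transition, which resembles the sequence in the log. Whether the real state machine should accept the repeated event or the caller should avoid sending it is a question for the actual fix.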






> host went in Error in maintenance mode ;unable to migrate vms
> -------------------------------------------------------------
>
>                 Key: CLOUDSTACK-5055
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-5055
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: KVM, Management Server
>    Affects Versions: 4.2.0
>            Reporter: prashant kumar mishra
>



--
This message was sent by Atlassian JIRA
(v6.1#6144)
