cloudstack-users mailing list archives

From: Luciano Castro <luciano.cas...@gmail.com>
Subject: Re: HA feature - KVM - CloudStack 4.5.1
Date: Mon, 20 Jul 2015 14:44:05 GMT
Hi!

My test today: I stopped another instance, changed it to an HA-enabled
offering, and started it again.

Afterwards, I gracefully shut down the KVM host it was running on,

and checked for the investigator processes in the log:

[root@1q2 ~]# grep -i Investigator
/var/log/cloudstack/management/management-server.log


[root@1q2 ~]# date
Mon Jul 20 14:39:43 UTC 2015

[root@1q2 ~]# ls -ltrh /var/log/cloudstack/management/management-server.log
-rw-rw-r--. 1 cloud cloud 14M Jul 20 14:39
/var/log/cloudstack/management/management-server.log



Nothing. I don't know how these processes work internally, but it seems they
are not working properly, agree?

option                        value
ha.investigators.exclude      (empty)
ha.investigators.order        SimpleInvestigator,XenServerInvestigator,KVMInvestigator,HypervInvestigator,VMwareInvestigator,PingInvestigator,ManagementIPSysVMInvestigator
investigate.retry.interval    60
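
For reference, the same values should be readable with cloudmonkey (assuming it
is pointed at this management server):

list configurations name=ha.investigators.order
list configurations name=investigate.retry.interval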

Is there a way to check whether these processes are running?

[root@1q2 ~]# ps waux| grep -i java
root     11408  0.0  0.0 103252   880 pts/0    S+   14:44   0:00 grep -i
java
cloud    24225  0.7  1.7 16982036 876412 ?     Sl   Jul16  43:48
/usr/lib/jvm/jre-1.7.0/bin/java -Djava.awt.headless=true
-Dcom.sun.management.jmxremote=false -Xmx2g -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/log/cloudstack/management/ -XX:PermSize=512M
-XX:MaxPermSize=800m
-Djava.security.properties=/etc/cloudstack/management/java.security.ciphers
-classpath
:::/etc/cloudstack/management:/usr/share/cloudstack-management/setup:/usr/share/cloudstack-management/bin/bootstrap.jar:/usr/share/cloudstack-management/bin/tomcat-juli.jar:/usr/share/java/commons-daemon.jar
-Dcatalina.base=/usr/share/cloudstack-management
-Dcatalina.home=/usr/share/cloudstack-management -Djava.endorsed.dirs=
-Djava.io.tmpdir=/usr/share/cloudstack-management/temp
-Djava.util.logging.config.file=/usr/share/cloudstack-management/conf/logging.properties
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
org.apache.catalina.startup.Bootstrap start
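
The HA workers show up in the logs as threads named HA-Worker-<n>, so I suppose
the thread names can be checked inside that JVM. A rough check, assuming a
matching JDK with jstack is installed (24225 is the management server PID from
the ps output above):

[root@1q2 ~]# jstack 24225 | grep -o 'HA-Worker-[0-9]*' | sort -u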



Thanks



On Sat, Jul 18, 2015 at 1:53 PM, Milamber <milamber@apache.org> wrote:

>
>
> On 17/07/2015 22:26, Somesh Naidu wrote:
>
>>> Perhaps the management server doesn't recognize that host 3 is totally down
>>> (ping still alive? or some quorum not satisfied)?
>>> Is the only way for the mgmt server to fully accept that host 3 has a
>>> real problem that host 3 has been rebooted (around 12:44)?
>>>
>> The host disconnect was triggered at 12:19 on host 3. The mgmt server was
>> pretty sure the host was down (it was a graceful shutdown, I believe), which
>> is why it triggered a disconnect and notified the other nodes. No
>> checkhealth/checkonhost/etc. was triggered; the agent just disconnected and
>> all listeners (ping/etc.) were notified.
>>
>> At this point the mgmt server should have scheduled HA for all VMs running
>> on that host. The HA investigators would then work their way through,
>> identifying whether the VMs are still running, whether they need to be
>> fenced, etc. But this never happened.
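>>
>> Roughly how I scanned for HA scheduling in the log, in case it helps (just a
>> sketch; the exact message text may differ):
>>
>> grep 'HighAvailabilityManagerImpl' /var/log/cloudstack/management/management-server.log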
>>
>
>
> AFAIK, stopping the cloudstack-agent service doesn't trigger the HA process
> for the VMs hosted on the node, so it seems normal to me that the HA process
> doesn't start at this point.
> If I wanted to start the HA process on a node, I would go to the web UI (or
> cloudmonkey) and change the state of the host from Up to Maintenance.
>
>
> (after that I can stop the CS agent service if I need to, for example to
> reboot the node; see the cloudmonkey sketch below)
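>
> With cloudmonkey that would be roughly the following (the host UUID is a
> placeholder; look it up with "list hosts" first):
>
> prepare hostformaintenance id=<host-uuid>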
>
>
>
>> Regards,
>> Somesh
>>
>>
>> -----Original Message-----
>> From: Milamber [mailto:milamber@apache.org]
>> Sent: Friday, July 17, 2015 6:01 PM
>> To: users@cloudstack.apache.org
>> Subject: Re: HA feature - KVM - CloudStack 4.5.1
>>
>>
>>
>> On 17/07/2015 21:23, Somesh Naidu wrote:
>>
>>> Ok, so here are my findings.
>>>
>>> 1. Host ID 3 was shut down around 2015-07-16 12:19:09, at which point the
>>> management server called a disconnect.
>>> 2. Based on the logs, it seems VM IDs 32, 18, 39 and 46 were running on
>>> the host.
>>> 3. No HA tasks were triggered for any of these VMs at this time.
>>> 4. The management server restarted at around 2015-07-16 12:30:20.
>>> 5. Host ID 3 connected back at around 2015-07-16 12:44:08.
>>> 6. The management server identified the missing VMs and triggered HA on
>>> those.
>>> 7. The VMs were eventually started, all 4 of them.
>>>
>>> I am not 100% sure why HA wasn't triggered until 2015-07-16 12:30 (#3),
>>> but I know that the management server restart caused it not to happen
>>> until the host was reconnected.
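>>>
>>> (To line these events up, I just scanned that window of the log, roughly:
>>> grep '2015-07-16 12:' management-server.log | less )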
>>>
>> Perhaps the management server doesn't recognize that host 3 is totally down
>> (ping still alive? or some quorum not satisfied)?
>> Is the only way for the mgmt server to fully accept that host 3 has a
>> real problem that host 3 has been rebooted (around 12:44)?
>>
>> What is the storage subsystem? CLVMd?
>>
>>
>>> Regards,
>>> Somesh
>>>
>>>
>>> -----Original Message-----
>>> From: Luciano Castro [mailto:luciano.castro@gmail.com]
>>> Sent: Friday, July 17, 2015 12:13 PM
>>> To: users@cloudstack.apache.org
>>> Subject: Re: HA feature - KVM - CloudStack 4.5.1
>>>
>>> No problems Somesh, thanks for your help.
>>>
>>> Link of log:
>>>
>>>
>>> https://dl.dropboxusercontent.com/u/6774061/management-server.log.2015-07-16.gz
>>>
>>> Luciano
>>>
>>> On Fri, Jul 17, 2015 at 12:00 PM, Somesh Naidu <Somesh.Naidu@citrix.com>
>>> wrote:
>>>
>>>> How large are the management server logs dated 2015-07-16? I would like to
>>>> review them. All the information I need from that incident should be in
>>>> there, so I don't need any more testing.
>>>>
>>>> Regards,
>>>> Somesh
>>>>
>>>> -----Original Message-----
>>>> From: Luciano Castro [mailto:luciano.castro@gmail.com]
>>>> Sent: Friday, July 17, 2015 7:58 AM
>>>> To: users@cloudstack.apache.org
>>>> Subject: Re: HA feature - KVM - CloudStack 4.5.1
>>>>
>>>> Hi Somesh!
>>>>
>>>> [root@1q2 ~]# zgrep -i -E
>>>> 'SimpleIvestigator|KVMInvestigator|PingInvestigator|ManagementIPSysVMInvestigator'
>>>> /var/log/cloudstack/management/management-server.log.2015-07-16.gz | tail
>>>> -5000 > /tmp/management.txt
>>>> [root@1q2 ~]# cat /tmp/management.txt
>>>> 2015-07-16 12:30:45,452 DEBUG [o.a.c.s.l.r.ExtensionRegistry] (main:null) Registering extension [KVMInvestigator] in [Ha Investigators Registry]
>>>> 2015-07-16 12:30:45,452 DEBUG [o.a.c.s.l.r.RegistryLifecycle] (main:null) Registered com.cloud.ha.KVMInvestigator@57ceec9a
>>>> 2015-07-16 12:30:45,927 DEBUG [o.a.c.s.l.r.ExtensionRegistry] (main:null) Registering extension [PingInvestigator] in [Ha Investigators Registry]
>>>> 2015-07-16 12:30:45,928 DEBUG [o.a.c.s.l.r.ExtensionRegistry] (main:null) Registering extension [ManagementIPSysVMInvestigator] in [Ha Investigators Registry]
>>>> 2015-07-16 12:30:53,796 INFO  [o.a.c.s.l.r.DumpRegistry] (main:null) Registry [Ha Investigators Registry] contains [SimpleInvestigator, XenServerInvestigator, KVMInv
>>>>
>>>> I searched this log before, but as I thought, there was nothing special
>>>> in it.
>>>>
>>>> If you want to propose another test scenario, I can do it.
>>>>
>>>> Thanks
>>>>
>>>>
>>>> On Thu, Jul 16, 2015 at 7:27 PM, Somesh Naidu <Somesh.Naidu@citrix.com>
>>>> wrote:
>>>>
>>>>> What about the other investigators, specifically "KVMInvestigator,
>>>>> PingInvestigator"? Do they report the VMs as alive=false too?
>>>>>
>>>>> Also, it is recommended that you look at management-server.log instead
>>>>> of catalina.out (for one, the latter doesn't have timestamps).
>>>>>
>>>>> Regards,
>>>>> Somesh
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Luciano Castro [mailto:luciano.castro@gmail.com]
>>>>> Sent: Thursday, July 16, 2015 1:14 PM
>>>>> To: users@cloudstack.apache.org
>>>>> Subject: Re: HA feature - KVM - CloudStack 4.5.1
>>>>>
>>>>> Hi Somesh!
>>>>>
>>>>>
>>>>> Thanks for the help. I did it again and collected new logs:
>>>>>
>>>>> My vm_instance name is i-2-39-VM. There were some routers on KVM host 'A'
>>>>> (the one that I powered off now):
>>>>>
>>>>>
>>>>> [root@1q2 ~]# grep -i -E 'SimpleInvestigator.*false'
>>>>> /var/log/cloudstack/management/catalina.out
>>>>> INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-2:ctx-e2f91c9c work-3) SimpleInvestigator found VM[DomainRouter|r-4-VM]to be alive? false
>>>>> INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-729acf4f work-7) SimpleInvestigator found VM[User|i-23-33-VM]to be alive? false
>>>>> INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-4:ctx-a66a4941 work-8) SimpleInvestigator found VM[DomainRouter|r-36-VM]to be alive? false
>>>>> INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-5977245e work-10) SimpleInvestigator found VM[User|i-17-26-VM]to be alive? false
>>>>> INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-c7f39be0 work-9) SimpleInvestigator found VM[DomainRouter|r-32-VM]to be alive? false
>>>>> INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-3:ctx-ad4f5fda work-10) SimpleInvestigator found VM[DomainRouter|r-46-VM]to be alive? false
>>>>> INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-0:ctx-0257f5af work-11) SimpleInvestigator found VM[User|i-4-52-VM]to be alive? false
>>>>> INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-4:ctx-7ddff382 work-12) SimpleInvestigator found VM[DomainRouter|r-32-VM]to be alive? false
>>>>> INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-9f79917e work-13) SimpleInvestigator found VM[User|i-2-39-VM]to be alive? false
>>>>>
>>>>>
>>>>>
>>>>> KVM host 'B' agent log (the host the machine would be migrated to):
>>>>>
>>>>> 2015-07-16 16:58:56,537 INFO  [kvm.resource.LibvirtComputingResource]
>>>>> (agentRequest-Handler-4:null) Live migration of instance i-2-39-VM
>>>>> initiated
>>>>> 2015-07-16 16:58:57,540 INFO  [kvm.resource.LibvirtComputingResource]
>>>>> (agentRequest-Handler-4:null) Waiting for migration of i-2-39-VM to
>>>>> complete, waited 1000ms
>>>>> 2015-07-16 16:58:58,541 INFO  [kvm.resource.LibvirtComputingResource]
>>>>> (agentRequest-Handler-4:null) Waiting for migration of i-2-39-VM to
>>>>> complete, waited 2000ms
>>>>> 2015-07-16 16:58:59,542 INFO  [kvm.resource.LibvirtComputingResource]
>>>>> (agentRequest-Handler-4:null) Waiting for migration of i-2-39-VM to
>>>>> complete, waited 3000ms
>>>>> 2015-07-16 16:59:00,543 INFO  [kvm.resource.LibvirtComputingResource]
>>>>> (agentRequest-Handler-4:null) Waiting for migration of i-2-39-VM to
>>>>> complete, waited 4000ms
>>>>> 2015-07-16 16:59:01,245 INFO  [kvm.resource.LibvirtComputingResource]
>>>>> (agentRequest-Handler-4:null) Migration thread for i-2-39-VM is done
>>>>>
>>>>> It said the migration of my i-2-39-VM instance was done, but I can't
>>>>> ping the VM.
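>>>>>
>>>>> (To double-check on host 'B', I suppose something like this would show
>>>>> whether the domain is actually running there; assuming shell access to
>>>>> the host:
>>>>> virsh list --all | grep -i i-2-39-VM )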
>>>>>
>>>>> Luciano
>>>>>
>>>>>
>>>> --
>>>> Luciano Castro
>>>>
>>>>
>>>
>


-- 
Luciano Castro
