cloudstack-users mailing list archives

From ilya <ilya.mailing.li...@gmail.com>
Subject Re: CS 4.8 VMware - Virtual Router stuck at starting
Date Fri, 29 Jul 2016 06:43:16 GMT
Darren,

I'm also running 4.5.2 and like the stability we get with it.

4.5.2 has every feature we need, so I don't see a huge benefit in
upgrading to the latest ACS at the moment. Also, our environments are
very large and complex, so an upgrade is not something I can take
lightly.

That said, I do have a small 8-node lab environment I can try the
upgrade on. It consists of 4 ESXi and 4 KVM nodes, so it should be a
fair test.

Let's wait for Jacob to respond with the result of his test of setting
the IP/netmask for eth1 on the router VM. If it does not help, I'll try
the upgrade to see if I can reproduce the issue.

Regards
ilya

On 7/28/16 9:43 PM, Darren Tang wrote:
> Hi ilya:
>  I can confirm that issue, please check:
> https://issues.apache.org/jira/browse/CLOUDSTACK-9144
>  When we deployed CloudStack (4.6/4.7/4.8) with VMware (5.x/6.0) in a basic
> zone, the VR never leaves the "starting" state. Falling back to 4.5 is
> fine.
>  Maybe you can test it yourself.
> 
> 2016-07-29 3:24 GMT+08:00 ilya <ilya.mailing.lists@gmail.com>:
> 
>> I guess it would help to know what type of zone you use?
>>
>> Is it advanced, isolated vpc or shared network? what type of isolation?
>> or perhaps basic zone?
>>
>> Lastly, try stopping the iptables and restarting cloud agent (via stop
>> and start)
>>
>> Please see my response in-line
>>
>> On 7/28/16 6:58 AM, Jacob Seeley wrote:
>>> Hi ilya,
>>>
>>> Funny you brought up debugging the router VM. After I responded
>> yesterday, I did just that and I did find some odd things.
>>> Just to be clear (I think we're on the same page), since I'm not the OP
>> of this thread, the virtual router always gets deployed and it starts up
>> just fine; however, CloudStack reports that it's always stuck in starting.
>> VMs that get deployed ultimately fail. CloudStack reports the router
>> version as UNKNOWN.
>>> Before I provide what I found debugging the router VM, I'll address some
>> of your points.
>>>
>>> ### FOLLOW-UP QUESTIONS ###
>>>
>>> " Another reason would be an issue of hypervisor accessing the NFS mount
>> used for secondary storage."
>>> I don't believe this is an issue. The hypervisor (VMware) does mount the
>> secondary storage via NFS just fine. If this were an issue, I would think
>> the Secondary Storage and Console VMs would not deploy.
>>>
>>> " Use console of vCenter to see what is happening on router vm. You can
>> login locally with root/password and see the content of /var/log/cloud.out
>> file, paste it on pastebin - if it makes no sense to you..."
>>> It looks to me like /var/log/cloud.out is only written to when
>> $CLOUD_DEBUG is set to a non-empty value in the /etc/init.d/cloud script.
>> As such, there isn't even a file at /var/log/cloud.out. Even when I set
>> that variable, I never get anything logged to /var/log/cloud.out. However,
>> there is a /var/log/cloud.log. Here are the contents of that:
>> http://pastebin.com/aaTsRKZE
>>>
>>> " you can also run /etc/init.d/cloud stop and start.. that will give you
>> a fresh start on logs.."
>>> The service is in a failed state. It's worth noting that this service is
>> in a started state on the Console and Secondary Storage VMs.
>>
>> this is concerning - see you did "sh -x", read on..
>>
>>>
>>> " also, confirm that management server can talk to VR on POD IP
>>> (management) on port 3922.."
>>> It appears this is not an issue; see below:
>>
>> 3922 from MS to VR - this is the SSH daemon on VR with private key
>> 8250 from VR to MS - cloudstack java agent on VR talking to MS
>>
>>
>>>
>>> root@r-4-VM:~# telnet 10.70.110.101 8250
>>> Trying 10.70.110.101...
>>> Connected to 10.70.110.101.
>>> Escape character is '^]'.
>>>
>>
>>
>>> ### ROUTER VM DEBUG ###
>>>
>>> Here is what I found when the router VM gets deployed (please tell me if
>> anything seems off):
>>> 2 NICs; only one NIC gets an IP  address. CloudStack NIC1 shows an IP
>> address coming from the defaultGuestNetwork. NIC2 is traffic type Control
>> but has an IP address of 0.0.0.0
>>
>> It is a cause for concern to see 0.0.0.0 assigned to eth1.
>>
>> Let's assume NIC1 is eth0 and NIC2 is eth1.
>>
>> 1) We should not be getting 0.0.0.0 for eth1, aka the control network. This
>> IP should be coming from the pod network range. When you added a pod,
>> I assume you did it as part of the Add Zone wizard...
>>
>> To see the pod IP range, go to the UI:
>> Infrastructure, Zones, Your Zone, Physical Network, Physical Network 1
>> (assuming you did not create anything special), Management, IP Ranges ->
>> you should see a range defined there and it should not be 0.0.0.0...
>>
>>> From the CloudStack management server, I cannot SSH into the router VM
>> on NIC1. I've found this is because of iptables rules on the router VM. If
>> I issue a /etc/init.d/iptables-persistent flush on the router VM, I can SSH
>> into the router VM using the SSH key at port 3922.
>>> The service "cloud" is in a failed state. Looking at the cloud init
>> script, I see the following:
>>>
>>> CMDLINE=$(cat /var/cache/cloud/cmdline)
>>>
>>> TYPE="router"
>>> for i in $CMDLINE
>>>   do
>>>     # search for foo=bar pattern and cut out foo
>>>     FIRSTPATTERN=$(echo $i | cut -d= -f1)
>>>     case $FIRSTPATTERN in
>>>       type)
>>>           TYPE=$(echo $i | cut -d= -f2)
>>>       ;;
>>>     esac
>>> done
>>>
>>> The file /var/cache/cloud/cmdline exists; here are the contents:
>>>
>>> template=domP name=r-4-VM eth0ip=10.70.116.75 eth0mask=255.255.255.0
>> gateway=10.70.116.1 domain=vit.vertitechit.com cidrsize=24
>> dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0 mgmtcidr=
>> 10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr
>> disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21
>> baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
>> baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
>> host=10.70.110.101 port=8080 nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
>>>
>>
>>
>> You can also try replacing eth1ip=0.0.0.0 eth1mask=0.0.0.0 in
>> /var/cache/cloud/cmdline with the proper values. You can look them up
>> under Infrastructure, Routers, r-4, NICs; look for the control NIC.
>>
>> Then try starting the cloud service.
>>
>> Also, did you enable baremetal support? Can you deploy a zone without
>> baremetal support? Perhaps there is a bug in how IPs are assigned to
>> eth1 (control NIC)...
>>
>>
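
A minimal sketch of how the eth1 values in the cmdline could be checked before restarting the service. It only mirrors the key=value parsing done by the init script quoted in this thread. The helper names get_cmdline_value and control_nic_unassigned are made up for illustration, and the sample string is a shortened copy of the cmdline above. On the router VM you would pass the contents of /var/cache/cloud/cmdline instead.

```shell
#!/bin/sh
# Print the value of KEY from a space-separated key=value string.
# get_cmdline_value is a hypothetical helper, not part of the systemvm scripts.
get_cmdline_value() {
    key=$1
    cmdline=$2
    for pair in $cmdline; do
        case $pair in
            "$key"=*)
                # Drop everything up to and including the first "=".
                printf '%s\n' "${pair#*=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Return 0 when eth1ip or eth1mask is absent or still 0.0.0.0.
control_nic_unassigned() {
    ip=$(get_cmdline_value eth1ip "$1") || return 0
    mask=$(get_cmdline_value eth1mask "$1") || return 0
    [ "$ip" = "0.0.0.0" ] || [ "$mask" = "0.0.0.0" ]
}

# Shortened sample taken from the cmdline quoted in this thread.
SAMPLE="template=domP name=r-4-VM eth0ip=10.70.116.75 eth1ip=0.0.0.0 eth1mask=0.0.0.0 type=dhcpsrvr"

if control_nic_unassigned "$SAMPLE"; then
    echo "eth1 (control NIC) has no usable address in cmdline"
fi
echo "type=$(get_cmdline_value type "$SAMPLE")"
```

The same lookup also shows that type resolves to dhcpsrvr for this router, which matches the "sh -x" trace further down.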
>>> The previous code suggests that the value of TYPE starts as router but
>> will get set to dhcpsrvr, as indicated by the contents of
>> /var/cache/cloud/cmdline. Is this normal?
>>> Further down the script, I see:
>>>
>>> CLOUDSTACK_HOME="/usr/local/cloud"   <---- Exists
>>> if [ -f  $CLOUDSTACK_HOME/systemvm/utils.sh ];   <---- Does not exist. Seems odd!
>>> then
>>>   . $CLOUDSTACK_HOME/systemvm/utils.sh
>>> else
>>>   _failure
>>> fi
>>>
>>> # mkdir -p /var/log/vmops
>>>
>>> start() {
>>>    local pid=$(get_pids)
>>>    if [ "$pid" != "" ]; then
>>>        echo "CloudStack cloud sevice is already running, PID = $pid"
>>>        return 0
>>>    fi
>>>
>>>    echo -n "Starting CloudStack cloud service (type=$TYPE) "
>>>    if [ -f $CLOUDSTACK_HOME/systemvm/run.sh ];   <---- Does not exist. Seems odd!
>>>    then
>>>      if [ "$pid" == "" ]
>>>      then
>>>        (cd $CLOUDSTACK_HOME/systemvm; nohup ./run.sh > $LOG_FILE 2>&1 & )
>>>        pid=$(get_pids)
>>>        echo $pid > /var/run/cloud.pid
>>>      fi
>>>      _success
>>>    else
>>>      _failure
>>>    fi
>>>    echo
>>>    echo 'start' > $CLOUDSTACK_HOME/systemvm/user_request
>>> }
>>>
>>> I see that it sets CLOUDSTACK_HOME to /usr/local/cloud. This folder
>> exists; however, the script then looks for the file
>> /usr/local/cloud/systemvm/utils.sh. This file doesn't exist. The script is
>> also supposed to start run.sh, but that doesn't exist either. This
>> seems like a problem to me.
>>> Here you can see a step-through of what happens when I try to start the
>> cloud service:
>>>
>>> sh -x /etc/init.d/cloud start
>>> + ENABLED=0
>>> + [ -e /etc/default/cloud ]
>>> + . /etc/default/cloud
>>> + ENABLED=0
>>> + cat /var/cache/cloud/cmdline
>>> + CMDLINE= template=domP name=r-4-VM eth0ip=10.70.116.75
>> eth0mask=255.255.255.0 gateway=10.70.116.1 domain=vit.vertitechit.com
>> cidrsize=24 dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0 mgmtcidr=
>> 10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr
>> disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21
>> baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
>> baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
>> host=10.70.110.101 port=8080 nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
>>> + [ ! -z ]
>>> + LOG_FILE=/dev/null
>>> + TYPE=router
>>> + cut -d= -f1
>>> + echo template=domP
>>> + FIRSTPATTERN=template
>>> + cut -d= -f1
>>> + echo name=r-4-VM
>>> + FIRSTPATTERN=name
>>> + cut -d= -f1
>>> + echo eth0ip=10.70.116.75
>>> + FIRSTPATTERN=eth0ip
>>> + cut -d= -f1
>>> + echo eth0mask=255.255.255.0
>>> + FIRSTPATTERN=eth0mask
>>> + cut -d= -f1
>>> + echo gateway=10.70.116.1
>>> + FIRSTPATTERN=gateway
>>> + cut -d= -f1
>>> + echo domain=vit.vertitechit.com
>>> + FIRSTPATTERN=domain
>>> + cut -d= -f1
>>> + echo cidrsize=24
>>> + FIRSTPATTERN=cidrsize
>>> + cut -d= -f1
>>> + echo dhcprange=10.70.116.1
>>> + FIRSTPATTERN=dhcprange
>>> + cut -d= -f1
>>> + echo eth1ip=0.0.0.0
>>> + FIRSTPATTERN=eth1ip
>>> + cut -d= -f1
>>> + echo eth1mask=0.0.0.0
>>> + FIRSTPATTERN=eth1mask
>>> + cut -d= -f1
>>> + echo mgmtcidr=10.70.110.0/24
>>> + FIRSTPATTERN=mgmtcidr
>>> + cut -d= -f1
>>> + echo localgw=10.70.116.1
>>> + FIRSTPATTERN=localgw
>>> + cut -d= -f1
>>> + echo sshonguest=true
>>> + FIRSTPATTERN=sshonguest
>>> + cut -d= -f1
>>> + echo type=dhcpsrvr
>>> + FIRSTPATTERN=type
>>> + cut -d= -f2
>>> + echo type=dhcpsrvr
>>> + TYPE=dhcpsrvr
>>> + cut -d= -f1
>>> + echo disable_rp_filter=true
>>> + FIRSTPATTERN=disable_rp_filter
>>> + cut -d= -f1
>>> + echo extra_pubnics=2
>>> + FIRSTPATTERN=extra_pubnics
>>> + cut -d= -f1
>>> + echo dns1=10.70.10.21
>>> + FIRSTPATTERN=dns1
>>> + cut -d= -f1
>>> + echo baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
>>> + FIRSTPATTERN=baremetalnotificationsecuritykey
>>> + cut -d= -f1
>>> + echo baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
>>> + FIRSTPATTERN=baremetalnotificationapikey
>>> + cut -d= -f1
>>> + echo host=10.70.110.101
>>> + FIRSTPATTERN=host
>>> + cut -d= -f1
>>> + echo port=8080
>>> + FIRSTPATTERN=port
>>> + cut -d= -f1
>>> + echo nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
>>> + FIRSTPATTERN=nic_macs
>>> + [ -f /etc/init.d/functions ]
>>> + [ -f ./lib/lsb/init-functions ]
>>> + RETVAL=0
>>> + CLOUDSTACK_HOME=/usr/local/cloud
>>> + [ -f /usr/local/cloud/systemvm/utils.sh ]
>>> + _failure
>>> + [ -f /etc/init.d/functions ]
>>> + echo Failed
>>> Failed
>>> + [ 0 != 0 ]
>>> + exit 0
>>>
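
The trace above ends in _failure because the -f test on utils.sh fails. A small sketch, assuming only the two paths the quoted init script tests for, that lists which of them are absent. missing_systemvm_files is a hypothetical helper name, and CLOUDSTACK_HOME defaults to the value set in the script.

```shell
#!/bin/sh
# Print each file the init script tests for that is absent under the given
# home directory. missing_systemvm_files is a hypothetical helper name.
missing_systemvm_files() {
    home=$1
    for f in systemvm/utils.sh systemvm/run.sh; do
        [ -f "$home/$f" ] || printf '%s\n' "$home/$f"
    done
}

# Same default as the init script; override by exporting CLOUDSTACK_HOME.
CLOUDSTACK_HOME=${CLOUDSTACK_HOME:-/usr/local/cloud}

missing=$(missing_systemvm_files "$CLOUDSTACK_HOME")
if [ -n "$missing" ]; then
    echo "Missing under $CLOUDSTACK_HOME:"
    echo "$missing"
else
    echo "utils.sh and run.sh are present"
fi
```

Running the same check on the Console Proxy or Secondary Storage VM, where the service does start, would give a baseline to compare against.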
>>> Thoughts?
>>>
>>> Jacob Seeley
>>> Sr. Infrastructure Engineer
>>> VertitechIT
>>> 413-268-1631
>>>
>>> www.vertitechit.com
>>>
>>> -----Original Message-----
>>> From: ilya [mailto:ilya.mailing.lists@gmail.com]
>>> Sent: Wednesday, July 27, 2016 8:43 PM
>>> To: users@cloudstack.apache.org
>>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
>>>
>>> Hi Jacob
>>>
>>> I gave this a second read - if your issue is Router VM in starting mode
>>> - but not started - it means cloudstack agent on routerVM cannot talk to
>> management server on 8250 over POD network.
>>>
>>> Another reason would be an issue of hypervisor accessing the NFS mount
>> used for secondary storage.
>>>
>>> Use console of vCenter to see what is happening on router vm. You can
>> login locally with root/password and see the content of /var/log/cloud.out
>> file, paste it on pastebin - if it makes no sense to you...
>>>
>>> you can also run /etc/init.d/cloud stop and start.. that will give you a
>> fresh start on logs..
>>>
>>> also, confirm that management server can talk to VR on POD IP
>>> (management) on port 3922..
>>>
>>> Regards
>>> ilya
>>>
>>> On 7/27/16 9:34 AM, Jacob Seeley wrote:
>>>> ilya,
>>>>
>>>> Here are the contents of the secondary storage:
>>>>
>>>> .
>>>> ./template
>>>> ./template/tmpl
>>>> ./template/tmpl/1
>>>> ./template/tmpl/1/8
>>>> ./template/tmpl/1/8/49a4c4ee-ef06-4474-92c3-1d8efb082266.ova
>>>> ./template/tmpl/1/8/template.properties
>>>> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-vmware.ovf
>>>> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-vmware-disk3.vmdk
>>>> ./template/tmpl/1/7
>>>> ./template/tmpl/1/7/template.properties
>>>> ./template/tmpl/1/7/0098d168-4985-3b33-9840-eb5848d2f385.ova
>>>> ./template/tmpl/1/7/CentOS5.3-x86_64.ovf
>>>> ./template/tmpl/1/7/CentOS5.3-x86_64-disk1.vmdk
>>>> ./template/tmpl/1/7/CentOS5.3-x86_64.mf
>>>> ./systemvm
>>>> ./systemvm/systemvm-4.8.0.1.iso
>>>> ./systemvm/.lck-bf162a0100000000
>>>> ./snapshots
>>>> ./volumes
>>>>
>>>> I've noticed that both the Secondary Storage VM and Console Proxy VM
>> mount this ISO and as stated before, they come up just fine.
>>>>
>>>> Regards,
>>>>
>>>> Jacob Seeley
>>>> Sr. Infrastructure Engineer
>>>> VertitechIT
>>>> 413-268-1631
>>>>
>>>> www.vertitechit.com
>>>>
>>>> -----Original Message-----
>>>> From: ilya [mailto:ilya.mailing.lists@gmail.com]
>>>> Sent: Wednesday, July 27, 2016 3:22 AM
>>>> To: users@cloudstack.apache.org
>>>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
>>>>
>>>> Jacob
>>>>
>>>> The upgrade usually occurs through systemvm.iso, which is generated by
>> CloudStack on the first start.
>>>>
>>>> Please show the content of your secondary store specifically
>>>>
>>>> /mnt/[secondary-storage]/systemvm
>>>>
>>>> Regards
>>>> ilya
>>>>
>>>> On 7/25/16 11:19 AM, Jacob Seeley wrote:
>>>>> Here is a pastebin snippet of the management-server.log:
>>>>> http://pastebin.com/GCLm53Gz
>>>>>
>>>>> Hopefully the relevant data is in there.
>>>>>
>>>>> I made sure to start from scratch for this example. Everything from
>> the vSphere ESXi to the vCenter to the CentOS 7 with CloudStack install is
>> fresh. I deployed a new instance in CloudStack, a VM internally named
>> i-2-3-VM with an IP address of 192.168.0.78. This prompted CloudStack to
>> deploy a VR. The VR is called r-4-VM with an IP address of 192.168.0.79.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Jacob Seeley
>>>>> Sr. Infrastructure Engineer
>>>>> VertitechIT
>>>>> 413-268-1631
>>>>>
>>>>> www.vertitechit.com
>>>>>
>>>>> -----Original Message-----
>>>>> From: Suresh Sadhu [mailto:suresh.sadhu@accelerite.com]
>>>>> Sent: Monday, July 25, 2016 1:37 AM
>>>>> To: users@cloudstack.apache.org
>>>>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
>>>>>
>>>>> please upload the logs in the issue.
>>>>>> On Jul 5, 2016, at 8:46 AM, Darren Tang <darrentang.dt@gmail.com>
>> wrote:
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-9144
>>>>>>
>>>>>> 2016-07-04 19:41 GMT+08:00 Glenn Wagner <glenn.wagner@shapeblue.com>:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> What template are you using to start your first VM? The default
>>>>>>> VMware template?
>>>>>>> If you look in vCenter, what does the console show you?
>>>>>>>
>>>>>>>
>>>>>>> Glenn
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> glenn.wagner@shapeblue.com
>>>>>>> www.shapeblue.com
>>>>>>> 2nd Floor, Oudehuis Centre, 122 Main Rd, Somerset West, Cape Town
>>>>>>> 7130, South Africa @shapeblue
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Pascal R. [mailto:repa182@gmail.com]
>>>>>>> Sent: Monday, 04 July 2016 1:26 PM
>>>>>>> To: users@cloudstack.apache.org
>>>>>>> Subject: CS 4.8 VMware - Virtual Router stuck at starting
>>>>>>>
>>>>>>> hi,
>>>>>>>
>>>>>>> we have a CS4.8 deployment with VMWare 5.5.
>>>>>>>
>>>>>>> When trying to launch the first VM, the VS is created. The VS starts
>>>>>>> up, but in CS it is stuck in the "starting" state.
>>>>>>>
>>>>>>> I can't find any useful information in the logs.
>>>>>>>
>>>>>>> Any hint?
>>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>
> 
