cloudstack-users mailing list archives

From ilya <ilya.mailing.li...@gmail.com>
Subject Re: CS 4.8 VMware - Virtual Router stuck at starting
Date Thu, 28 Jul 2016 19:24:59 GMT
It would help to know what type of zone you are using.

Is it an advanced zone with an isolated, VPC, or shared network? What type of isolation?
Or is it a basic zone?

Lastly, try stopping iptables and restarting the cloud agent (a full stop
followed by a start).
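Here is a minimal sketch of that sequence, run as root on the router VM console. It assumes the Debian `iptables-persistent` init script that Jacob uses further down and the `/etc/init.d/cloud` script quoted in this thread. The `reset_vr_agent` function and the two override variables are hypothetical, added only so the steps can be dry-run somewhere else.

```shell
#!/bin/sh
# Hypothetical helper: flush the VR firewall, then stop and start the cloud service.
# Paths are assumptions taken from this thread; override them to dry-run elsewhere.
IPTABLES_INIT="${IPTABLES_INIT:-/etc/init.d/iptables-persistent}"
CLOUD_INIT="${CLOUD_INIT:-/etc/init.d/cloud}"

reset_vr_agent() {
  # Drop all loaded iptables rules so they cannot block 3922/8250 while debugging.
  "$IPTABLES_INIT" flush || return 1
  # A full stop followed by a start, as suggested, rather than a single "restart".
  "$CLOUD_INIT" stop
  "$CLOUD_INIT" start
}

# Usage (on the router VM, as root):
#   reset_vr_agent
```

Flushing removes every rule, so use it only while debugging. Rebooting the VR or restarting the network should restore the rules CloudStack pushes.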

Please see my responses inline.

On 7/28/16 6:58 AM, Jacob Seeley wrote:
> Hi ilya,
> 
> Funny you brought up debugging the router VM. After I responded yesterday, I did just that and found some odd things.
> Just to be clear (I think we're on the same page): I'm not the OP of this thread, and in my case the virtual router always gets deployed and starts up just fine; however, CloudStack always reports it as stuck in Starting. VMs that get deployed ultimately fail. CloudStack reports the router version as UNKNOWN.
> Before I provide what I found debugging the router VM, I'll address some of your points.
> 
> ### FOLLOW-UP QUESTIONS ###
> 
> "Another reason would be an issue of hypervisor accessing the NFS mount used for secondary storage."
> I don't believe this is an issue. The hypervisor (VMware) mounts the secondary storage via NFS just fine. If this were an issue, I would expect the Secondary Storage and Console Proxy VMs to fail to deploy as well.
> 
> "Use console of vCenter to see what is happening on router vm. You can login locally with root/password and see the content of /var/log/cloud.out file, paste it on pastebin - if it makes no sense to you..."
> It looks to me like /var/log/cloud.out is only written to when $CLOUD_DEBUG is set to a non-empty value in the /etc/init.d/cloud script. As a result, /var/log/cloud.out doesn't even exist. Even when I set that variable, nothing ever gets logged to /var/log/cloud.out. However, there is a /var/log/cloud.log; here are its contents: http://pastebin.com/aaTsRKZE
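The behaviour Jacob describes, together with the `[ ! -z ]` and `LOG_FILE=/dev/null` lines in his trace further down, is consistent with a check like the one below. This is a hypothetical reconstruction for illustration, not the actual script, and `select_log_file` is an invented name. An unquoted, empty `$CLOUD_DEBUG` collapses to `[ ! -z ]`, which evaluates to false, so the output goes to /dev/null.

```shell
#!/bin/sh
# Hypothetical reconstruction of how the cloud init script picks its log file.
select_log_file() {
  # Quoting "$CLOUD_DEBUG" makes the intent explicit. The trace shows the
  # original test unquoted, which expands to `[ ! -z ]` when the variable is empty.
  if [ -n "$CLOUD_DEBUG" ]; then
    echo /var/log/cloud.out
  else
    echo /dev/null
  fi
}
```

Even with `CLOUD_DEBUG` set, the trace shows the script reaching `_failure` at the `utils.sh` check before `start()` runs. That would explain why /var/log/cloud.out never appears.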
> 
> "you can also run /etc/init.d/cloud stop and start.. that will give you a fresh start on logs.."
> The service is in a failed state. It's worth noting that this service is in a started state on the Console Proxy and Secondary Storage VMs.

This is concerning. I see you ran it with "sh -x"; read on...

> 
> "also, confirm that management server can talk to VR on POD IP (management) on port 3922.."
> It appears this is not an issue; see below:

3922 from MS to VR: the SSH daemon on the VR, authenticated with a private key
8250 from VR to MS: the CloudStack Java agent on the VR talking to the MS
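To test both directions without telnet, here is a small bash sketch. It relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command. `check_tcp` is a hypothetical helper. The 8250 target is the management server address from this thread; the VR control IP is a placeholder.

```shell
#!/usr/bin/env bash
# check_tcp HOST PORT [SECONDS]
# Returns 0 if a TCP connection to HOST:PORT succeeds within SECONDS (default 5).
check_tcp() {
  local host=$1 port=$2 secs=${3:-5}
  # Inside the child bash, $0 is HOST and $1 is PORT.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# From the management server, against the VR's control (pod) IP:
#   check_tcp <vr-control-ip> 3922 && echo "SSH port reachable"
# From the router VM, against the management server:
#   check_tcp 10.70.110.101 8250 && echo "agent port reachable"
```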


> 
> root@r-4-VM:~# telnet 10.70.110.101 8250
> Trying 10.70.110.101...
> Connected to 10.70.110.101.
> Escape character is '^]'.
> 


> ### ROUTER VM DEBUG ###
> 
> Here is what I found when the router VM gets deployed (please tell me if anything seems off):
> There are 2 NICs, but only one gets an IP address. In CloudStack, NIC1 shows an IP address from the defaultGuestNetwork. NIC2 has traffic type Control but an IP address of 0.0.0.0.

Seeing 0.0.0.0 assigned to eth1 is a concern.

Let's assume NIC1 is eth0 and NIC2 is eth1.

1) We should not be getting 0.0.0.0 for eth1, aka the control network. This
IP should come from the pod network range you defined when you added the pod;
I assume you did that as part of the Add Zone wizard.

To see the pod IP range, go to the UI:
Infrastructure, Zones, Your Zone, Physical Network, Physical Network 1
(assuming you did not create anything special), Management, IP Ranges.
You should see a range defined there, and it should not be 0.0.0.0.
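The check above comes down to one question: does the control NIC address fall inside the pod/management range? Below is a small POSIX-shell sketch that tests an address against a CIDR. The function names are hypothetical. It uses the `mgmtcidr` value from the router's cmdline, quoted further down, only as a stand-in for whatever pod range the UI shows.

```shell
#!/bin/sh
# ip_to_int A.B.C.D: print the address as a 32-bit integer.
ip_to_int() {
  local old_ifs=$IFS
  IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_cidr IP NET/PREFIX: exit 0 if IP lies inside the network.
in_cidr() {
  local ip net prefix mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  prefix=${2#*/}
  # Build the netmask; prefix 0 shifts all ones out and yields 0.
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}
```

For example, `in_cidr 0.0.0.0 10.70.110.0/24` fails, which matches the symptom here, while an address drawn from the pod range should pass.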

> From the CloudStack management server, I cannot SSH into the router VM on NIC1. I found that this is due to the iptables rules on the router VM. If I run /etc/init.d/iptables-persistent flush on the router VM, I can SSH into it with the SSH key on port 3922.
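For reference, the usual way to reach a VR from the management server is key-based SSH as root on port 3922. The key path below is an assumption (a common default on 4.x management servers), so check where your install keeps it. `vr_ssh` is a hypothetical wrapper. `SSH` can be overridden, for example with `echo`, to print the command instead of running it.

```shell
#!/bin/sh
# Hypothetical wrapper: SSH from the management server to a VR on port 3922.
# Assumed key location; adjust for your installation.
VR_KEY="${VR_KEY:-/var/cloudstack/management/.ssh/id_rsa}"
SSH="${SSH:-ssh}"

vr_ssh() {
  # $1 is the VR address to connect to; any remaining arguments form the remote command.
  host=$1
  shift
  $SSH -i "$VR_KEY" -p 3922 "root@$host" "$@"
}

# Usage: vr_ssh 10.70.116.75 uptime
```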
> The service "cloud" is in a failed state. Looking at the cloud init script, I see the following:
> 
> CMDLINE=$(cat /var/cache/cloud/cmdline)
> 
> TYPE="router"
> for i in $CMDLINE
>   do
>     # search for foo=bar pattern and cut out foo
>     FIRSTPATTERN=$(echo $i | cut -d= -f1)
>     case $FIRSTPATTERN in 
>       type)
>           TYPE=$(echo $i | cut -d= -f2)
>       ;;
>     esac
> done
> 
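The loop above only extracts `type`. Below is a generalized sketch that reads any key from a cmdline-style file of space-separated `key=value` pairs. `cmdline_get` is a hypothetical helper, not part of the system VM scripts. It splits on the first `=`, so unlike `cut -d= -f2` it keeps values that contain `=` whole.

```shell
#!/bin/sh
# cmdline_get KEY [FILE]: print the value of KEY from space-separated key=value pairs.
cmdline_get() {
  key=$1
  file=${2:-/var/cache/cloud/cmdline}
  for pair in $(cat "$file"); do
    # ${pair%%=*} is the key part; ${pair#*=} is everything after the first '='.
    if [ "${pair%%=*}" = "$key" ]; then
      echo "${pair#*=}"
      return 0
    fi
  done
  return 1
}

# Usage on the router VM: TYPE=$(cmdline_get type)
```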
> The file /var/cache/cloud/cmdline exists; here are its contents:
> 
> template=domP name=r-4-VM eth0ip=10.70.116.75 eth0mask=255.255.255.0 gateway=10.70.116.1 domain=vit.vertitechit.com cidrsize=24 dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0 mgmtcidr=10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21 baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ host=10.70.110.101 port=8080 nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
> 
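To match the `nic_macs` entries with the interfaces inside the VM, you can print them per interface and compare them with what the guest reports. This sketch assumes, as above, that the list is in NIC order (first entry eth0, second eth1). `list_nic_macs` is a hypothetical name.

```shell
#!/bin/sh
# list_nic_macs "MAC0|MAC1|...": print "ethN MAC" per entry, assuming list order == NIC order.
list_nic_macs() {
  i=0
  old_ifs=$IFS
  IFS='|'
  for mac in $1; do
    echo "eth$i $mac"
    i=$((i + 1))
  done
  IFS=$old_ifs
}

# Compare with what the guest actually sees, e.g.:
#   list_nic_macs '06:b1:2e:00:00:10|02:00:14:42:00:03'
#   cat /sys/class/net/eth1/address
```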


You can also try updating /var/cache/cloud/cmdline with the proper
values in place of eth1ip=0.0.0.0 eth1mask=0.0.0.0. You can look them up under
Infrastructure, Routers, r-4, NICs; look for the control NIC.

Then try starting the cloud service again.
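Here is a hedged sketch of that edit. It rewrites the `eth1ip` and `eth1mask` pairs in place and keeps a `.bak` copy. The function name is hypothetical, and the address and mask must be the values the UI shows for the control NIC; the usage line uses placeholders. The sketch uses GNU `sed -i`, which the Debian-based system VM should have. It has not been verified whether the VR honours a hand-edited cmdline.

```shell
#!/bin/sh
# set_eth1 IP MASK [FILE]: replace eth1ip=/eth1mask= in a cmdline file, keeping FILE.bak.
set_eth1() {
  file=${3:-/var/cache/cloud/cmdline}
  cp "$file" "$file.bak" || return 1
  # Values are runs of non-space characters, so [^ ]* matches the whole old value.
  sed -i -e "s/eth1ip=[^ ]*/eth1ip=$1/" \
         -e "s/eth1mask=[^ ]*/eth1mask=$2/" "$file"
}

# Usage (placeholders):
#   set_eth1 <control-nic-ip> <control-nic-netmask> && /etc/init.d/cloud start
```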

Also, did you enable baremetal support? Can you deploy a zone without
baremetal support? Perhaps there is a bug in how IPs are assigned to
eth1 (the control NIC).


> The code above suggests that TYPE starts out as router but gets set to dhcpsrvr, as indicated by the contents of /var/cache/cloud/cmdline. Is this normal?
> Further down the script, I see:
> 
> CLOUDSTACK_HOME="/usr/local/cloud"   # <-- Exists
> if [ -f  $CLOUDSTACK_HOME/systemvm/utils.sh ];   # <-- Does not exist. Seems odd!
> then
>   . $CLOUDSTACK_HOME/systemvm/utils.sh
> else
>   _failure
> fi
> 
> # mkdir -p /var/log/vmops
> 
> start() {
>    local pid=$(get_pids)
>    if [ "$pid" != "" ]; then
>        echo "CloudStack cloud sevice is already running, PID = $pid"
>        return 0
>    fi
> 
>    echo -n "Starting CloudStack cloud service (type=$TYPE) "
>    if [ -f $CLOUDSTACK_HOME/systemvm/run.sh ];   # <-- Does not exist. Seems odd!
>    then
>      if [ "$pid" == "" ]
>      then
>        (cd $CLOUDSTACK_HOME/systemvm; nohup ./run.sh > $LOG_FILE 2>&1 &)
>        pid=$(get_pids)
>        echo $pid > /var/run/cloud.pid 
>      fi
>      _success
>    else
>      _failure
>    fi
>    echo
>    echo 'start' > $CLOUDSTACK_HOME/systemvm/user_request
> }
> 
> I see that it sets CLOUDSTACK_HOME to /usr/local/cloud. This folder exists; however, the script then looks for the file /usr/local/cloud/systemvm/utils.sh, which doesn't exist. It is also supposed to start the script run.sh, but that doesn't exist either. This seems like a problem to me.
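A quick way to compare this router with the Console Proxy and Secondary Storage VMs, where the service does start, is to run the same check on each VM. The two paths are the ones the init script quoted above tests for. `check_systemvm_files` is a hypothetical helper.

```shell
#!/bin/sh
# check_systemvm_files [HOME]: report the files the cloud init script expects.
# Returns 1 if any of them is missing.
check_systemvm_files() {
  home=${1:-/usr/local/cloud}
  missing=0
  for rel in systemvm/utils.sh systemvm/run.sh; do
    if [ -f "$home/$rel" ]; then
      echo "present: $home/$rel"
    else
      echo "missing: $home/$rel"
      missing=1
    fi
  done
  return $missing
}
```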
> Here is a step-by-step trace of what happens when I try to start the cloud service:
> 
> sh -x /etc/init.d/cloud start
> + ENABLED=0
> + [ -e /etc/default/cloud ]
> + . /etc/default/cloud
> + ENABLED=0
> + cat /var/cache/cloud/cmdline
> + CMDLINE= template=domP name=r-4-VM eth0ip=10.70.116.75 eth0mask=255.255.255.0 gateway=10.70.116.1 domain=vit.vertitechit.com cidrsize=24 dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0 mgmtcidr=10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21 baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ host=10.70.110.101 port=8080 nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
> + [ ! -z ]
> + LOG_FILE=/dev/null
> + TYPE=router
> + cut -d= -f1
> + echo template=domP
> + FIRSTPATTERN=template
> + cut -d= -f1
> + echo name=r-4-VM
> + FIRSTPATTERN=name
> + cut -d= -f1
> + echo eth0ip=10.70.116.75
> + FIRSTPATTERN=eth0ip
> + cut -d= -f1
> + echo eth0mask=255.255.255.0
> + FIRSTPATTERN=eth0mask
> + cut -d= -f1
> + echo gateway=10.70.116.1
> + FIRSTPATTERN=gateway
> + cut -d= -f1
> + echo domain=vit.vertitechit.com
> + FIRSTPATTERN=domain
> + cut -d= -f1
> + echo cidrsize=24
> + FIRSTPATTERN=cidrsize
> + cut -d= -f1
> + echo dhcprange=10.70.116.1
> + FIRSTPATTERN=dhcprange
> + cut -d= -f1
> + echo eth1ip=0.0.0.0
> + FIRSTPATTERN=eth1ip
> + cut -d= -f1
> + echo eth1mask=0.0.0.0
> + FIRSTPATTERN=eth1mask
> + cut -d= -f1
> + echo mgmtcidr=10.70.110.0/24
> + FIRSTPATTERN=mgmtcidr
> + cut -d= -f1
> + echo localgw=10.70.116.1
> + FIRSTPATTERN=localgw
> + cut -d= -f1
> + echo sshonguest=true
> + FIRSTPATTERN=sshonguest
> + cut -d= -f1
> + echo type=dhcpsrvr
> + FIRSTPATTERN=type
> + cut -d= -f2
> + echo type=dhcpsrvr
> + TYPE=dhcpsrvr
> + cut -d= -f1
> + echo disable_rp_filter=true
> + FIRSTPATTERN=disable_rp_filter
> + cut -d= -f1
> + echo extra_pubnics=2
> + FIRSTPATTERN=extra_pubnics
> + cut -d= -f1
> + echo dns1=10.70.10.21
> + FIRSTPATTERN=dns1
> + cut -d= -f1
> + echo baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
> + FIRSTPATTERN=baremetalnotificationsecuritykey
> + cut -d= -f1
> + echo baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
> + FIRSTPATTERN=baremetalnotificationapikey
> + cut -d= -f1
> + echo host=10.70.110.101
> + FIRSTPATTERN=host
> + cut -d= -f1
> + echo port=8080
> + FIRSTPATTERN=port
> + cut -d= -f1
> + echo nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
> + FIRSTPATTERN=nic_macs
> + [ -f /etc/init.d/functions ]
> + [ -f ./lib/lsb/init-functions ]
> + RETVAL=0
> + CLOUDSTACK_HOME=/usr/local/cloud
> + [ -f /usr/local/cloud/systemvm/utils.sh ]
> + _failure
> + [ -f /etc/init.d/functions ]
> + echo Failed
> Failed
> + [ 0 != 0 ]
> + exit 0
> 
> Thoughts?
> 
> Jacob Seeley
> Sr. Infrastructure Engineer
> VertitechIT
> 413-268-1631
> 
> www.vertitechit.com
> 
> -----Original Message-----
> From: ilya [mailto:ilya.mailing.lists@gmail.com] 
> Sent: Wednesday, July 27, 2016 8:43 PM
> To: users@cloudstack.apache.org
> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
> 
> Hi Jacob
> 
> I gave this a second read. If your issue is the router VM stuck in Starting (but never reaching Started), it means the CloudStack agent on the router VM cannot talk to the management server on port 8250 over the POD network.
> 
> Another reason would be an issue of hypervisor accessing the NFS mount used for secondary storage.
> 
> Use console of vCenter to see what is happening on router vm. You can login locally with root/password and see the content of /var/log/cloud.out file, paste it on pastebin - if it makes no sense to you...
> 
> you can also run /etc/init.d/cloud stop and start.. that will give you a fresh start on logs..
> 
> also, confirm that management server can talk to VR on POD IP
> (management) on port 3922..
> 
> Regards
> ilya
> 
> On 7/27/16 9:34 AM, Jacob Seeley wrote:
>> ilya,
>>
>> Here are the contents of the secondary storage:
>>
>> .
>> ./template
>> ./template/tmpl
>> ./template/tmpl/1
>> ./template/tmpl/1/8
>> ./template/tmpl/1/8/49a4c4ee-ef06-4474-92c3-1d8efb082266.ova
>> ./template/tmpl/1/8/template.properties
>> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-vmware.ovf
>> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-vmware-disk3.vmdk
>> ./template/tmpl/1/7
>> ./template/tmpl/1/7/template.properties
>> ./template/tmpl/1/7/0098d168-4985-3b33-9840-eb5848d2f385.ova
>> ./template/tmpl/1/7/CentOS5.3-x86_64.ovf
>> ./template/tmpl/1/7/CentOS5.3-x86_64-disk1.vmdk
>> ./template/tmpl/1/7/CentOS5.3-x86_64.mf
>> ./systemvm
>> ./systemvm/systemvm-4.8.0.1.iso
>> ./systemvm/.lck-bf162a0100000000
>> ./snapshots
>> ./volumes
>>
>> I've noticed that both the Secondary Storage VM and the Console Proxy VM mount this ISO, and as stated before, they come up just fine.
>>
>> Regards,
>>
>> Jacob Seeley
>> Sr. Infrastructure Engineer
>> VertitechIT
>> 413-268-1631
>>
>> www.vertitechit.com
>>
>> -----Original Message-----
>> From: ilya [mailto:ilya.mailing.lists@gmail.com]
>> Sent: Wednesday, July 27, 2016 3:22 AM
>> To: users@cloudstack.apache.org
>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
>>
>> Jacob
>>
>> The upgrade usually occurs through systemvm.iso, which is generated by CloudStack on the first start.
>>
>> Please show the contents of your secondary storage, specifically:
>>
>> /mnt/[secondary-storage]/systemvm
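To gather that listing, mount the secondary storage export and list its `systemvm` directory. The mount point is a placeholder, as in the request above. `list_systemvm_iso` is a hypothetical helper that also reports whether a `systemvm-*.iso` is present; Jacob's listing below shows `systemvm-4.8.0.1.iso`.

```shell
#!/bin/sh
# list_systemvm_iso DIR: list DIR/systemvm; return 1 if no systemvm-*.iso is there.
list_systemvm_iso() {
  dir=$1/systemvm
  ls -la "$dir" || return 1
  for iso in "$dir"/systemvm-*.iso; do
    # An unmatched glob stays literal, so check that a real file exists.
    [ -f "$iso" ] && return 0
  done
  echo "no systemvm-*.iso in $dir" >&2
  return 1
}

# Usage (placeholder mount point): list_systemvm_iso /mnt/secondary
```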
>>
>> Regards
>> ilya
>>
>> On 7/25/16 11:19 AM, Jacob Seeley wrote:
>>> Here is a pastebin snippet of the management-server.log:
>>> http://pastebin.com/GCLm53Gz
>>>
>>> Hopefully the relevant data is in there.
>>>
>>> I made sure to start from scratch for this example. Everything, from vSphere ESXi to vCenter to the CentOS 7 host with the CloudStack install, is fresh. I deployed a new instance in CloudStack, a VM internally named i-2-3-VM with an IP address of 192.168.0.78. This prompted CloudStack to deploy a VR, r-4-VM, with an IP address of 192.168.0.79.
>>>
>>> Thank you,
>>>
>>> Jacob Seeley
>>> Sr. Infrastructure Engineer
>>> VertitechIT
>>> 413-268-1631
>>>
>>> www.vertitechit.com
>>>
>>> -----Original Message-----
>>> From: Suresh Sadhu [mailto:suresh.sadhu@accelerite.com]
>>> Sent: Monday, July 25, 2016 1:37 AM
>>> To: users@cloudstack.apache.org
>>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
>>>
>>> Please upload the logs to the issue.
>>>> On Jul 5, 2016, at 8:46 AM, Darren Tang <darrentang.dt@gmail.com> wrote:
>>>>
>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-9144
>>>>
>>>> 2016-07-04 19:41 GMT+08:00 Glenn Wagner <glenn.wagner@shapeblue.com>:
>>>>
>>>>> Hi,
>>>>>
>>>>> What template are you using to start your first VM? The default
>>>>> VMware template?
>>>>> If you look in vCenter, what does the console show you?
>>>>>
>>>>>
>>>>> Glenn
>>>>>
>>>>>
>>>>>
>>>>> glenn.wagner@shapeblue.com
>>>>> www.shapeblue.com
>>>>> 2nd Floor, Oudehuis Centre, 122 Main Rd, Somerset West, Cape Town
>>>>> 7130, South Africa
>>>>> @shapeblue
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Pascal R. [mailto:repa182@gmail.com]
>>>>> Sent: Monday, 04 July 2016 1:26 PM
>>>>> To: users@cloudstack.apache.org
>>>>> Subject: CS 4.8 VMware - Virtual Router stuck at starting
>>>>>
>>>>> Hi,
>>>>>
>>>>> We have a CS 4.8 deployment with VMware 5.5.
>>>>>
>>>>> When trying to launch the first VM, the VR is created. The VR starts
>>>>> up, but in CS it is stuck in the "Starting" state.
>>>>>
>>>>> I can't find any useful information in the logs.
>>>>>
>>>>> Any hints?
>>>>>
>>>
