cloudstack-users mailing list archives

From Darren Tang <darrentang...@gmail.com>
Subject Re: CS 4.8 VMware - Virtual Router stuck at starting
Date Fri, 29 Jul 2016 04:43:18 GMT
Hi ilya:
 I can confirm that issue; please check:
https://issues.apache.org/jira/browse/CLOUDSTACK-9144
 When we deployed CloudStack (4.6/4.7/4.8) with VMware (5.x/6.0) in a basic
zone, the VR never left the "Starting" state. Falling back to 4.5 works
fine.
 Maybe you can test it yourself.

2016-07-29 3:24 GMT+08:00 ilya <ilya.mailing.lists@gmail.com>:

> I guess it would help to know what type of zone you use?
>
> Is it advanced, isolated vpc or shared network? what type of isolation?
> or perhaps basic zone?
>
> Lastly, try stopping the iptables and restarting cloud agent (via stop
> and start)
>
> Please see my response in-line
>
> On 7/28/16 6:58 AM, Jacob Seeley wrote:
> > Hi ilya,
> >
> > Funny you brought up debugging the router VM. After I responded
> yesterday, I did just that and I did find some odd things.
> > Just to be clear (I think we're on the same page), since I'm not the OP
> of this thread, the virtual router always gets deployed and it starts up
> just fine; however, CloudStack reports that it's always stuck in starting.
> VMs that get deployed ultimately fail. CloudStack reports the router
> version as UNKNOWN.
> > Before I provide what I found debugging the router VM, I'll address some
> of your points.
> >
> > ### FOLLOW-UP QUESTIONS ###
> >
> > " Another reason would be an issue of hypervisor accessing the NFS mount
> used for secondary storage."
> > I don't believe this is an issue. The hypervisor (VMware) does mount the
> secondary storage via NFS just fine. If this were an issue, I would think
> the Secondary Storage and Console VMs would not deploy.
> >
> > " Use console of vCenter to see what is happening on router vm. You can
> login locally with root/password and see the content of /var/log/cloud.out
> file, paste it on pastebin - if it makes no sense to you..."
> > It looks to me like /var/log/cloud.out is only written when
> $CLOUD_DEBUG is set to a non-empty value in the /etc/init.d/cloud script.
> As such, there isn't even a /var/log/cloud.out file. Even when I set
> that variable, nothing ever gets logged to /var/log/cloud.out. However,
> there is a /var/log/cloud.log. Here are the contents of that:
> http://pastebin.com/aaTsRKZE
> >
> > " you can also run /etc/init.d/cloud stop and start.. that will give you
> a fresh start on logs.."
> > The service is in a failed state. It's worth noting that this service is
> in a started state on the Console and Secondary Storage VMs.
>
> This is concerning. I see you ran "sh -x"; read on.
>
> >
> > " also, confirm that management server can talk to VR on POD IP
> > (management) on port 3922.."
> > It appears this is not an issue; see below:
>
> 3922 from MS to VR - this is the SSH daemon on VR with private key
> 8250 from VR to MS - cloudstack java agent on VR talking to MS
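
A minimal sketch of checking both directions without telnet. It assumes
bash (for its /dev/tcp redirection) and the coreutils "timeout" command
are available on the machine you run it from; check_port is a
hypothetical helper name, not something CloudStack ships.

```sh
# check_port HOST PORT: succeed if a TCP connection opens within 3 seconds.
# Assumption: bash built with /dev/tcp support and coreutils `timeout` exist.
check_port() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# From the VR: is the management server's agent port reachable?
#   check_port 10.70.110.101 8250 && echo "8250 reachable" || echo "8250 blocked"
# From the management server: is the VR's SSH daemon reachable?
#   check_port <VR control IP> 3922 && echo "3922 reachable" || echo "3922 blocked"
```

A failure here only says the TCP connection did not open; it does not
distinguish a firewall rule from a service that is not listening.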
>
>
> >
> > root@r-4-VM:~# telnet 10.70.110.101 8250
> > Trying 10.70.110.101...
> > Connected to 10.70.110.101.
> > Escape character is '^]'.
> >
>
>
> > ### ROUTER VM DEBUG ###
> >
> > Here is what I found when the router VM gets deployed (please tell me if
> anything seems off):
> > 2 NICs; only one NIC gets an IP address. In CloudStack, NIC1 shows an IP
> address coming from the defaultGuestNetwork. NIC2 is traffic type Control
> but has an IP address of 0.0.0.0
>
> Seeing 0.0.0.0 assigned to eth1 is a concern.
>
> Let's assume NIC1 is eth0 and NIC2 is eth1.
>
> 1) we should not be getting 0.0.0.0 for eth1 - aka control network. This
> IP should be coming from the POD network range -> when you added a pod -
> i assume you did it as part of Add Zone wizard...
>
> To see the pod IP range, go to the UI:
> Infrastructure, Zones, Your Zone, Physical Network, Physical Network 1
> (assume you did not create anything special), Management, IP Ranges ->
> you should see a range defined there and it should not be 0.0.0.0...
>
> > From the CloudStack management server, I cannot SSH into the router VM
> on NIC1. I've found this is because of iptables rules on the router VM. If
> I issue a /etc/init.d/iptables-persistent flush on the router VM, I can SSH
> into the router VM using the SSH key at port 3922.
> > The service "cloud" is in a failed state. Looking at the cloud init
> script, I see the following:
> >
> > CMDLINE=$(cat /var/cache/cloud/cmdline)
> >
> > TYPE="router"
> > for i in $CMDLINE
> >   do
> >     # search for foo=bar pattern and cut out foo
> >     FIRSTPATTERN=$(echo $i | cut -d= -f1)
> >     case $FIRSTPATTERN in
> >       type)
> >           TYPE=$(echo $i | cut -d= -f2)
> >       ;;
> >     esac
> > done
> >
> > The file /var/cache/cloud/cmdline exists; here are the contents:
> >
> > template=domP name=r-4-VM eth0ip=10.70.116.75 eth0mask=255.255.255.0
> gateway=10.70.116.1 domain=vit.vertitechit.com cidrsize=24
> dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0
> mgmtcidr=10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr
> disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21
> baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
> baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
> host=10.70.110.101 port=8080 nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
> >
>
>
> You can also try updating /var/cache/cloud/cmdline, replacing
> eth1ip=0.0.0.0 eth1mask=0.0.0.0 with the proper values; you can look them
> up under Infrastructure, Routers, r-4, Nics and look for the control NIC.
>
> Then try starting the cloud service..
>
> Also, did you enable baremetal support? Can you deploy a zone without
> baremetal support? Perhaps there is a bug in how IPs are assigned to
> eth1 (control NIC)...
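
A minimal sketch for reading a value out of the cmdline file and flagging
an unset control IP, assuming the file holds space-separated key=value
pairs as in the paste above. cmdline_value and control_ip_ok are
hypothetical helper names; the optional FILE argument exists only so the
functions can be tried against a copy of the file.

```sh
# cmdline_value KEY [FILE]: print the value of KEY from a file of
# space-separated key=value pairs (default: /var/cache/cloud/cmdline).
cmdline_value() {
    key=$1
    file=${2:-/var/cache/cloud/cmdline}
    # Put one pair per line, then strip "KEY=" from the first exact match.
    tr -s ' \t' '\n\n' < "$file" | sed -n "s/^${key}=//p" | head -n 1
}

# control_ip_ok [FILE]: fail when eth1ip is missing or set to 0.0.0.0.
control_ip_ok() {
    ip=$(cmdline_value eth1ip "$@")
    [ -n "$ip" ] && [ "$ip" != "0.0.0.0" ]
}

# Example use on the router VM:
#   control_ip_ok || echo "eth1ip is unset; control NIC has no pod IP"
```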
>
>
> > The previous code suggests that the value of TYPE starts as router but
> will get set to dhcpsrvr, as indicated by the contents of
> /var/cache/cloud/cmdline. Is this normal?
> > Further down the script, I see:
> >
> > CLOUDSTACK_HOME="/usr/local/cloud"    <---- Exists
> > if [ -f  $CLOUDSTACK_HOME/systemvm/utils.sh ];    <---- Does not exist. Seems odd!
> > then
> >   . $CLOUDSTACK_HOME/systemvm/utils.sh
> > else
> >   _failure
> > fi
> >
> > # mkdir -p /var/log/vmops
> >
> > start() {
> >    local pid=$(get_pids)
> >    if [ "$pid" != "" ]; then
> >        echo "CloudStack cloud sevice is already running, PID = $pid"
> >        return 0
> >    fi
> >
> >    echo -n "Starting CloudStack cloud service (type=$TYPE) "
> >    if [ -f $CLOUDSTACK_HOME/systemvm/run.sh ];    <---- Does not exist. Seems odd!
> >    then
> >      if [ "$pid" == "" ]
> >      then
> >        (cd $CLOUDSTACK_HOME/systemvm; nohup ./run.sh > $LOG_FILE 2>&1 & )
> >        pid=$(get_pids)
> >        echo $pid > /var/run/cloud.pid
> >      fi
> >      _success
> >    else
> >      _failure
> >    fi
> >    echo
> >    echo 'start' > $CLOUDSTACK_HOME/systemvm/user_request
> > }
> >
> > I see that it sets CLOUDSTACK_HOME to /usr/local/cloud. This folder
> exists; however, the script then looks for the file
> /usr/local/cloud/systemvm/utils.sh. This file doesn't exist. The script
> is also supposed to start run.sh, but that doesn't exist either. This
> seems like a problem to me.
> > Here you can see the step-through when I try to start the cloud service:
> >
> > sh -x /etc/init.d/cloud start
> > + ENABLED=0
> > + [ -e /etc/default/cloud ]
> > + . /etc/default/cloud
> > + ENABLED=0
> > + cat /var/cache/cloud/cmdline
> > + CMDLINE= template=domP name=r-4-VM eth0ip=10.70.116.75
> eth0mask=255.255.255.0 gateway=10.70.116.1 domain=vit.vertitechit.com
> cidrsize=24 dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0
> mgmtcidr=10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr
> disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21
> baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
> baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
> host=10.70.110.101 port=8080 nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
> > + [ ! -z ]
> > + LOG_FILE=/dev/null
> > + TYPE=router
> > + cut -d= -f1
> > + echo template=domP
> > + FIRSTPATTERN=template
> > + cut -d= -f1
> > + echo name=r-4-VM
> > + FIRSTPATTERN=name
> > + cut -d= -f1
> > + echo eth0ip=10.70.116.75
> > + FIRSTPATTERN=eth0ip
> > + cut -d= -f1
> > + echo eth0mask=255.255.255.0
> > + FIRSTPATTERN=eth0mask
> > + cut -d= -f1
> > + echo gateway=10.70.116.1
> > + FIRSTPATTERN=gateway
> > + cut -d= -f1
> > + echo domain=vit.vertitechit.com
> > + FIRSTPATTERN=domain
> > + cut -d= -f1
> > + echo cidrsize=24
> > + FIRSTPATTERN=cidrsize
> > + cut -d= -f1
> > + echo dhcprange=10.70.116.1
> > + FIRSTPATTERN=dhcprange
> > + cut -d= -f1
> > + echo eth1ip=0.0.0.0
> > + FIRSTPATTERN=eth1ip
> > + cut -d= -f1
> > + echo eth1mask=0.0.0.0
> > + FIRSTPATTERN=eth1mask
> > + cut -d= -f1
> > + echo mgmtcidr=10.70.110.0/24
> > + FIRSTPATTERN=mgmtcidr
> > + cut -d= -f1
> > + echo localgw=10.70.116.1
> > + FIRSTPATTERN=localgw
> > + cut -d= -f1
> > + echo sshonguest=true
> > + FIRSTPATTERN=sshonguest
> > + cut -d= -f1
> > + echo type=dhcpsrvr
> > + FIRSTPATTERN=type
> > + cut -d= -f2
> > + echo type=dhcpsrvr
> > + TYPE=dhcpsrvr
> > + cut -d= -f1
> > + echo disable_rp_filter=true
> > + FIRSTPATTERN=disable_rp_filter
> > + cut -d= -f1
> > + echo extra_pubnics=2
> > + FIRSTPATTERN=extra_pubnics
> > + cut -d= -f1
> > + echo dns1=10.70.10.21
> > + FIRSTPATTERN=dns1
> > + cut -d= -f1
> > + echo baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
> > + FIRSTPATTERN=baremetalnotificationsecuritykey
> > + cut -d= -f1
> > + echo baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
> > + FIRSTPATTERN=baremetalnotificationapikey
> > + cut -d= -f1
> > + echo host=10.70.110.101
> > + FIRSTPATTERN=host
> > + cut -d= -f1
> > + echo port=8080
> > + FIRSTPATTERN=port
> > + cut -d= -f1
> > + echo nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
> > + FIRSTPATTERN=nic_macs
> > + [ -f /etc/init.d/functions ]
> > + [ -f ./lib/lsb/init-functions ]
> > + RETVAL=0
> > + CLOUDSTACK_HOME=/usr/local/cloud
> > + [ -f /usr/local/cloud/systemvm/utils.sh ]
> > + _failure
> > + [ -f /etc/init.d/functions ]
> > + echo Failed
> > Failed
> > + [ 0 != 0 ]
> > + exit 0
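
The trace above stops at the utils.sh test. A minimal sketch for listing
which of the two files the init script looks for are absent, assuming the
same /usr/local/cloud layout; missing_systemvm_files is a hypothetical
helper name, and the optional directory argument is only there so it can
be pointed at a different root.

```sh
# missing_systemvm_files [HOME_DIR]: print each file the init script expects
# but that is absent; return non-zero if any of them is missing.
missing_systemvm_files() {
    home=${1:-/usr/local/cloud}
    rc=0
    for f in utils.sh run.sh; do
        if [ ! -f "$home/systemvm/$f" ]; then
            echo "missing: $home/systemvm/$f"
            rc=1
        fi
    done
    return $rc
}

# Example use on the router VM:
#   missing_systemvm_files || echo "systemvm payload is incomplete"
```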
> >
> > Thoughts?
> >
> > Jacob Seeley
> > Sr. Infrastructure Engineer
> > VertitechIT
> > 413-268-1631
> >
> > www.vertitechit.com
> >
> > -----Original Message-----
> > From: ilya [mailto:ilya.mailing.lists@gmail.com]
> > Sent: Wednesday, July 27, 2016 8:43 PM
> > To: users@cloudstack.apache.org
> > Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
> >
> > Hi Jacob
> >
> > I gave this a second read - if your issue is Router VM in starting mode
> > - but not started - it means cloudstack agent on routerVM cannot talk to
> management server on 8250 over POD network.
> >
> > Another reason would be an issue of hypervisor accessing the NFS mount
> used for secondary storage.
> >
> > Use console of vCenter to see what is happening on router vm. You can
> login locally with root/password and see the content of /var/log/cloud.out
> file, paste it on pastebin - if it makes no sense to you...
> >
> > you can also run /etc/init.d/cloud stop and start.. that will give you a
> fresh start on logs..
> >
> > also, confirm that management server can talk to VR on POD IP
> > (management) on port 3922..
> >
> > Regards
> > ilya
> >
> > On 7/27/16 9:34 AM, Jacob Seeley wrote:
> >> ilya,
> >>
> >> Here are the contents of the secondary storage:
> >>
> >> .
> >> ./template
> >> ./template/tmpl
> >> ./template/tmpl/1
> >> ./template/tmpl/1/8
> >> ./template/tmpl/1/8/49a4c4ee-ef06-4474-92c3-1d8efb082266.ova
> >> ./template/tmpl/1/8/template.properties
> >> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-vmware.ovf
> >> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-vmware-disk3.vmdk
> >> ./template/tmpl/1/7
> >> ./template/tmpl/1/7/template.properties
> >> ./template/tmpl/1/7/0098d168-4985-3b33-9840-eb5848d2f385.ova
> >> ./template/tmpl/1/7/CentOS5.3-x86_64.ovf
> >> ./template/tmpl/1/7/CentOS5.3-x86_64-disk1.vmdk
> >> ./template/tmpl/1/7/CentOS5.3-x86_64.mf
> >> ./systemvm
> >> ./systemvm/systemvm-4.8.0.1.iso
> >> ./systemvm/.lck-bf162a0100000000
> >> ./snapshots
> >> ./volumes
> >>
> >> I've noticed that both the Secondary Storage VM and Console Proxy VM
> mount this ISO and as stated before, they come up just fine.
> >>
> >> Regards,
> >>
> >> Jacob Seeley
> >> Sr. Infrastructure Engineer
> >> VertitechIT
> >> 413-268-1631
> >>
> >> www.vertitechit.com
> >>
> >> -----Original Message-----
> >> From: ilya [mailto:ilya.mailing.lists@gmail.com]
> >> Sent: Wednesday, July 27, 2016 3:22 AM
> >> To: users@cloudstack.apache.org
> >> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
> >>
> >> Jacob
> >>
> >> The upgrade usually occurs through systemvm.iso, which is generated by
> CloudStack on the first start.
> >>
> >> Please show the content of your secondary store specifically
> >>
> >> /mnt/[secondary-storage]/systemvm
> >>
> >> Regards
> >> ilya
> >>
> >> On 7/25/16 11:19 AM, Jacob Seeley wrote:
> >>> Here is a pastebin snippet of the management-server.log -
> >>> http://pastebin.com/GCLm53Gz
> >>>
> >>> Hopefully the relevant data is in there.
> >>>
> >>> I made sure to start from scratch for this example. Everything from
> the vSphere ESXi to the vCenter to the CentOS 7 with CloudStack install is
> fresh. I deployed a new instance in CloudStack, a VM internally named
> i-2-3-VM with an IP address of 192.168.0.78. This prompted CloudStack to
> deploy a VR. The VR is called r-4-VM with an IP address of 192.168.0.79.
> >>>
> >>> Thank you,
> >>>
> >>> Jacob Seeley
> >>> Sr. Infrastructure Engineer
> >>> VertitechIT
> >>> 413-268-1631
> >>>
> >>> www.vertitechit.com
> >>>
> >>> -----Original Message-----
> >>> From: Suresh Sadhu [mailto:suresh.sadhu@accelerite.com]
> >>> Sent: Monday, July 25, 2016 1:37 AM
> >>> To: users@cloudstack.apache.org
> >>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
> >>>
> >>> please upload the logs in the issue.
> >>>> On Jul 5, 2016, at 8:46 AM, Darren Tang <darrentang.dt@gmail.com>
> wrote:
> >>>>
> >>>> https://issues.apache.org/jira/browse/CLOUDSTACK-9144
> >>>>
> >>>> 2016-07-04 19:41 GMT+08:00 Glenn Wagner <glenn.wagner@shapeblue.com>:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> What template are you using to start your first VM? - the default
> >>>>> vmware template?
> >>>>> If you look in vcenter , what does the console show you ?
> >>>>>
> >>>>>
> >>>>> Glenn
> >>>>>
> >>>>>
> >>>>>
> >>>>> glenn.wagner@shapeblue.com
> >>>>> www.shapeblue.com
> >>>>> 2nd Floor, Oudehuis Centre, 122 Main Rd, Somerset West, Cape Town
> >>>>> 7130 South Africa @shapeblue
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Pascal R. [mailto:repa182@gmail.com]
> >>>>> Sent: Monday, 04 July 2016 1:26 PM
> >>>>> To: users@cloudstack.apache.org
> >>>>> Subject: CS 4.8 VMware - Virtual Router stuck at starting
> >>>>>
> >>>>> hi,
> >>>>>
> >>>>> we have a CS4.8 deployment with VMWare 5.5.
> >>>>>
> >>>>> When trying to launch the first VM, the VR is created. The VR starts
> >>>>> up, but in CS it is stuck in the "starting" state.
> >>>>>
> >>>>> I can't find any useful information in the logs.
> >>>>>
> >>>>> any hint?
> >>>>>
> >>>
> >>>
> >>>
> >>>
> >>>
>
