cloudstack-users mailing list archives

From Ahmad Emneina <aemne...@gmail.com>
Subject Re: Rebuilding management server
Date Tue, 16 Jul 2013 18:20:31 GMT
Can you check whether your hosts can connect back to the management server
(ping, telnet to ports 22 and 443)? There might be firewall rules or routing
issues preventing this.
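
If telnet is not installed on the hosts, the same port check can be scripted. This is only a minimal sketch, assuming Python 3 is available on the machine you test from; the function names are made up for illustration, and the address and ports to pass in are whatever applies to your setup (the log below shows the management server at 192.168.10.251).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port in ports to True (reachable) or False (not reachable)."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

For example, check_ports("192.168.10.251", [22, 443]) returns a dict with one True/False entry per port. A False only says the TCP connection failed; it does not tell you whether a firewall or a route is responsible.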


On Tue, Jul 16, 2013 at 9:01 AM, Leeno Jose.P.A <leenojos@gmail.com> wrote:

> CS startup logs,
>
> 2013-07-16 11:25:30,702 INFO  [utils.component.ComponentContext]
> (Timer-1:null) Starting
> com.cloud.network.guru.NiciraNvpGuestNetworkGuru_EnhancerByCloudStack_1f6b4bb6
> 2013-07-16 11:25:30,702 INFO  [utils.component.ComponentContext]
> (Timer-1:null) Starting
> com.cloud.server.ManagementServerImpl_EnhancerByCloudStack_d54e1bb1
> 2013-07-16 11:25:30,702 INFO  [cloud.server.ManagementServerImpl]
> (Timer-1:null) Startup CloudStack management server...
> 2013-07-16 11:25:30,707 INFO
> [cloud.cluster.ClusterServiceServletContainer] (Thread-18:null) Cluster
> service servlet container listening on port 9090
> 2013-07-16 11:25:31,832 DEBUG [utils.db.ConnectionConcierge]
> (Cluster-Heartbeat-1:null) Registering a database connection for
> ClusterManagerHeartBeat2
> 2013-07-16 11:25:31,845 INFO  [cloud.cluster.ClusterManagerImpl]
> (Cluster-Heartbeat-1:null) We are good, no orphan management server msid in
> host table is found
> 2013-07-16 11:25:31,845 INFO  [cloud.cluster.ClusterManagerImpl]
> (Cluster-Heartbeat-1:null) Found 1 inactive management server node based on
> timestamp
> 2013-07-16 11:25:31,846 INFO  [cloud.cluster.ClusterManagerImpl]
> (Cluster-Heartbeat-1:null) management server node msid: 130602634328, name:
> cstagcms, service ip: 192.168.10.251, version: 4.1.0
> 2013-07-16 11:25:31,846 INFO  [cloud.cluster.ClusterManagerImpl]
> (Cluster-Heartbeat-1:null) Trying to connect to 192.168.10.251
> 2013-07-16 11:25:31,860 DEBUG [cloud.cluster.ClusterManagerImpl]
> (Cluster-Heartbeat-1:null) Detected management node joined, id:2,
> nodeIP:192.168.10.251
> 2013-07-16 11:25:33,348 DEBUG [cloud.cluster.ClusterManagerImpl]
> (Cluster-Notification-1:null) Notify management server node join to
> listeners.
> 2013-07-16 11:25:33,349 DEBUG [cloud.cluster.ClusterManagerImpl]
> (Cluster-Notification-1:null) Joining node, IP: 192.168.10.251, msid:
> 81375086018793
> 2013-07-16 11:25:33,350 DEBUG [cloud.alert.ClusterAlertAdapter]
> (Cluster-Notification-1:null) Receive cluster alert, EventArgs:
> com.cloud.cluster.ClusterNodeJoinEventArgs
> 2013-07-16 11:25:33,350 DEBUG [cloud.alert.ClusterAlertAdapter]
> (Cluster-Notification-1:null) Handle cluster node join alert, joined node:
> 192.168.10.251, msidL: 81375086018793
> 2013-07-16 11:25:33,350 DEBUG [cloud.alert.ClusterAlertAdapter]
> (Cluster-Notification-1:null) Management server node 192.168.10.251 is up,
> send alert
> 2013-07-16 11:25:33,361 WARN  [cloud.cluster.ClusterManagerImpl]
> (Cluster-Notification-1:null) Notifying management server join event took
> 12 ms
> 2013-07-16 11:25:45,450 DEBUG [cloud.server.StatsCollector]
> (StatsCollector-2:null) HostStatsCollector is running...
> 2013-07-16 11:25:45,452 DEBUG [cloud.server.StatsCollector]
> (StatsCollector-1:null) VmStatsCollector is running...
> 2013-07-16 11:25:45,467 DEBUG [cloud.server.StatsCollector]
> (StatsCollector-3:null) StorageCollector is running...
> 2013-07-16 11:25:45,498 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> (StatsCollector-2:null) create forwarding ClusteredAgentAttache for 39
> 2013-07-16 11:25:45,491 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> (StatsCollector-3:null) create forwarding ClusteredAgentAttache for 50
> 2013-07-16 11:25:45,751 INFO  [agent.manager.ClusteredAgentManagerImpl]
> (StatsCollector-3:null) SSL: Handshake done
> 2013-07-16 11:25:45,752 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> (StatsCollector-3:null) Connection to peer opened: 130602634328, ip:
> 192.168.10.251
> 2013-07-16 11:25:45,757 DEBUG [agent.manager.ClusteredAgentAttache]
> (StatsCollector-2:null) Seq 39-282525697: Forwarding null to 130602634328
> 2013-07-16 11:25:45,758 DEBUG [agent.manager.ClusteredAgentAttache]
> (StatsCollector-3:null) Seq 50-1962541057: Forwarding null to 130602634328
> 2013-07-16 11:25:45,804 DEBUG [agent.manager.ClusteredAgentAttache]
> (AgentManager-Handler-2:null) Seq 39-282525697: Routing from 81375086018793
> 2013-07-16 11:25:45,804 DEBUG [agent.manager.ClusteredAgentAttache]
> (AgentManager-Handler-2:null) Seq 39-282525697: Link is closed
> 2013-07-16 11:25:45,806 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> (AgentManager-Handler-2:null) Seq 39-282525697: MgmtId 81375086018793: Req:
> Resource [Host:39] is unreachable: Host 39: Link is closed
>
>
> Thanks
> Leeno
>
>
> On Tue, Jul 16, 2013 at 6:10 PM, Leeno Jose.P.A <leenojos@gmail.com> wrote:
>
>> Hi Todd,
>>
>> Thanks for the help.
>>
>> I followed the steps you mentioned above, but that did not help; I still
>> get the same error message. I can, however, ping the XS hosts and telnet to
>> ports 22, 80 and 443 on them from CS.
>>
>> Thanks
>> Leeno
>>
>>
>> On Tue, Jul 16, 2013 at 5:12 PM, Todd Pigram <todd@toddpigram.com> wrote:
>>
>>> Did you remove the Tags on each XenServer host prior to starting?
>>>
>>> Management Controller Failure and Replacement
>>>
>>> In setting up your cloud, you should have a backup routine for your
>>> controller. The most important items to back up are the MySQL databases that
>>> Cloudstack uses. A suitable backup script is attached to this page. In the
>>> event of a cloud management controller failure, the steps to replace the
>>> controller with a new one are:
>>>
>>> These instructions assume your cluster is Xenserver - Contributors
>>> using other Hypervisor OSs, please contribute.
>>>
>>>    1. Set up new management server hardware
>>>    2. Install your OS
>>>    3. Install Cloudstack, up to and including the "Install Database"
>>>    step
>>>    4. Import your database backup
>>>    5. In Xencenter, connect to your Cloudstack host pool.
>>>    6. On each host, remove the tags on Host > General Tab > Tags by
>>>    editing the tags and un-checking each one.
>>>    7. On the management controller, start Cloudstack
>>>       1. service cloud-management start
>>>    8. The new cloud management controller will connect to each host in
>>>    the database and push out new tags and keys to each host in the pool.
>>>
>>>
>>> On Jul 16, 2013, at 1:13 AM, Leeno Jose.P.A <leenojos@gmail.com> wrote:
>>>
>>> After restoring the old database dump to the new installation, CS is unable
>>> to contact the Xenserver hosts. I am getting the following errors in
>>> management-server.log:
>>>
>>>
>>> 2013-07-15 11:57:49,646 DEBUG [agent.manager.ClusteredAgentManagerImpl]
>>> (StatsCollector-1:null) Connection to peer opened: 130602634328, ip:
>>> 192.168.10.251
>>> 2013-07-15 11:57:49,652 DEBUG [agent.manager.ClusteredAgentAttache]
>>> (StatsCollector-2:null) Seq 50-185008129: Forwarding null to 130602634328
>>> 2013-07-15 11:57:49,662 DEBUG [agent.manager.ClusteredAgentAttache]
>>> (StatsCollector-1:null) Seq 39-1272840193: Forwarding null to
>>> 130602634328
>>> 2013-07-15 11:57:49,699 DEBUG [agent.manager.ClusteredAgentAttache]
>>> (AgentManager-Handler-2:null) Seq 50-185008129: Routing from
>>> 81375086018793
>>> 2013-07-15 11:57:49,699 DEBUG [agent.manager.ClusteredAgentAttache]
>>> (AgentManager-Handler-2:null) Seq 50-185008129: Link is closed
>>> 2013-07-15 11:57:49,699 DEBUG [agent.manager.ClusteredAgentAttache]
>>> (AgentManager-Handler-3:null) Seq 39-1272840193: Routing from
>>> 81375086018793
>>> 2013-07-15 11:57:49,700 DEBUG [agent.manager.ClusteredAgentAttache]
>>> (AgentManager-Handler-3:null) Seq 39-1272840193: Link is closed
>>> 2013-07-15 11:57:49,700 DEBUG [agent.manager.ClusteredAgentManagerImpl]
>>> (AgentManager-Handler-3:null) Seq 39-1272840193: MgmtId 81375086018793:
>>> Req: Resource [Host:39] is unreachable: Host 39: Link is closed
>>>
>>>
>>> 2013-07-15 11:57:49,861 DEBUG [agent.manager.ClusteredAgentManagerImpl]
>>> (AgentManager-Handler-8:null) Seq 39--1: MgmtId 81375086018793: Req:
>>> Cancel
>>> request received
>>> 2013-07-15 11:57:49,861 DEBUG [agent.manager.AgentAttache]
>>> (AgentManager-Handler-8:null) Seq 39-1272840194: Cancelling.
>>> 2013-07-15 11:57:49,861 DEBUG [agent.manager.AgentAttache]
>>> (StatsCollector-2:null) Seq 39-1272840194: Waiting some more time because
>>> this is the current command
>>> 2013-07-15 11:57:49,862 DEBUG [agent.manager.AgentAttache]
>>> (StatsCollector-2:null) Seq 39-1272840194: Waiting some more time because
>>> this is the current command
>>> 2013-07-15 11:57:49,862 INFO  [utils.exception.CSExceptionErrorCode]
>>> (StatsCollector-2:null) Could not find exception:
>>> com.cloud.exception.OperationTimedoutException in error code list for
>>> exceptions
>>> 2013-07-15 11:57:49,862 WARN  [agent.manager.AgentAttache]
>>> (StatsCollector-2:null) Seq 39-1272840194: Timed out on null
>>> 2013-07-15 11:57:49,862 DEBUG [agent.manager.AgentAttache]
>>> (StatsCollector-2:null) Seq 39-1272840194: Cancelling.
>>> 2013-07-15 11:57:49,863 DEBUG [cloud.storage.StorageManagerImpl]
>>> (StatsCollector-2:null) Unable to send storage pool command to
>>> Pool[210|NetworkFilesystem] via 39
>>> com.cloud.exception.OperationTimedoutException: Commands 1272840194 to
>>> Host
>>> 39 timed out after 3600
>>>        at
>>> com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
>>>        at
>>> com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
>>>        at
>>> com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
>>>        at
>>>
>>> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
>>>        at
>>>
>>> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>>>        at
>>>
>>> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>>>        at
>>>
>>> com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>>>        at
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>        at
>>>
>>> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>        at
>>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>        at
>>>
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>>        at
>>>
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>>        at
>>>
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>        at
>>>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>        at java.lang.Thread.run(Thread.java:679)
>>> 2013-07-15 11:57:49,863 INFO  [cloud.server.StatsCollector]
>>> (StatsCollector-2:null) Unable to reach Pool[210|NetworkFilesystem]
>>> com.cloud.exception.StorageUnavailableException: Resource
>>> [StoragePool:210]
>>> is unreachable: Unable to send command to the pool
>>>        at
>>>
>>> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)
>>>        at
>>>
>>> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>>>        at
>>>
>>> com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>>>        at
>>>
>>> com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>>>        at
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>        at
>>>
>>> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>        at
>>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>        at
>>>
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>>        at
>>>
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>>        at
>>>
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>        at
>>>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>        at java.lang.Thread.run(Thread.java:679)
>>>
>>>
>>> Thanks
>>> Leeno
>>>
>>>
>>> On Tue, Jul 16, 2013 at 10:21 AM, Leeno Jose.P.A <leenojos@gmail.com>
>>> wrote:
>>>
>>> This is a dev box. We are planning an HA-enabled environment for the prod
>>> setup. Thanks Geoff.
>>>
>>>
>>> On Tue, Jul 16, 2013 at 12:11 AM, Geoff Higginbottom <
>>> geoff.higginbottom@shapeblue.com> wrote:
>>>
>>> Hi Leeno,
>>>
>>> In theory that should work, but obviously you will lose all changes made
>>> since the dump was taken. If any new VMs have been created, they will get
>>> purged by the system, etc.
>>>
>>> I would highly recommend splitting the DB and the Management Server, and
>>> if possible add a 2nd instance of each.
>>>
>>> Regards
>>>
>>> Geoff Higginbottom
>>>
>>> D: +44 20 3603 0542 | S: +44 20 3603 0540 | M: +447968161581
>>>
>>> geoff.higginbottom@shapeblue.com
>>>
>>> -----Original Message-----
>>> From: Leeno Jose.P.A [mailto:leenojos@gmail.com]
>>> Sent: 15 July 2013 18:46
>>> To: users@cloudstack.apache.org
>>> Subject: Re: Rebuilding management server
>>>
>>> Hi Geoff,
>>>
>>> 1. I have only one management server.
>>> 2. Management server is not functioning now but 'cloud' database dump is
>>> available in backup. CS version was 4.1.0. Hosts were Xenserver 6.1.0.
>>> 3. DB server was on the same machine where the management server was
>>> installed.
>>>
>>> Now I am planning to do a fresh install of CS 4.1.0 and restore cloud
>>> database with old installation dump, which is available in backup. Will
>>> it
>>> work?
>>>
>>> Thanks
>>> Leeno
>>>
>>>
>>> On Mon, Jul 15, 2013 at 9:56 PM, Geoff Higginbottom <
>>> geoff.higginbottom@shapeblue.com> wrote:
>>>
>>> The Management Servers are 'Stateless' so as Chip points out, it's the
>>> DB that stores all the info.
>>>
>>> How you actually go about it depends on your current setup.
>>>
>>> 1. How many management servers do you currently have?
>>> 2. Are the original Management Server(s) still functioning, or are
>>> they down?
>>> 3. Is DB on a separate server, or the same as the Management Server?
>>>
>>> Regards
>>>
>>> Geoff Higginbottom
>>>
>>> D: +44 20 3603 0542 | S: +44 20 3603 0540 | M: +447968161581
>>>
>>> geoff.higginbottom@shapeblue.com
>>>
>>> -----Original Message-----
>>> From: Chip Childers [mailto:chip.childers@sungard.com]
>>> Sent: 15 July 2013 15:50
>>> To: users@cloudstack.apache.org
>>> Subject: Re: Rebuilding management server
>>>
>>> On Mon, Jul 15, 2013 at 03:19:42PM +0530, Leeno Jose.P.A wrote:
>>>
>>> Hi Users,
>>>
>>> Has anyone tried to rebuild management server with Xenserver hosts?
>>> If yes, could you please share experience?
>>>
>>>
>>> --
>>> Leeno Jose .P.A
>>>
>>>
>>> I have not, but one of the most critical aspects of this is to ensure
>>> that your database is retained.
>>>
>>> This email and any attachments to it may be confidential and are
>>> intended solely for the use of the individual to whom it is addressed.
>>> Any views or opinions expressed are solely those of the author and do
>>> not necessarily represent those of Shape Blue Ltd or related
>>> companies. If you are not the intended recipient of this email, you
>>> must neither take any action based upon its contents, nor copy or show
>>> it to anyone. Please contact the sender if you believe you have
>>> received this email in error. Shape Blue Ltd is a company incorporated
>>> in England & Wales. ShapeBlue Services India LLP is operated under
>>> license from Shape Blue Ltd. ShapeBlue is a registered trademark.
>>>
>>>
>>>
>>>
>>> --
>>> Leeno Jose .P.A
>>>
>>>
>>>
>>>
>>> --
>>> Leeno Jose .P.A
>>>
>>>
>>>
>>>
>>> --
>>> Leeno Jose .P.A
>>>
>>>
>>>
>>>
>>>
>>>
>>> Todd Pigram
>>> todd@toddpigram.com
>>>
>>>
>>>
>>
>>
>> --
>> Leeno Jose .P.A
>>
>
>
>
> --
> Leeno Jose .P.A
>
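
The "Link is closed" pattern in the quoted logs can be tallied quickly. The following is a rough sketch, assuming Python 3 and the original, unwrapped management-server.log (the lines quoted in this thread have been wrapped by the mail client). The regular expressions are derived only from the log lines quoted above, the function name is hypothetical, and the number before the dash in "Seq" is assumed to be the host id because it matches "Resource [Host:39]" in the same entries.

```python
import re
from collections import Counter

# Patterns taken from the DEBUG lines quoted in this thread.
UNREACHABLE = re.compile(r"Resource \[Host:(\d+)\] is unreachable")
FORWARDED = re.compile(r"Seq (\d+)-\d+: Forwarding \S+ to (\d+)")


def summarize(log_text):
    """Count unreachable hosts and (host id, target msid) forwarding pairs."""
    unreachable = Counter(UNREACHABLE.findall(log_text))
    forwarded = Counter(FORWARDED.findall(log_text))
    return unreachable, forwarded
```

Call it as summarize(open(path).read()), with path pointing at wherever your management-server.log lives. On the startup excerpt quoted above it would report commands for hosts 39 and 50 being forwarded to msid 130602634328, and host 39 as unreachable.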
