cloudstack-issues mailing list archives

From "Joris van Lieshout (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CLOUDSTACK-6023) Non windows instances are created on XenServer with a vcpu-max above supported xenserver limits
Date Wed, 05 Feb 2014 09:04:12 GMT

    [ https://issues.apache.org/jira/browse/CLOUDSTACK-6023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891934#comment-13891934 ]

Joris van Lieshout edited comment on CLOUDSTACK-6023 at 2/5/14 9:03 AM:
------------------------------------------------------------------------

Hi Harikrishna,

We came to this conclusion by using tcpdump to capture the POST that was returned with an
HTTP 500 error by the pool master. This POST contained, for each VM, the stats for each of
its 32 vCPUs (even though the instances were only using 1 vCPU), pushing it past the 300K
limit of the xapi RPC. We are encountering this issue on a host running just 59 instances
(including 36 router VMs that use just 1 vCPU but have a VCPUs-max of 32).
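
To put a number on it: 59 instances x 32 vCPU slots comes to 1,888 per-vCPU stat records in
a single stats POST. Against the content_length of 315932 bytes visible in the log below,
that averages roughly 167 bytes per record, so even this modestly loaded host blows straight
through the limit.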


My suggestion to resolve this issue would be to make vcpu-max a configurable property of a
service/compute offering, with a default of vcpusmax=vcpus unless otherwise configured in
the offering.
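
As a rough sketch of what I mean (the per-offering value below is hypothetical, not existing
CloudStack code), CitrixResourceBase.java could compute the value like this:

    // Hypothetical helper, not existing CloudStack code: pick a VCPUs-max
    // value from an assumed per-offering "vcpusmax" setting.
    static long chooseVcpusMax(int cpus, Integer offeringVcpusMax, long hypervisorLimit) {
        // default: vcpusmax == vcpus when the offering does not override it
        long max = (offeringVcpusMax != null) ? offeringVcpusMax : cpus;
        max = Math.max(max, cpus);             // never below the configured vcpus
        return Math.min(max, hypervisorLimit); // never above the hypervisor limit (16 on 6.x)
    }

The hardcoded branch could then become vmr.VCPUsMax = chooseVcpusMax(vmSpec.getCpus(),
offeringVcpusMax, 16L) instead of the unconditional 32L.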

In addition, I do wonder why there is a discrepancy between the XenServer Configuration
Limits documentation and the documents you are referring to. Either way, we are actively
experiencing this issue. I've attached a screenshot of xentop on one of our XenServer 6.0.2
hosts exhibiting the problem.
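
For anyone who wants to check their own pool rather than eyeball xentop, below is a minimal
sketch using the XenServer Java SDK (com.xensource.xenapi) that CloudStack itself builds on;
the pool master URL and credentials are placeholders:

    // Minimal sketch, not production code: print VCPUs-max for every real VM
    // in the pool, skipping templates and the control domain.
    import java.net.URL;
    import java.util.Map;
    import com.xensource.xenapi.Connection;
    import com.xensource.xenapi.Session;
    import com.xensource.xenapi.VM;

    public class ListVcpusMax {
        public static void main(String[] args) throws Exception {
            Connection conn = new Connection(new URL("https://pool-master.example.com"));
            Session.loginWithPassword(conn, "root", "secret", "1.3"); // placeholder credentials
            for (Map.Entry<VM, VM.Record> e : VM.getAllRecords(conn).entrySet()) {
                VM.Record r = e.getValue();
                if (!r.isATemplate && !r.isControlDomain) {
                    System.out.println(r.nameLabel + " VCPUs-max=" + r.VCPUsMax);
                }
            }
        }
    }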

If it helps, I can attach the packet capture containing the POST.


> Non windows instances are created on XenServer with a vcpu-max above supported xenserver limits
> -----------------------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-6023
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6023
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public (Anyone can view this level - this is the default.)
>          Components: XenServer
>    Affects Versions: Future, 4.2.1, 4.3.0
>            Reporter: Joris van Lieshout
>            Priority: Blocker
>         Attachments: xentop.png
>
>
> CitrixResourceBase.java contains a hardcoded value for VCPUsMax for non-Windows instances:
>         if (guestOsTypeName.toLowerCase().contains("windows")) {
>             vmr.VCPUsMax = (long) vmSpec.getCpus();
>         } else {
>             vmr.VCPUsMax = 32L;
>         }
> For all currently available versions of XenServer the limit is 16 vCPUs:
> http://support.citrix.com/servlet/KbServlet/download/28909-102-664115/XenServer-6.0-Configuration-Limits.pdf
> http://support.citrix.com/servlet/KbServlet/download/32312-102-704653/CTX134789%20-%20XenServer%206.1.0_Configuration%20Limits.pdf
> http://support.citrix.com/servlet/KbServlet/download/34966-102-706122/CTX137837_XenServer%206_2_0_Configuration%20Limits.pdf
> In addition there seems to be a limit on the total number of assigned vCPUs on a XenServer host.
> The impact of this bug is that xapi becomes unstable and keeps losing its master_connection because the POST to /remote_db_access is bigger than its limit of 200K. This basically renders a pool slave unmanageable.
> If you look at the running instances using xentop you will see them reporting 32 vCPUs.
> Below is the relevant portion of xensource.log showing the effect of the bug:
> [20140204T13:52:17.264Z|debug|xenserverhost1|144 inet-RPC|host.call_plugin R:e58e985539ab|master_connection] stunnel: Using commandline: /usr/sbin/stunnel -fd f3b8bb12-4e03-b47a-0dc5-85ad5aef79e6
> [20140204T13:52:17.269Z|debug|xenserverhost1|144 inet-RPC|host.call_plugin R:e58e985539ab|master_connection] stunnel: stunnel has pidty: (FEFork (43,30540))
> [20140204T13:52:17.269Z|debug|xenserverhost1|144 inet-RPC|host.call_plugin R:e58e985539ab|master_connection] stunnel: stunnel start
> [20140204T13:52:17.269Z| info|xenserverhost1|144 inet-RPC|host.call_plugin R:e58e985539ab|master_connection] stunnel connected pid=30540 fd=40
> [20140204T13:52:17.346Z|error|xenserverhost1|144 inet-RPC|host.call_plugin R:e58e985539ab|master_connection] Received HTTP error 500 ({ method = POST; uri = /remote_db_access; query = [  ]; content_length = [ 315932 ]; transfer encoding = ; version = 1.1; cookie = [ pool_secret=386bbf39-8710-4d2d-f452-9725d79c2393/aa7bcda9-8ebb-0cef-bb77-c6b496c5d859/1f928d82-7a20-9117-dd30-f96c7349b16e ]; task = ; subtask_of = ; content-type = ; user_agent = xapi/1.9 }) from master. This suggests our master address is wrong. Sleeping for 60s and then restarting.
> [20140204T13:53:18.620Z|error|xenserverhost1|10|dom0 networking update D:5c5376f0da6c|master_connection] Caught Master_connection.Goto_handler
> [20140204T13:53:18.620Z|debug|xenserverhost1|10|dom0 networking update D:5c5376f0da6c|master_connection] Connection to master died. I will continue to retry indefinitely (supressing future logging of this message).
> [20140204T13:53:18.620Z|error|xenserverhost1|10|dom0 networking update D:5c5376f0da6c|master_connection] Connection to master died. I will continue to retry indefinitely (supressing future logging of this message).
> [20140204T13:53:18.620Z|debug|xenserverhost1|10|dom0 networking update D:5c5376f0da6c|master_connection] Sleeping 2.000000 seconds before retrying master connection...
> [20140204T13:53:20.627Z|debug|xenserverhost1|10|dom0 networking update D:5c5376f0da6c|master_connection] stunnel: Using commandline: /usr/sbin/stunnel -fd 3c8aed8e-1fce-be7c-09f8-b45cdc40a1f5
> [20140204T13:53:20.632Z|debug|xenserverhost1|10|dom0 networking update D:5c5376f0da6c|master_connection] stunnel: stunnel has pidty: (FEFork (23,31207))
> [20140204T13:53:20.632Z|debug|xenserverhost1|10|dom0 networking update D:5c5376f0da6c|master_connection] stunnel: stunnel start
> [20140204T13:53:20.632Z| info|xenserverhost1|10|dom0 networking update D:5c5376f0da6c|master_connection] stunnel connected pid=31207 fd=20
> [20140204T13:53:28.874Z|error|xenserverhost1|4 unix-RPC|session.login_with_password D:2e7664ad69ed|master_connection] Caught Master_connection.Goto_handler
> [20140204T13:53:28.874Z|debug|xenserverhost1|4 unix-RPC|session.login_with_password D:2e7664ad69ed|master_connection] Connection to master died. I will continue to retry indefinitely (supressing future logging of this message).
> [20140204T13:53:28.874Z|error|xenserverhost1|4 unix-RPC|session.login_with_password D:2e7664ad69ed|master_connection] Connection to master died. I will continue to retry indefinitely (supressing future logging of this message).
> [20140204T13:53:28.875Z|debug|xenserverhost1|4 unix-RPC|session.login_with_password D:2e7664ad69ed|master_connection] Sleeping 2.000000 seconds before retrying master connection...
> [20140204T13:53:30.887Z|debug|xenserverhost1|4 unix-RPC|session.login_with_password D:2e7664ad69ed|master_connection] stunnel: Using commandline: /usr/sbin/stunnel -fd 665b8c15-8119-78a7-1888-cde60b2108dc
> [20140204T13:53:30.892Z|debug|xenserverhost1|4 unix-RPC|session.login_with_password D:2e7664ad69ed|master_connection] stunnel: stunnel has pidty: (FEFork (25,31514))
> [20140204T13:53:30.892Z|debug|xenserverhost1|4 unix-RPC|session.login_with_password D:2e7664ad69ed|master_connection] stunnel: stunnel start
> [20140204T13:53:30.892Z| info|xenserverhost1|4 unix-RPC|session.login_with_password D:2e7664ad69ed|master_connection] stunnel connected pid=31514 fd=22
> [20140204T13:54:31.472Z|error|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] Caught Unix.Unix_error(31, "write", "")
> [20140204T13:54:31.472Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] Connection to master died. I will continue to retry indefinitely (supressing future logging of this message).
> [20140204T13:54:31.477Z|error|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] Connection to master died. I will continue to retry indefinitely (supressing future logging of this message).
> [20140204T13:54:31.477Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] Sleeping 2.000000 seconds before retrying master connection...
> [20140204T13:54:33.488Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] stunnel: Using commandline: /usr/sbin/stunnel -fd f5df840d-8ac0-39fd-050f-bfa23a96c148
> [20140204T13:54:33.493Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] stunnel: stunnel has pidty: (FEFork (28,2788))
> [20140204T13:54:33.493Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] stunnel: stunnel start
> [20140204T13:54:33.493Z| info|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] stunnel connected pid=2788 fd=24
> [20140204T13:54:33.572Z|error|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] Caught Unix.Unix_error(31, "write", "")
> [20140204T13:54:33.572Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] Sleeping 4.000000 seconds before retrying master connection...
> [20140204T13:54:37.578Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] stunnel: Using commandline: /usr/sbin/stunnel -fd bcc34b6e-20cd-933c-7375-941d53884184
> [20140204T13:54:37.583Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] stunnel: stunnel has pidty: (FEFork (31,2808))
> [20140204T13:54:37.584Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] stunnel: stunnel start
> [20140204T13:54:37.584Z| info|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] stunnel connected pid=2808 fd=26
> [20140204T13:54:37.667Z|error|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] Caught Unix.Unix_error(31, "write", "")
> [20140204T13:54:37.667Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] Sleeping 8.000000 seconds before retrying master connection...
> [20140204T13:54:45.679Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] stunnel: Using commandline: /usr/sbin/stunnel -fd 83e7a6c7-3482-8bb9-3275-b537fc695bd6
> [20140204T13:54:45.683Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] stunnel: stunnel has pidty: (FEFork (30,2919))
> [20140204T13:54:45.683Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] stunnel: stunnel start
> [20140204T13:54:45.683Z| info|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] stunnel connected pid=2919 fd=25
> [20140204T13:54:45.768Z|error|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] Caught Unix.Unix_error(31, "write", "")
> [20140204T13:54:45.768Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] Sleeping 16.000000 seconds before retrying master connection...
> [20140204T13:55:01.789Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] stunnel: Using commandline: /usr/sbin/stunnel -fd abe83182-4ce5-0681-2c68-827dbbd95e94
> [20140204T13:55:01.794Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] stunnel: stunnel has pidty: (FEFork (32,3022))
> [20140204T13:55:01.794Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] stunnel: stunnel start
> [20140204T13:55:01.794Z| info|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] stunnel connected pid=3022 fd=28
> [20140204T13:55:02.143Z|error|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] Caught Unix.Unix_error(31, "write", "")
> [20140204T13:55:02.143Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] Sleeping 32.000000 seconds before retrying master connection...
> [20140204T13:55:34.179Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] stunnel: Using commandline: /usr/sbin/stunnel -fd 00895b5f-b30c-0c3a-32ae-758993dcd791
> [20140204T13:55:34.184Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] stunnel: stunnel has pidty: (FEFork (37,3387))
> [20140204T13:55:34.184Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] stunnel: stunnel start
> [20140204T13:55:34.184Z| info|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] stunnel connected pid=3387 fd=33
> [20140204T13:55:34.266Z|error|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] Caught Unix.Unix_error(31, "write", "")
> [20140204T13:55:34.267Z|debug|xenserverhost1|232 inet-RPC|host.call_plugin R:4d3007755c69|master_connection] Sleeping 64.000000 seconds before retrying master connection...



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
