cloudstack-dev mailing list archives

From Rohit Yadav <rohit.ya...@shapeblue.com>
Subject Re: CS 4.9 NIO Selector wait time PR-1601
Date Thu, 25 Aug 2016 17:44:03 GMT
Hi Martin,


Thanks for sharing. I'm not sure what's causing the issue, but based on the logs it seems
that only the KVM agents are having issues connecting to the mgmt server, as I don't see any
Nio-related exceptions in the management server logs.


I could not see the cloudstack-agent version in the logs; I'm assuming that they were all
upgraded to 4.9.0 and that there are no conflicting jars at /usr/share/cloudstack-agent/lib.


First, can you make sure the mgmt server has a high enough ulimit? I found that Ubuntu/Debian's
init.d script doesn't override this, while the CentOS initd/systemd scripts set ulimit. On your
mgmt server, edit /etc/init.d/cloudstack-management and add ulimit -n 10240 just before the mgmt
server is started in the 'start' section (for me it was at around line #147, where it logs a
message that it's starting the cloudstack-management server).
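
To confirm that the new limit actually took effect after a restart, the open-files limit of the running process can be read back from /proc. This is a minimal sketch, assuming a Linux host; the helper name max_open_files is hypothetical and not part of CloudStack.

```shell
#!/bin/bash
# Hypothetical helper (not part of CloudStack): print the soft
# "Max open files" limit from a /proc/<pid>/limits style file.
# In that file the soft limit is the 4th whitespace-separated field
# of the "Max open files" row.
max_open_files() {
    awk '/^Max open files/ { print $4 }' "$1"
}
```

Usage: max_open_files /proc/<pid>/limits, where <pid> is the PID of the management server's Java process. If the edit to the init script worked, this should print 10240.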


Next, if this still does not solve the issue -- I created a special cloud-utils.jar for you
that you need to place on your mgmt server and on the KVM agents, and then restart the mgmt
server. This will increase the verbosity of the error while reducing the Nio polling loop
timeout (from 100ms to 10ms). On the KVM agents, the error from the logs is that during the SSL
handshake the inbound connection/stream gets closed, and we want to know the exception message.
Please get the jar from here:

https://github.com/rhtyd/cloudstack/releases/tag/4.9.0-nioinbound and place it at:

/usr/share/cloudstack-agent/lib/ (on kvm host)

/usr/share/cloudstack-management/webapps/client/WEB-INF/lib/ (on mgmt server host)


Let me know what worked for you, and if it still fails, can you share the mgmt server and
agent logs once again? Thanks.


Regards.

________________________________
From: martin kolly <martin.kolly@senselan.ch>
Sent: 25 August 2016 20:50:08
To: dev@cloudstack.apache.org
Subject: Re: CS 4.9 NIO Selector wait time PR-1601

Hi Rohit

We are running Java version 1.7.0_111 on the KVM hosts and the management server.
mgmt# java -version
java version "1.7.0_111"
kvm# java -version
java version "1.7.0_111"

We get the same error message. Attached are the logs with TRACE enabled.

"success consists of going from failure to failure without loss of enthusiasm."

regards
martin

On 08/25/2016 02:02 PM, Rohit Yadav wrote:

Hi Martin,


Thanks for sharing. On the surface there does not seem to be any issue in the configuration
causing the failures. I'm personally running a KVM and Ubuntu hosts based env without issues;
I'm on Ubuntu 14.04.4 (Linux bluebox 3.16.0-45-generic #60~14.04.1-Ubuntu) and java 1.7.0_79.
Can you try upgrading your JRE7 to the latest (openjdk-7-jre, 7u111-2.6.7-0ubuntu0.14.04.3) on
the mgmt server and all KVM hosts?


If upgrading your JRE does not help, can you increase the logging verbosity for both the agent
and the management server (in /etc/cloudstack/{agent, management} there would be a log4j file;
edit that and replace DEBUG/INFO with TRACE for the class/keys com.cloud and
org.apache.cloudstack) and re-share the logs when the failures occur? I want to see what
additional information we can get from the logs when it tries to connect to host 10.100.12.10
on port: 8250.
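
The log-level change described above could be sketched as a small filter. This assumes the log4j file is XML with entries of the form <category name="..."> followed by <priority value="..."/>, which may not match your file exactly; the function name set_trace_level is hypothetical. It prints the modified configuration to stdout and leaves the original file untouched.

```shell
#!/bin/bash
# Hypothetical helper: raise DEBUG/INFO to TRACE, but only inside the
# <category> elements named com.cloud and org.apache.cloudstack.
# Assumes an XML log4j layout; all other lines pass through unchanged.
set_trace_level() {
    awk '
        # Entering one of the two categories we care about
        /<category name="(com\.cloud|org\.apache\.cloudstack)"/ { in_cat = 1 }
        # Rewrite the priority only while inside such a category
        in_cat && /<priority value="(DEBUG|INFO)"/ {
            sub(/value="(DEBUG|INFO)"/, "value=\"TRACE\"")
        }
        # Leaving the category element
        /<\/category>/ { in_cat = 0 }
        { print }
    ' "$1"
}
```

Usage: run set_trace_level on the log4j file, redirect the output to a new file, review it, and then swap it in before restarting the agent or the management server.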


Regards.

________________________________
From: martin kolly <martin.kolly@senselan.ch>
Sent: 25 August 2016 17:11:06
To: dev@cloudstack.apache.org
Subject: Re: CS 4.9 NIO Selector wait time PR-1601


@Simon: We have one management server with local DB. KVMs connect
directly to the management server without any security/loadbalancing
device.

Thanks
Martin

On 08/25/2016 12:41 PM, Simon Weller wrote:


Martin,

Can you provide more detail about your haproxy setup?
Are you running it on separate servers, or on the management server itself?

- Si

Simon Weller/ENA
(615) 312-6068

rohit.yadav@shapeblue.com
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS UK
@shapeblue


-----Original Message-----
From: martin kolly [martin.kolly@senselan.ch]
Received: Thursday, 25 Aug 2016, 5:04AM
To: Rohit Yadav [rohit.yadav@shapeblue.com]; dev@cloudstack.apache.org [dev@cloudstack.apache.org]
Subject: Re: CS 4.9 NIO Selector wait time PR-1601


Thanks for your reply.

This morning we repeated the upgrade process from 4.8 to 4.9 with the
following repository:
http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/
Unfortunately we ran into the same issue:

2016-08-25 09:49:00,660 INFO  [utils.nio.NioClient] (main:null) (logid:) Connecting to 10.100.12.10:8250
2016-08-25 09:49:00,668 WARN  [utils.nio.Link] (main:null) (logid:) This SSL engine was forced to close inbound due to end of stream.
2016-08-25 09:49:00,668 ERROR [utils.nio.NioClient] (main:null) (logid:) SSL Handshake failed while connecting to host: 10.100.12.10 port: 8250
2016-08-25 09:49:00,668 ERROR [utils.nio.NioConnection] (main:null) (logid:) Unable to initialize the threads.
java.io.IOException: SSL Handshake failed while connecting to host: 10.100.12.10 port: 8250
    at com.cloud.utils.nio.NioClient.init(NioClient.java:67)
    at com.cloud.utils.nio.NioConnection.start(NioConnection.java:88)
    at com.cloud.agent.Agent.start(Agent.java:237)
    at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:399)
    at com.cloud.agent.AgentShell.launchAgentFromClassInfo(AgentShell.java:367)
    at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:351)
    at com.cloud.agent.AgentShell.start(AgentShell.java:456)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
2016-08-25 09:49:00,669 INFO  [utils.exception.CSExceptionErrorCode] (main:null) (logid:) Could not find exception: com.cloud.utils.exception.NioConnectionException in error code list for exceptions
2016-08-25 09:49:00,669 WARN  [cloud.agent.Agent] (main:null) (logid:) NIO Connection Exception com.cloud.utils.exception.NioConnectionException: SSL Handshake failed while connecting to host: 10.100.12.10 port: 8250
2016-08-25 09:49:00,670 INFO  [cloud.agent.Agent] (main:null) (logid:) Attempted to connect to the server, but received an unexpected exception, trying again...

KVM Hosts:
# java -version
java version "1.7.0_95"
OpenJDK Runtime Environment (IcedTea 2.6.4) (7u95-2.6.4-0ubuntu0.14.04.1)
OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)

# dpkg --get-selections | grep -e 'jdk' -e 'java'
ca-certificates-java                install
java-common                    install
libcommons-daemon-java                install
openjdk-7-jre-headless:amd64            install
tzdata-java                    install

# apt-cache policy cloudstack-agent
cloudstack-agent:
  Installed: 4.9.0
  Candidate: 4.9.0
  Version table:
 *** 4.9.0 0
        500 http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/ ./ Packages
        100 /var/lib/dpkg/status

# find /usr/share/ -name "cloud-utils*.jar"
/usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar
# md5sum /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar
a8de7306d7c80b5a73e93b83afdd119f  /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar


Management Server:
# java -version
java version "1.7.0_95"
OpenJDK Runtime Environment (IcedTea 2.6.4) (7u95-2.6.4-0ubuntu0.14.04.1)
OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)

# dpkg --get-selections | grep -e 'jdk' -e 'java'
ca-certificates-java                install
java-common                    install
libcommons-collections3-java            install
libcommons-daemon-java                install
libcommons-dbcp-java                install
libcommons-pool-java                install
libecj-java                    install
libgeronimo-jta-1.1-spec-java            install
libmysql-java                    install
libservlet2.5-java                install
libtomcat6-java                    install
openjdk-7-jre-headless:amd64            install
tzdata-java                    install

# apt-cache policy cloudstack-management
cloudstack-management:
  Installed: 4.9.0
  Candidate: 4.9.0
  Version table:
 *** 4.9.0 0
        500 http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/ ./ Packages
        100 /var/lib/dpkg/status

# find /usr/share/ -name "cloud-utils*.jar"
/usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar
/usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar
/usr/share/cloudstack-usage/lib/cloud-utils-4.9.0.jar
# md5sum /usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar
a8de7306d7c80b5a73e93b83afdd119f  /usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar
# md5sum /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar
a8de7306d7c80b5a73e93b83afdd119f  /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar
# md5sum /usr/share/cloudstack-usage/lib/cloud-utils-4.9.0.jar
a8de7306d7c80b5a73e93b83afdd119f  /usr/share/cloudstack-usage/lib/cloud-utils-4.9.0.jar

The classpath.conf was not modified:
# cat /etc/cloudstack/management/classpath.conf
#!/bin/bash
#...

SYSTEMJARS=""
SCP=$(build-classpath $SYSTEMJARS 2>/dev/null) ; if [ $? != 0 ] ; then
export SCP="" ; fi
MCP=""
DCP="/usr/share/tomcat6/bin/bootstrap.jar:/usr/share/tomcat6/bin/tomcat-juli.jar"
CLASSPATH=$SCP:$DCP:$MCP:/etc/cloudstack/management:/usr/share/cloudstack-management/setup
for jarfile in ""/* ; do
    if [ ! -e "$jarfile" ] ; then continue ; fi
    CLASSPATH=$jarfile:$CLASSPATH
done
for plugin in ""/* ; do
    if [ ! -e "$plugin" ] ; then continue ; fi
    CLASSPATH=$plugin:$CLASSPATH
done
for vendorconf in "/etc/cloudstack/management"/vendor/* ; do
    if [ ! -d "$vendorconf" ] ; then continue ; fi
    CLASSPATH=$vendorconf:$CLASSPATH
done
export CLASSPATH
if ([ -z "$JAVA_HOME" ] || [ ! -d "$JAVA_HOME" ]) && [ -d /usr/lib/jvm/jre-1.7.0 ]; then
     export JAVA_HOME=/usr/lib/jvm/jre-1.7.0
fi
PATH=$JAVA_HOME/bin:/sbin:/usr/sbin:$PATH
export PATH

Regards
Martin

On 08/24/2016 06:56 PM, Rohit Yadav wrote:


Martin,


Were you able to fix your issue after installing packages from the
repo Will shared and restarting the services?

I've not personally tested the apt-get.eu repo, but I had earlier
built this repo which I'm personally using in my local KVM-trusty
based cloud: http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/


If you're still getting the error, can you share the JRE version
you're running, both on the mgmt server and on the KVM hosts? You can
run java -version, or share output of "dpkg --get-selections | grep -e
'jdk' -e 'java'". Are you running CloudStack with any additional plugins?

From the logs, it looks like there are mixed jar files: the
NioConnectionException class was not found, so something's wrong with
your installation. There must be a cloud-utils jar file somewhere; make
sure your installation doesn't have multiple copies/versions of jars
in the /usr/share/cloudstack-common and
/usr/share/cloudstack-management/webapps/client/ paths:

Could not find exception:
com.cloud.utils.exception.NioConnectionException in error code list for
exceptions
The error "Unable to initialize the threads." suggests that the JVM was
not able to spawn threads. I would like to know your JRE version and any
other settings configured in /etc/cloudstack/management/classpath.conf
(and there are a bunch of other files where JAVA_OPTS might have been
overridden). Note: For now you should only be using JRE 1.7.
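
One way to look for mixed jars is to list every cloud-utils jar under the suspect directories together with its checksum. This is only a sketch, assuming GNU find and md5sum are available; the function name list_cloud_utils is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: list every cloud-utils*.jar under the given
# directories with its MD5 checksum, sorted so that identical copies
# end up on adjacent lines. More than one distinct checksum, or more
# than one version in the file names, points at a mixed installation.
list_cloud_utils() {
    find "$@" -name 'cloud-utils*.jar' -exec md5sum {} + | sort
}
```

Usage: list_cloud_utils /usr/share/cloudstack-common /usr/share/cloudstack-management/webapps/client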


Regards.

rohit.yadav@shapeblue.com
www.shapeblue.com
@shapeblue




------------------------------------------------------------------------
*From:* martin kolly <martin.kolly@senselan.ch>
*Sent:* 24 August 2016 19:53:26
*To:* dev@cloudstack.apache.org; Rohit Yadav
*Subject:* Re: CS 4.9 NIO Selector wait time PR-1601

Thanks Will!

Yes, the repo is pointing to the 4.9 release for all KVMs and for the
management server:
cloudstack:~# cat /etc/apt/sources.list.d/cloudstack.list
deb http://cloudstack.apt-get.eu/ubuntu trusty 4.9

All KVM agents and the mgmt server are upgraded to release 4.9 based
on the documentation. We have restarted all the cloudstack-agents and
the cloudstack-management service as well.

Network traces are showing packets from KVM <-> Mgmt on port 8250.
There is no security device in between.

thanks
fanfarlo




On 08/24/2016 04:13 PM, Will Stevens wrote:


@rohit, I am guessing they should be installing the cloudstack-agent using
the following repo right?  That is what is described in the upgrade (trusty
instead of precise though).

http://cloudstack.apt-get.eu/ubuntu/dists/trusty/4.9/

@fanfarlo, are your repos set up to point to the new 4.9 version?

cheers,

will

On Wed, Aug 24, 2016 at 9:46 AM, Rohit Yadav <rohit.yadav@shapeblue.com>
wrote:



The PR and fix already exists in 4.9.0 release. Please make sure to
upgrade all of your management server(s) and KVM agents and then also
restart them after the upgrade.


If you are seeing SSL handshake failures, it could be due to a network or
security issue, but it is most likely due to a mismatch between the CloudStack
mgmt server and KVM agent versions.


Regards.

rohit.yadav@shapeblue.com
www.shapeblue.com
@shapeblue



------------------------------
*From:* Will Stevens <williamstevens@gmail.com>
*Sent:* 24 August 2016 18:17:17
*To:* dev@cloudstack.apache.org; Rohit Yadav
*Subject:* Re: CS 4.9 NIO Selector wait time PR-1601


That PR is already merged, so you don't have to do anything to get that
code; you already have it.

@rohit, can you review this?  I think this is similar to the issue Simon
reported earlier.

Will

On Aug 24, 2016 6:56 AM, "fanfarlo" <fanfarlo2@gmail.com>
wrote:



hi all

We have the following environment:
- OS: Ubuntu 14.04 (hypervisors and management)
- 4 KVM Hosts
- Cloudstack Release 4.9 with local database

Since we upgraded to Release 4.9 the KVM hosts no longer connect to the
management server. The upgrade procedure was followed as described:
http://docs.cloudstack.apache.org/projects/cloudstack-release-notes/en/4.9.0/upgrade/upgrade-4.8.html


On the KVM hosts we have the following error message:
2016-08-24 10:42:49,678 INFO  [utils.exception.CSExceptionErrorCode] (main:null) (logid:) Could not find exception: com.cloud.utils.exception.NioConnectionException in error code list for exceptions
2016-08-24 10:42:49,678 WARN  [cloud.agent.Agent] (main:null) (logid:) NIO Connection Exception com.cloud.utils.exception.NioConnectionException: SSL Handshake failed while connecting to host: 10.100.12.10 port: 8250
2016-08-24 10:42:49,678 INFO  [cloud.agent.Agent] (main:null) (logid:) Attempted to connect to the server, but received an unexpected exception, trying again...
2016-08-24 10:42:54,679 INFO  [utils.nio.NioClient] (main:null) (logid:) Connecting to 10.100.12.10:8250
2016-08-24 10:42:54,684 WARN  [utils.nio.Link] (main:null) (logid:) This SSL engine was forced to close inbound due to end of stream.
2016-08-24 10:42:54,684 ERROR [utils.nio.NioClient] (main:null) (logid:) SSL Handshake failed while connecting to host: 10.100.12.10 port: 8250
2016-08-24 10:42:54,685 ERROR [utils.nio.NioConnection] (main:null) (logid:) Unable to initialize the threads.
java.io.IOException: SSL Handshake failed while connecting to host: 10.100.12.10 port: 8250
    at com.cloud.utils.nio.NioClient.init(NioClient.java:67)
    at com.cloud.utils.nio.NioConnection.start(NioConnection.java:88)
    at com.cloud.agent.Agent.start(Agent.java:237)
    at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:399)
    at com.cloud.agent.AgentShell.launchAgentFromClassInfo(AgentShell.java:367)
    at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:351)
    at com.cloud.agent.AgentShell.start(AgentShell.java:456)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
2016-08-24 10:42:54,685 INFO  [utils.exception.CSExceptionErrorCode] (main:null) (logid:) Could not find exception: com.cloud.utils.exception.NioConnectionException in error code list for exceptions
2016-08-24 10:42:54,685 WARN  [cloud.agent.Agent] (main:null) (logid:) NIO Connection Exception com.cloud.utils.exception.NioConnectionException: SSL Handshake failed while connecting to host: 10.100.12.10 port: 8250
2016-08-24 10:42:54,686 INFO  [cloud.agent.Agent] (main:null) (logid:) Attempted to connect to the server, but received an unexpected exception, trying again...


The port is open on the management server, and there is no firewall in between.
We found that there was a bug report here:
https://issues.apache.org/jira/browse/CLOUDSTACK-9348. There is a PR
changing the NIO Selector wait time:
https://github.com/apache/cloudstack/pull/1601 which was merged into
the master branch.

Since we installed Release 4.9 we probably need to patch the
NioConnection.class as described in PR 1601, right?

kvm03# unzip -v /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar | grep NioConnection
    3923  Defl:N     1778  55% 2016-08-02 09:28 05aaf7d5  com/cloud/utils/nio/NioConnection$1.class
     881  Defl:N      495  44% 2016-08-02 09:28 e378984c  com/cloud/utils/nio/NioConnection$ChangeRequest.class
   15410  Defl:N     7130  54% 2016-08-02 09:28 b3281f5a  com/cloud/utils/nio/NioConnection.class
    1134  Defl:N      584  49% 2016-08-02 09:28 8d5cb4a8  com/cloud/utils/exception/NioConnectionException.class

Due to a lack of Java expertise we have some basic questions:
- Is there a patched jar file available? On a public build server?
- Do we need to create the jar from sources? What is the procedure?
- How do we apply the patch?

many thanks!
fanfarlo







