brooklyn-commits mailing list archives

From aleds...@apache.org
Subject [1/2] brooklyn-docs git commit: Troubleshooting tips for slow Brooklyn
Date Mon, 06 Jun 2016 22:57:47 GMT
Repository: brooklyn-docs
Updated Branches:
  refs/heads/master 7e8166fa1 -> 30aff82ff


Troubleshooting tips for slow Brooklyn


Project: http://git-wip-us.apache.org/repos/asf/brooklyn-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/brooklyn-docs/commit/669b2e94
Tree: http://git-wip-us.apache.org/repos/asf/brooklyn-docs/tree/669b2e94
Diff: http://git-wip-us.apache.org/repos/asf/brooklyn-docs/diff/669b2e94

Branch: refs/heads/master
Commit: 669b2e94ee46446de7b1f0947e79e97c7f23d78a
Parents: 74a25d1
Author: Aled Sage <aled.sage@gmail.com>
Authored: Tue May 31 01:04:15 2016 +0100
Committer: Aled Sage <aled.sage@gmail.com>
Committed: Mon Jun 6 23:56:14 2016 +0100

----------------------------------------------------------------------
 .../troubleshooting/detailed-support-report.md  |  43 ++++
 guide/ops/troubleshooting/index.md              |   2 +
 guide/ops/troubleshooting/slow-unresponsive.md  | 237 +++++++++++++++++++
 website/documentation/faq.md                    |   2 +-
 4 files changed, 283 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/669b2e94/guide/ops/troubleshooting/detailed-support-report.md
----------------------------------------------------------------------
diff --git a/guide/ops/troubleshooting/detailed-support-report.md b/guide/ops/troubleshooting/detailed-support-report.md
new file mode 100644
index 0000000..6e3c741
--- /dev/null
+++ b/guide/ops/troubleshooting/detailed-support-report.md
@@ -0,0 +1,43 @@
+---
+layout: website-normal
+title: Detailed Support Report
+toc: /guide/toc.json
+---
+
+If you wish to send a detailed report, then depending on the nature of the problem, consider
+collecting the following information.
+
+See the [Brooklyn Slow or Unresponsive](slow-unresponsive.html) docs for details of these commands.
+ 
+{% highlight bash %}
+BROOKLYN_HOME=/home/users/brooklyn/apache-brooklyn-0.9.0-bin
+BROOKLYN_PID=$(cat $BROOKLYN_HOME/pid_java)
+REPORT_DIR=/tmp/brooklyn-report/
+DEBUG_LOG=${BROOKLYN_HOME}/brooklyn.debug.log
+mkdir -p ${REPORT_DIR}
+uname -a > ${REPORT_DIR}/uname.txt
+df -h > ${REPORT_DIR}/df.txt
+cat /proc/cpuinfo > ${REPORT_DIR}/cpuinfo.txt
+cat /proc/meminfo > ${REPORT_DIR}/meminfo.txt
+ulimit -a > ${REPORT_DIR}/ulimit.txt
+cat /proc/${BROOKLYN_PID}/limits >> ${REPORT_DIR}/ulimit.txt
+top -n 1 -b > ${REPORT_DIR}/top.txt
+lsof -p ${BROOKLYN_PID} > ${REPORT_DIR}/lsof.txt
+netstat -an > ${REPORT_DIR}/netstat.txt
+
+jmap -histo:live ${BROOKLYN_PID} > ${REPORT_DIR}/jmap-histo.txt
+jmap -heap ${BROOKLYN_PID} > ${REPORT_DIR}/jmap-heap.txt
+for i in {1..10}; do
+  jstack ${BROOKLYN_PID} > ${REPORT_DIR}/jstack.${i}.txt
+  sleep 1
+done
+grep "brooklyn gc" ${DEBUG_LOG} > ${REPORT_DIR}/brooklyn-gc.txt
+grep "events for subscriber" ${DEBUG_LOG} > ${REPORT_DIR}/events-for-subscriber.txt
+tar czf brooklyn-report.tgz ${REPORT_DIR}
+{% endhighlight %}
+
+Also consider providing your log files and persisted state, though extreme care should be taken if
+these might contain cloud or machine credentials (especially if
+[Externalised Configuration]({{ site.path.guide }}/ops/externalized-configuration.html)
+is not being used for credential storage).
+

http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/669b2e94/guide/ops/troubleshooting/index.md
----------------------------------------------------------------------
diff --git a/guide/ops/troubleshooting/index.md b/guide/ops/troubleshooting/index.md
index ee8dfd7..ebbce45 100644
--- a/guide/ops/troubleshooting/index.md
+++ b/guide/ops/troubleshooting/index.md
@@ -5,6 +5,8 @@ children:
 - { path: overview.md, title: Overview }
 - { path: deployment.md, title: Deployment }
 - { path: connectivity.md, title: Server Connectivity }
+- { path: slow-unresponsive.md, title: Brooklyn Slow or Unresponsive }
+- { path: detailed-support-report.md, title: Detailed Support Report }
 - { path: softwareprocess.md, title: SoftwareProcess Entities }
 - { path: going-deep-in-java-and-logs.md, title: Going Deep in Java and Logs }
 ---

http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/669b2e94/guide/ops/troubleshooting/slow-unresponsive.md
----------------------------------------------------------------------
diff --git a/guide/ops/troubleshooting/slow-unresponsive.md b/guide/ops/troubleshooting/slow-unresponsive.md
new file mode 100644
index 0000000..0b90e83
--- /dev/null
+++ b/guide/ops/troubleshooting/slow-unresponsive.md
@@ -0,0 +1,237 @@
+---
+layout: website-normal
+title: Brooklyn Slow or Unresponsive
+toc: /guide/toc.json
+---
+
+There are many possible causes for a Brooklyn server becoming slow or unresponsive. This guide
+describes some possible reasons, and some commands and tools that can help diagnose the problem.
+
+Possible reasons include:
+* CPU is maxed out
+* Memory usage is extremely high
+* SSH'ing is very slow (e.g. due to lack of entropy)
+* Out of disk space
+
+See [Brooklyn Requirements]({{ site.path.guide }}/ops/requirements.html) for details of server
+requirements.
+
+
+## Machine Diagnostics
+
+The following commands will collect OS-level diagnostics about the machine, and about the Brooklyn
+process. The commands below assume use of CentOS 6.x. Minor adjustments may be required for
+other platforms.
+
+
+#### OS and Machine Details
+
+To display system information, run:
+
+{% highlight bash %}
+uname -a
+{% endhighlight %}
+
+To show details of the CPU and memory available to the machine, run:
+
+{% highlight bash %}
+cat /proc/cpuinfo
+cat /proc/meminfo
+{% endhighlight %}
+
+
+#### User Limits
+
+To display information about user limits, run the command below (while logged in as the same user
+who runs Brooklyn):
+
+{% highlight bash %}
+ulimit -a
+{% endhighlight %}
+
+If Brooklyn is run as a different user (e.g. with user name "adalovelace"), then instead run:
+
+{% highlight bash %}
+sudo -u adalovelace bash -c 'ulimit -a'
+{% endhighlight %}
+
+Of particular interest is the limit for "open files".
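As a rough sketch of how the "open files" limit might be compared with actual usage, the helper below reads the soft limit from a `/proc/<pid>/limits`-style file. The function name `soft_open_files_limit` is hypothetical (it is not part of Brooklyn), and the sketch assumes the Linux `Max open files` row layout.

```shell
# Hypothetical helper: print the soft "Max open files" limit from a
# /proc/<pid>/limits-style file (Linux row layout assumed:
# "Max open files  <soft>  <hard>  files").
soft_open_files_limit() {
  awk '/^Max open files/ { print $4 }' "$1"
}

# Example usage (assumes BROOKLYN_PID is set as elsewhere in this guide):
#   soft_open_files_limit /proc/${BROOKLYN_PID}/limits
#   lsof -p ${BROOKLYN_PID} | wc -l
```

If the open file count reported by `lsof` is close to the soft limit, raising the limit is worth considering.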
+
+
+#### Disk Space
+
+The command below will list the disk size for each partition, including the amount used and
+available. If the Brooklyn base directory, persistence directory or logging directory is close
+to 0% available, this can cause serious problems:
+
+{% highlight bash %}
+df -h
+{% endhighlight %}
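To spot nearly-full partitions without reading the whole table, one option is to filter the POSIX-format output of `df -P`. The sketch below is illustrative only: the function name `filesystems_over` and the 90% threshold are assumptions, and mount points containing spaces are not handled.

```shell
# Hypothetical helper: read `df -P` output on stdin and print the mount
# point and use% of every filesystem at or above the given threshold.
# Assumes the POSIX column order, with Capacity in column 5 and the
# mount point in column 6.
filesystems_over() {
  awk -v limit="$1" 'NR > 1 {
    use = $5
    sub(/%/, "", use)              # strip the trailing percent sign
    if (use + 0 >= limit + 0) print $6, $5
  }'
}

# Example usage:
#   df -P | filesystems_over 90
```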
+
+
+#### CPU and Memory Usage
+
+To view the CPU and memory usage of all processes, and of the machine as a whole, one can use the
+`top` command. This runs interactively, updating every few seconds. To collect the output once
+(e.g. to share diagnostic information in a bug report), run:
+ 
+{% highlight bash %}
+top -n 1 -b > top.txt
+{% endhighlight %}
+
+
+#### File and Network Usage
+
+To count the number of open files for the Brooklyn process (which includes open socket connections):
+
+{% highlight bash %}
+BROOKLYN_HOME=/home/users/brooklyn/apache-brooklyn-0.9.0-bin
+BROOKLYN_PID=$(cat $BROOKLYN_HOME/pid_java)
+lsof -p $BROOKLYN_PID | wc -l
+{% endhighlight %}
+
+To count the number of "established" internet connections, run:
+
+{% highlight bash %}
+netstat -an | grep ESTABLISHED | wc -l
+{% endhighlight %}
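It can also help to see how TCP connections are spread across all states (e.g. many in `CLOSE_WAIT` or `TIME_WAIT`), not just the established ones. The sketch below assumes the Linux `netstat -an` column layout, where the state is the sixth field of each `tcp` line; the function name `count_tcp_states` is hypothetical.

```shell
# Hypothetical helper: read `netstat -an` output on stdin and print the
# number of TCP connections in each state, most frequent first.
# Assumes the Linux layout: Proto Recv-Q Send-Q Local Foreign State.
count_tcp_states() {
  awk '/^tcp/ { print $6 }' | sort | uniq -c | sort -rn
}

# Example usage:
#   netstat -an | count_tcp_states
```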
+
+
+#### Linux Kernel Entropy
+
+A lack of entropy can cause random number generation to be extremely slow. This can cause
+tasks like ssh to also be extremely slow. See 
+[linux kernel entropy]({{ site.path.website }}/documentation/increase-entropy.html)
+for details of how to work around this.
+
+
+## Process Diagnostics
+
+#### Thread and Memory Usage
+
+To get memory and thread usage for the Brooklyn (Java) process, two useful tools are `jstack`
+and `jmap`. These require the "development kit" to also be installed 
+(e.g. `yum install java-1.7.0-openjdk-devel`). Some useful commands are shown below:
+
+{% highlight bash %}
+BROOKLYN_HOME=/home/users/brooklyn/apache-brooklyn-0.9.0-bin
+BROOKLYN_PID=$(cat $BROOKLYN_HOME/pid_java)
+
+jstack $BROOKLYN_PID
+jmap -histo:live $BROOKLYN_PID
+jmap -heap $BROOKLYN_PID
+{% endhighlight %}
+ 
+
+#### Runnable Threads
+
+The [jstack-active](https://github.com/apache/brooklyn-dist/blob/master/scripts/jstack-active.sh)
+script is a convenient light-weight way to quickly see which threads of a running Brooklyn
+server are attempting to consume the CPU. It filters the output of `jstack`, to show only the
+"really-runnable" threads (as opposed to those that are blocked).
+
+{% highlight bash %}
+BROOKLYN_HOME=/home/users/brooklyn/apache-brooklyn-0.9.0-bin
+BROOKLYN_PID=$(cat $BROOKLYN_HOME/pid_java)
+
+curl -O https://raw.githubusercontent.com/apache/brooklyn-dist/master/scripts/jstack-active.sh
+
+bash jstack-active.sh $BROOKLYN_PID
+{% endhighlight %}
+
+
+#### Profiling
+
+If an in-depth investigation of the CPU usage (and/or object creation) of a Brooklyn Server is
+required, there are many profiling tools designed specifically for this purpose. These generally
+require that the process be launched in such a way that a profiler can attach, which may not be
+appropriate for a production server.
+
+
+#### Remote Debugging
+
+If the Brooklyn Server was originally run to allow a remote debugger to connect (strongly
+discouraged in production!), then this provides a convenient way to investigate why Brooklyn
+is being slow or unresponsive. See the tip
+[Debugging Remote Brooklyn]({{ site.path.guide }}/dev/tips/debugging-remote-brooklyn.html)
+and the [IDE docs]({{ site.path.guide }}/dev/env/ide/) for more
+information.
+
+
+## Log Files
+
+Apache Brooklyn will by default create brooklyn.info.log and brooklyn.debug.log files. See the
+[Logging]({{ site.path.guide }}/ops/logging.html) docs for more information.
+
+The following are useful log messages to search for (e.g. using `grep`). Note the wording of
+these messages (or their very presence) may change in future versions of Brooklyn. 
+
+
+#### Normal Logging
+
+The lines below are commonly logged, and can be useful to search for when finding the start of a section of logging.
+
+{% highlight %}
+2016-05-30 17:05:51,458 INFO  o.a.b.l.BrooklynWebServer [main]: Started Brooklyn console at http://127.0.0.1:8081/, running classpath://brooklyn.war
+2016-05-30 17:06:04,098 INFO  o.a.b.c.m.h.HighAvailabilityManagerImpl [main]: Management node tF3GPvQ5 running as HA MASTER autodetected
+2016-05-30 17:06:08,982 INFO  o.a.b.c.m.r.InitialFullRebindIteration [brooklyn-execmanager-rvpnFTeL-0]: Rebinding from /home/compose/compose-amp-state/brooklyn-persisted-state/data for master rvpnFTeL...
+2016-05-30 17:06:11,105 INFO  o.a.b.c.m.r.RebindIteration [brooklyn-execmanager-rvpnFTeL-0]: Rebind complete (MASTER) in 2s: 19 apps, 54 entities, 50 locations, 46 policies, 704 enrichers, 0 feeds, 160 catalog items
+{% endhighlight %}
+
+
+#### Memory Usage
+
+The debug log includes (every minute) a log statement about the memory usage and task activity. For example:
+
+{% highlight %}
+2016-05-27 12:20:19,395 DEBUG o.a.b.c.m.i.BrooklynGarbageCollector [brooklyn-gc]: brooklyn gc (before) - using 328 MB / 496 MB memory (5.58 kB soft); 69 threads; storage: {datagrid={size=7, createCount=7}, refsMapSize=0, listsMapSize=0}; tasks: 10 active, 33 unfinished; 78 remembered, 1696906 total submitted)
+2016-05-27 12:20:19,395 DEBUG o.a.b.c.m.i.BrooklynGarbageCollector [brooklyn-gc]: brooklyn gc (after) - using 328 MB / 496 MB memory (5.58 kB soft); 69 threads; storage: {datagrid={size=7, createCount=7}, refsMapSize=0, listsMapSize=0}; tasks: 10 active, 33 unfinished; 78 remembered, 1696906 total submitted)
+{% endhighlight %}
+
+These can be extremely useful if investigating a memory or thread leak, or to determine whether a
+surprisingly high number of tasks are being executed.
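To follow memory and thread counts over time, the "brooklyn gc (after)" lines can be condensed into one short record per minute. The sketch below is an assumption-laden illustration: the function name `gc_summary` is hypothetical, it only matches lines where memory is reported in MB (as in the example above), and the message wording may differ between Brooklyn versions.

```shell
# Hypothetical helper: read debug-log lines on stdin and print, for each
# "brooklyn gc (after)" line, the timestamp, used/max memory in MB,
# thread count and active task count. Lines in other units are skipped.
gc_summary() {
  sed -n 's/^\([0-9-]* [0-9:,]*\) .*brooklyn gc (after) - using \([0-9]*\) MB \/ \([0-9]*\) MB memory.*; \([0-9]*\) threads;.*tasks: \([0-9]*\) active.*/\1 used_mb=\2 max_mb=\3 threads=\4 active_tasks=\5/p'
}

# Example usage (DEBUG_LOG as defined in the support-report script):
#   gc_summary < ${DEBUG_LOG}
```

A steadily rising `used_mb` or `threads` value across many records is the kind of trend that suggests a leak.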
+
+
+#### Subscriptions
+
+One source of high CPU in Brooklyn is when a subscription (e.g. for a policy or enricher) is being
+triggered many times (i.e. handling many events). A log message like that below will be logged on
+every 1000 events handled by a given single subscription.
+
+{% highlight %}
+2016-05-30 17:29:09,125 DEBUG o.a.b.c.m.i.LocalSubscriptionManager [brooklyn-execmanager-rvpnFTeL-8]: 1000 events for subscriber Subscription[SCFnav9g;CanopyComposeApp{id=gIeTwhU2}@gIeTwhU2:webapp.url]
+{% endhighlight %}
+
+If a subscription is handling a huge number of events, there are a couple of common reasons:
+* first, it could be subscribing to too much activity - e.g. a wildcard subscription for all
+  events from all entities.
+* second, it could be an infinite loop (e.g. where an enricher responds to a sensor-changed event
+  by setting that same sensor, thus triggering another sensor-changed event).
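To find which subscriptions are busiest, the "events for subscriber" lines can be grouped by subscription and counted. This is a sketch only: the function name `busiest_subscriptions` is hypothetical, and it relies on the message wording shown above, which may change between versions.

```shell
# Hypothetical helper: read debug-log lines on stdin and print how many
# "events for subscriber" messages were logged per subscription, busiest
# first. Uses `grep -o`, which GNU and BSD grep both support.
busiest_subscriptions() {
  grep -o 'events for subscriber .*' \
    | sed 's/^events for subscriber //' \
    | sort | uniq -c | sort -rn
}

# Example usage (DEBUG_LOG as defined in the support-report script):
#   busiest_subscriptions < ${DEBUG_LOG} | head
```

Since a message is logged on every 1000 events, subscriptions with a much higher count than the rest are the first candidates to inspect for wildcard subscriptions or feedback loops.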
+
+
+#### User Activity
+
+All activity triggered by the REST API or web-console will be logged. Some examples are shown below:
+
+{% highlight %}
+2016-05-19 17:52:30,150 INFO  o.a.b.r.r.ApplicationResource [brooklyn-jetty-server-8081-qtp1058726153-17473]: Launched from YAML: name: My Example App
+location: aws-ec2:us-east-1
+services:
+- type: org.apache.brooklyn.entity.webapp.tomcat.TomcatServer
+
+2016-05-30 14:46:19,516 DEBUG brooklyn.REST [brooklyn-jetty-server-8081-qtp1104967201-20881]: Request Tisj14 starting: POST /v1/applications/NiBy0v8Q/entities/NiBy0v8Q/expunge from 77.70.102.66
+{% endhighlight %}
+
+
+#### Entity Activity
+
+If investigating the behaviour of a particular entity (e.g. on failure), it can be very useful to
+`grep` the info and debug log for the entity's id. For a software process, the debug log will
+include the stdout and stderr of all the commands executed by that entity.
+
+It can also be very useful to search for all effector invocations, to see where the behaviour
+has been triggered:
+
+{% highlight %}
+2016-05-27 12:45:43,529 DEBUG o.a.b.c.m.i.EffectorUtils [brooklyn-execmanager-gvP7MuZF-14364]: Invoking effector stop on TomcatServerImpl{id=mPujYmPd}
+{% endhighlight %}
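As a sketch of how such a search might be condensed into a timeline, the helper below reduces each "Invoking effector" line to its timestamp, effector name and target entity. The function name `effector_calls` is hypothetical, and the pattern assumes the message format shown in the example above.

```shell
# Hypothetical helper: read debug-log lines on stdin and print
# "<timestamp> <effector> <entity>" for each effector invocation.
effector_calls() {
  sed -n 's/^\([0-9-]* [0-9:,]*\) .*Invoking effector \([^ ]*\) on \(.*\)$/\1 \2 \3/p'
}

# Example usage (DEBUG_LOG as defined in the support-report script;
# mPujYmPd is the entity id from the example above):
#   effector_calls < ${DEBUG_LOG} | grep mPujYmPd
```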

http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/669b2e94/website/documentation/faq.md
----------------------------------------------------------------------
diff --git a/website/documentation/faq.md b/website/documentation/faq.md
index 7af5f80..483d686 100644
--- a/website/documentation/faq.md
+++ b/website/documentation/faq.md
@@ -31,7 +31,7 @@ You could encounter this error when running with many entities.
 Please **increase the ulimit** if you see such error:
 
 On the VM running Apache Brooklyn, we recommend ensuring nproc and nofile are reasonably high (e.g. higher than 1024, which is often the default).
-We recommend setting it limits to a value above 16000.
+We recommend setting these limits to a value of 16384 or higher.
 
 If you want to check the current limits run `ulimit -a`.
 

