From: tbouron@apache.org
To: commits@brooklyn.apache.org
Date: Fri, 16 Feb 2018 10:26:42 -0000
Subject: [03/25] brooklyn-docs git commit: Delete all the guide files

http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/58bb3aa0/guide/ops/troubleshooting/going-deep-in-java-and-logs.md ---------------------------------------------------------------------- diff --git a/guide/ops/troubleshooting/going-deep-in-java-and-logs.md b/guide/ops/troubleshooting/going-deep-in-java-and-logs.md deleted file mode 100644 index 8a66ce0..0000000 --- a/guide/ops/troubleshooting/going-deep-in-java-and-logs.md +++ /dev/null @@ -1,484 +0,0 @@ ---- -layout: website-normal -title: 
"Troubleshooting: Going Deep in Java and Logs" -toc: /guide/toc.json ---- - -This guide takes a deep look at the Java and log messages for some failure scenarios, -giving common steps used to identify the issues. - -## Script Failure - -Many blueprints run bash scripts as part of the installation. This section highlights how to identify a problem with -a bash script. - -First let's take a look at the `customize()` method of the Tomcat server blueprint: - -{% highlight java %} -@Override -public void customize() { - newScript(CUSTOMIZING) - .body.append("mkdir -p conf logs webapps temp") - .failOnNonZeroResultCode() - .execute(); - - copyTemplate(entity.getConfig(TomcatServer.SERVER_XML_RESOURCE), Os.mergePaths(getRunDir(), "conf", "server.xml")); - copyTemplate(entity.getConfig(TomcatServer.WEB_XML_RESOURCE), Os.mergePaths(getRunDir(), "conf", "web.xml")); - - if (isProtocolEnabled("HTTPS")) { - String keystoreUrl = Preconditions.checkNotNull(getSslKeystoreUrl(), "keystore URL must be specified if using HTTPS for " + entity); - String destinationSslKeystoreFile = getHttpsSslKeystoreFile(); - InputStream keystoreStream = resource.getResourceFromUrl(keystoreUrl); - getMachine().copyTo(keystoreStream, destinationSslKeystoreFile); - } - - getEntity().deployInitialWars(); -} -{% endhighlight %} - -Here we can see that it's running a script to create four directories before continuing with the customization. 
Let's -introduce an error by changing `mkdir` to `mkrid`: - -{% highlight java %} -newScript(CUSTOMIZING) - .body.append("mkrid -p conf logs webapps temp") // `mkdir` changed to `mkrid` - .failOnNonZeroResultCode() - .execute(); -{% endhighlight %} - -Now let's try deploying this using the following YAML: - -{% highlight yaml %} - -name: Tomcat failure test -location: localhost -services: -- type: org.apache.brooklyn.entity.webapp.tomcat.TomcatServer - -{% endhighlight %} - -Shortly after deployment, the entity fails with the following error: - -`Failure running task ssh: customizing TomcatServerImpl{id=e1HP2s8x} (HmyPAozV): -Execution failed, invalid result 127 for customizing TomcatServerImpl{id=e1HP2s8x}` - -[![Script failure error in the Brooklyn debug console.](images/script-failure.png)](images/script-failure-large.png) - -By selecting the `Activities` tab, we can drill into the task that failed. The list of tasks shown (where the -effectors are shown as top-level tasks) are clickable links. Selecting that row will show the details of -that particular task, including its sub-tasks. We can eventually get to the specific sub-task that failed: - -[![Task failure error in the Brooklyn debug console.](images/failed-task.png)](images/failed-task-large.png) - -By clicking on the `stderr` link, we can see the script failed with the following error: - -{% highlight console %} -/tmp/brooklyn-20150721-132251052-l4b9-customizing_TomcatServerImpl_i.sh: line 10: mkrid: command not found -{% endhighlight %} - -This tells us *what* went wrong, but doesn't tell us *where*. In order to find that, we'll need to look at the -stack trace that was logged when the exception was thrown. - -It's always worth looking at the Detailed Status section as sometimes this will give you the information you need. 
-In this case, the stack trace is limited to the thread that was used to execute the task that ran the script: - -{% highlight console %} -Failed after 40ms - -STDERR -/tmp/brooklyn-20150721-132251052-l4b9-customizing_TomcatServerImpl_i.sh: line 10: mkrid: command not found - - -STDOUT -Executed /tmp/brooklyn-20150721-132251052-l4b9-customizing_TomcatServerImpl_i.sh, result 127: Execution failed, invalid result 127 for customizing TomcatServerImpl{id=e1HP2s8x} - -java.lang.IllegalStateException: Execution failed, invalid result 127 for customizing TomcatServerImpl{id=e1HP2s8x} - at org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper.logWithDetailsAndThrow(ScriptHelper.java:390) - at org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper.executeInternal(ScriptHelper.java:379) - at org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper$8.call(ScriptHelper.java:289) - at org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper$8.call(ScriptHelper.java:287) - at org.apache.brooklyn.core.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:343) - at org.apache.brooklyn.core.util.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:469) - at java.util.concurrent.FutureTask.run(FutureTask.java:262) - at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) - at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) - at java.lang.Thread.run(Thread.java:745) -{% endhighlight %} - -In order to find the exception, we'll need to look in Brooklyn's debug log file. By default, the debug log file -is named `brooklyn.debug.log`. Usually the easiest way to navigate the log file is to use `less`, e.g. -`less brooklyn.debug.log`. We can quickly find the stack trace by first navigating to the end of the log file -with `Shift-G`, then performing a reverse-lookup by typing `?Tomcat` and pressing `Enter`. 
If searching for the -blueprint type (in this case Tomcat) simply matches tasks unrelated to the exception, you can also search for -the text of the error message, in this case `? invalid result 127`. You can make the search case-insensitive by -typing `-i` before performing the search. To skip the current match and move to the next one (i.e. 'up' as we're -performing a reverse-lookup), simply press `n`. - -In this case, the `?Tomcat` search takes us directly to the full stack trace (only the last part of the trace -is shown here): - -{% highlight console %} -... at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63) ~[guava-17.0.jar:na] - at org.apache.brooklyn.core.util.task.BasicTask.get(BasicTask.java:343) ~[classes/:na] - at org.apache.brooklyn.core.util.task.BasicTask.getUnchecked(BasicTask.java:352) ~[classes/:na] - ... 9 common frames omitted -Caused by: brooklyn.util.exceptions.PropagatedRuntimeException: - at org.apache.brooklyn.util.exceptions.Exceptions.propagate(Exceptions.java:97) ~[classes/:na] - at org.apache.brooklyn.core.util.task.BasicTask.getUnchecked(BasicTask.java:354) ~[classes/:na] - at org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper.execute(ScriptHelper.java:339) ~[classes/:na] - at org.apache.brooklyn.entity.webapp.tomcat.TomcatSshDriver.customize(TomcatSshDriver.java:72) ~[classes/:na] - at org.apache.brooklyn.entity.software.base.AbstractSoftwareProcessDriver$8.run(AbstractSoftwareProcessDriver.java:150) ~[classes/:na] - at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_71] - at org.apache.brooklyn.core.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:343) ~[classes/:na] - ... 
5 common frames omitted -Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Execution failed, invalid result 127 for customizing TomcatServerImpl{id=e1HP2s8x} - at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.7.0_71] - at java.util.concurrent.FutureTask.get(FutureTask.java:188) [na:1.7.0_71] - at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63) ~[guava-17.0.jar:na] - at org.apache.brooklyn.core.util.task.BasicTask.get(BasicTask.java:343) ~[classes/:na] - at org.apache.brooklyn.core.util.task.BasicTask.getUnchecked(BasicTask.java:352) ~[classes/:na] - ... 10 common frames omitted -Caused by: java.lang.IllegalStateException: Execution failed, invalid result 127 for customizing TomcatServerImpl{id=e1HP2s8x} - at org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper.logWithDetailsAndThrow(ScriptHelper.java:390) ~[classes/:na] - at org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper.executeInternal(ScriptHelper.java:379) ~[classes/:na] - at org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper$8.call(ScriptHelper.java:289) ~[classes/:na] - at org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper$8.call(ScriptHelper.java:287) ~[classes/:na] - ... 6 common frames omitted -{% endhighlight %} - -Brooklyn's use of tasks and helper classes can make the stack trace a little harder than usual to follow, but a good -place to start is to look through the stack trace for the node's implementation or ssh driver classes (usually -named `FooNodeImpl` or `FooSshDriver`). In this case we can see the following: - -{% highlight console %} -at org.apache.brooklyn.entity.webapp.tomcat.TomcatSshDriver.customize(TomcatSshDriver.java:72) ~[classes/:na] -{% endhighlight %} - -Combining this with the error message of `mkrid: command not found` we can see that indeed `mkdir` has been -misspelled `mkrid` on line 72 of `TomcatSshDriver.java`. 
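
The interactive `less` search described in this section can also be approximated non-interactively. The following is only a minimal sketch: the function name `find_driver_frames` is hypothetical, it assumes a `grep` that supports `-E` and `-o` (GNU and BSD grep both do), and it relies purely on the `FooSshDriver` / `FooNodeImpl` naming convention mentioned above, so it will also report unrelated classes whose names happen to end in `Impl`.

```shell
# Print the distinct stack frames in a log file whose class name ends in
# "SshDriver" or "Impl" (a naming heuristic, not a Brooklyn feature).
# Usage: find_driver_frames <log-file>
find_driver_frames() {
  # -E: extended regular expressions; -o: print only the matched frame.
  # sort -u collapses frames that are repeated in nested "Caused by" traces.
  grep -Eo 'at [A-Za-z0-9_.$]+(SshDriver|Impl)\.[A-Za-z0-9_]+\([A-Za-z0-9_]+\.java:[0-9]+\)' "$1" | sort -u
}
```

For example, `find_driver_frames brooklyn.debug.log | grep Tomcat` narrows the output to the Tomcat classes discussed in this section.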
- - -## Non-Script Failure - -The section above gives an example of a failure that occurs when a script is run. In this section we will look at -a failure in a non-script related part of the code. We'll use the `customize()` method of the Tomcat server again, -but this time, we'll correct the spelling of 'mkdir' and add a line that attempts to copy a nonexistent resource -to the remote server: - -{% highlight java %} - -newScript(CUSTOMIZING) - .body.append("mkdir -p conf logs webapps temp") - .failOnNonZeroResultCode() - .execute(); - -copyTemplate(entity.getConfig(TomcatServer.SERVER_XML_RESOURCE), Os.mergePaths(getRunDir(), "conf", "server.xml")); -copyTemplate(entity.getConfig(TomcatServer.WEB_XML_RESOURCE), Os.mergePaths(getRunDir(), "conf", "web.xml")); -copyTemplate("classpath://nonexistent.xml", Os.mergePaths(getRunDir(), "conf", "nonexistent.xml")); // Resource does not exist! - -{% endhighlight %} - -Let's deploy this using the same YAML from above. Here's the resulting error in the Brooklyn debug console: - -[![Resource exception in the Brooklyn debug console.](images/resource-exception.png)](images/resource-exception-large.png) - -Again, this tells us *what* the error is, but we need to find *where* the code is that attempts to copy this file. 
In -this case it's shown in the Detailed Status section, and we don't need to go to the log file: - -{% highlight console %} - -Failed after 221ms: Error getting resource 'classpath://nonexistent.xml' for TomcatServerImpl{id=PVZxDKU1}: java.io.IOException: Error accessing classpath://nonexistent.xml: java.io.IOException: nonexistent.xml not found on classpath - -java.lang.RuntimeException: Error getting resource 'classpath://nonexistent.xml' for TomcatServerImpl{id=PVZxDKU1}: java.io.IOException: Error accessing classpath://nonexistent.xml: java.io.IOException: nonexistent.xml not found on classpath - at org.apache.brooklyn.core.util.ResourceUtils.getResourceFromUrl(ResourceUtils.java:297) - at org.apache.brooklyn.core.util.ResourceUtils.getResourceAsString(ResourceUtils.java:475) - at org.apache.brooklyn.entity.software.base.AbstractSoftwareProcessDriver.getResourceAsString(AbstractSoftwareProcessDriver.java:447) - at org.apache.brooklyn.entity.software.base.AbstractSoftwareProcessDriver.processTemplate(AbstractSoftwareProcessDriver.java:469) - at org.apache.brooklyn.entity.software.base.AbstractSoftwareProcessDriver.copyTemplate(AbstractSoftwareProcessDriver.java:390) - at org.apache.brooklyn.entity.software.base.AbstractSoftwareProcessDriver.copyTemplate(AbstractSoftwareProcessDriver.java:379) - at org.apache.brooklyn.entity.webapp.tomcat.TomcatSshDriver.customize(TomcatSshDriver.java:79) - at org.apache.brooklyn.entity.software.base.AbstractSoftwareProcessDriver$8.run(AbstractSoftwareProcessDriver.java:150) - at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) - at org.apache.brooklyn.core.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:343) - at org.apache.brooklyn.core.util.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:469) - at java.util.concurrent.FutureTask.run(FutureTask.java:262) - at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) - at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) - at java.lang.Thread.run(Thread.java:745) -Caused by: java.io.IOException: Error accessing classpath://nonexistent.xml: java.io.IOException: nonexistent.xml not found on classpath - at org.apache.brooklyn.core.util.ResourceUtils.getResourceFromUrl(ResourceUtils.java:233) - ... 14 more -Caused by: java.io.IOException: nonexistent.xml not found on classpath - at org.apache.brooklyn.core.util.ResourceUtils.getResourceViaClasspath(ResourceUtils.java:372) - at org.apache.brooklyn.core.util.ResourceUtils.getResourceFromUrl(ResourceUtils.java:230) - ... 14 more - -{% endhighlight %} - -Looking for `Tomcat` in the stack trace, we can see in this case the problem lies at line 79 of `TomcatSshDriver.java`. - - -## External Failure - -Sometimes an entity will fail outside the direct commands issued by Brooklyn. When installing and launching an entity, -Brooklyn will check the return code of scripts that were run to ensure that they completed successfully (i.e. the -return code of the script is zero). It is possible, for example, that a launch script completes successfully, but -the entity fails to start. - -We can simulate this type of failure by launching Tomcat with an invalid configuration file. As seen in the previous -examples, Brooklyn copies two XML configuration files to the server: `server.xml` and `web.xml`. - -The first few non-comment lines of `server.xml` are as follows (you can see the full file [here]({{ site.brooklyn.url.git }}/software/webapp/src/main/resources/org/apache/brooklyn/entity/webapp/tomcat/server.xml)): - -{% highlight xml %} - - - - - -{% endhighlight%} - -Let's add an unmatched XML element, which will make this XML file invalid: - -{% highlight xml %} - - - - - - -{% endhighlight%} - -As Brooklyn doesn't know how these types of resources are used, they're not validated as they're copied to the remote machine. 
-As far as Brooklyn is concerned, the file will have copied successfully. - -Let's deploy Tomcat again, using the same YAML as before. This time, the deployment runs for a few minutes before failing -with `Timeout waiting for SERVICE_UP`: - -[![External error in the Brooklyn debug console.](images/external-error.png)](images/external-error-large.png) - -If we drill down into the tasks in the `Activities` tab, we can see that all of the installation and launch tasks -completed successfully, and stdout of the `launch` script is as follows: - -{% highlight console %} - -Executed /tmp/brooklyn-20150721-153049139-fK2U-launching_TomcatServerImpl_id_.sh, result 0 - -{% endhighlight %} - -The task that failed was the `post-start` task, and the stack trace from the Detailed Status section is as follows: - -{% highlight console %} - -Failed after 5m 1s: Timeout waiting for SERVICE_UP from TomcatServerImpl{id=BUHgQeOs} - -java.lang.IllegalStateException: Timeout waiting for SERVICE_UP from TomcatServerImpl{id=BUHgQeOs} - at org.apache.brooklyn.core.entity.Entities.waitForServiceUp(Entities.java:1073) - at org.apache.brooklyn.entity.software.base.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:388) - at org.apache.brooklyn.entity.software.base.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:385) - at org.apache.brooklyn.entity.software.base.SoftwareProcessDriverLifecycleEffectorTasks.postStartCustom(SoftwareProcessDriverLifecycleEffectorTasks.java:164) - at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$7.run(MachineLifecycleEffectorTasks.java:433) - at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) - at org.apache.brooklyn.core.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:343) - at org.apache.brooklyn.core.util.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:469) - at java.util.concurrent.FutureTask.run(FutureTask.java:262) - at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) - at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) -at java.lang.Thread.run(Thread.java:745) - -{% endhighlight %} - -This doesn't really tell us what we need to know, and looking in the `brooklyn.debug.log` file yields no further -clues. The key here is the error message `Timeout waiting for SERVICE_UP`. After running the installation and -launch scripts, assuming all scripts completed successfully, Brooklyn will periodically check the health of the node -and will set the node on fire if the health check does not pass within a prescribed period (the default is -two minutes, and can be configured using the `start.timeout` config key). The periodic health check also continues -after the successful launch in order to check continued operation of the node, but in this case it fails to pass -at all. - -The first thing we need to do is to find out how Brooklyn determines the health of the node. The health-check is -often implemented in the `isRunning()` method in the entity's ssh driver. Tomcat's implementation of `isRunning()` -is as follows: - -{% highlight java %} -@Override -public boolean isRunning() { - return newScript(MutableMap.of(USE_PID_FILE, "pid.txt"), CHECK_RUNNING).execute() == 0; -} -{% endhighlight %} - -The `newScript` method has conveniences for default scripts to check if a process is running based on its PID. In this -case, it will look for Tomcat's PID in the `pid.txt` file and check whether that PID belongs to a running process. - -It's worth a quick sanity check at this point to check if the PID file exists, and if the process is running. -By default, the PID file is located in the run directory of the entity. You can find the location of the entity's run -directory by looking at the `run.dir` sensor. In this case it is `/tmp/brooklyn-martin/apps/jIzIHXtP/entities/TomcatServer_BUHgQeOs`. 
-To find the pid, you simply cat the pid.txt file in this directory: - -{% highlight console %} -$ cat /tmp/brooklyn-martin/apps/jIzIHXtP/entities/TomcatServer_BUHgQeOs/pid.txt -73714 -{% endhighlight %} - -In this case, the PID in the file is 73714. You can then check if the process is running using `ps`. You can also -pipe the output to `fold` so the full launch command is visible: - -{% highlight console %} - -$ ps -p 73714 | fold -w 120 -PID TTY TIME CMD -73714 ?? 0:08.03 /Library/Java/JavaVirtualMachines/jdk1.8.0_51.jdk/Contents/Home/bin/java -Dnop -Djava.util.logg -ing.manager=org.apache.juli.ClassLoaderLogManager -javaagent:/tmp/brooklyn-martin/apps/jIzIHXtP/entities/TomcatServer_BU -HgQeOs/brooklyn-jmxmp-agent-shaded-0.8.0-SNAPSHOT.jar -Xms200m -Xmx800m -XX:MaxPermSize=400m -Dcom.sun.management.jmxrem -ote -Dbrooklyn.jmxmp.rmi-port=1099 -Dbrooklyn.jmxmp.port=31001 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.manage -ment.jmxremote.authenticate=false -Djava.endorsed.dirs=/tmp/brooklyn-martin/installs/TomcatServer_7.0.56/apache-tomcat-7 -.0.56/endorsed -classpath /tmp/brooklyn-martin/installs/TomcatServer_7.0.56/apache-tomcat-7.0.56/bin/bootstrap.jar:/tmp/ -brooklyn-martin/installs/TomcatServer_7.0.56/apache-tomcat-7.0.56/bin/tomcat-juli.jar -Dcatalina.base=/tmp/brooklyn-mart -in/apps/jIzIHXtP/entities/TomcatServer_BUHgQeOs -Dcatalina.home=/tmp/brooklyn-martin/installs/TomcatServer_7.0.56/apache --tomcat-7.0.56 -Djava.io.tmpdir=/tmp/brooklyn-martin/apps/jIzIHXtP/entities/TomcatServer_BUHgQeOs/temp org.apache.catali -na.startup.Bootstrap start - -{% endhighlight %} - -This confirms that the process is running. The next thing we can look at is the `service.notUp.indicators` sensor. This -reads as follows: - -{% highlight json %} - -{"service.process.isRunning":"The software process for this entity does not appear to be running"} - -{% endhighlight %} - -This confirms that the problem is indeed due to the `service.process.isRunning` sensor. 
We assumed earlier that this was -set by the `isRunning()` method in `TomcatSshDriver.java`, but this isn't always the case. The `service.process.isRunning` -sensor is wired up by the `connectSensors()` method in the node's implementation class, in this case -`TomcatServerImpl.java`. Tomcat's implementation of `connectSensors()` is as follows: - -{% highlight java %} - -@Override -public void connectSensors() { - super.connectSensors(); - - if (getDriver().isJmxEnabled()) { - String requestProcessorMbeanName = "Catalina:type=GlobalRequestProcessor,name=\"http-*\""; - - Integer port = isHttpsEnabled() ? getAttribute(HTTPS_PORT) : getAttribute(HTTP_PORT); - String connectorMbeanName = format("Catalina:type=Connector,port=%s", port); - - jmxWebFeed = JmxFeed.builder() - .entity(this) - .period(3000, TimeUnit.MILLISECONDS) - .pollAttribute(new JmxAttributePollConfig(ERROR_COUNT) - .objectName(requestProcessorMbeanName) - .attributeName("errorCount")) - .pollAttribute(new JmxAttributePollConfig(REQUEST_COUNT) - .objectName(requestProcessorMbeanName) - .attributeName("requestCount")) - .pollAttribute(new JmxAttributePollConfig(TOTAL_PROCESSING_TIME) - .objectName(requestProcessorMbeanName) - .attributeName("processingTime")) - .pollAttribute(new JmxAttributePollConfig(CONNECTOR_STATUS) - .objectName(connectorMbeanName) - .attributeName("stateName")) - .pollAttribute(new JmxAttributePollConfig(SERVICE_PROCESS_IS_RUNNING) - .objectName(connectorMbeanName) - .attributeName("stateName") - .onSuccess(Functions.forPredicate(Predicates.equalTo("STARTED"))) - .setOnFailureOrException(false)) - .build(); - - jmxAppFeed = JavaAppUtils.connectMXBeanSensors(this); - } else { - // if not using JMX - LOG.warn("Tomcat running without JMX monitoring; limited visibility of service available"); - connectServiceUpIsRunning(); - } -} - -{% endhighlight %} - -We can see here that if jmx is not enabled, the method will call `connectServiceUpIsRunning()` which will use the -default PID-based 
method of determining if a process is running. However, as JMX *is* running, the `service.process.isRunning` -sensor (denoted here by the `SERVICE_PROCESS_IS_RUNNING` variable) is set to true if and only if the -`stateName` JMX attribute equals `STARTED`. We can see from the previous call to `.pollAttribute` that this -attribute is also published to the `CONNECTOR_STATUS` sensor. The `CONNECTOR_STATUS` sensor is defined as follows: - -{% highlight java %} - -AttributeSensor CONNECTOR_STATUS = - new BasicAttributeSensor(String.class, "webapp.tomcat.connectorStatus", "Catalina connector state name"); - -{% endhighlight %} - -Let's go back to the Brooklyn debug console and look for the `webapp.tomcat.connectorStatus` sensor: - -[![Sensors view in the Brooklyn debug console.](images/jmx-sensors.png)](images/jmx-sensors-large.png) - -As the sensor is not shown, it's likely that it's simply null or not set. We can check this by clicking -the "Show/hide empty records" icon (highlighted in yellow above): - -[![All sensors view in the Brooklyn debug console.](images/jmx-sensors-all.png)](images/jmx-sensors-all-large.png) - -We know from previous steps that the installation and launch scripts completed, and we know the process is running, -but we can see here that the server is not responding to JMX requests. A good thing to check here would be that the -JMX port is not being blocked by iptables, firewalls or security groups -(see the [troubleshooting connectivity guide](connectivity.html)). -Let's assume that we've checked that and they're all open. There is still one more thing that Brooklyn can tell us. - - -Still on the `Sensors` tab, let's take a look at the `log.location` sensor: - -{% highlight console %} - -/tmp/brooklyn-martin/apps/c3bmrlC3/entities/TomcatServer_C1TAjYia/logs/catalina.out - -{% endhighlight %} - -This is the location of Tomcat's own log file. 
The location of the log file will differ from process to process -and when writing a custom entity you will need to check the software's own documentation. If your blueprint's -ssh driver extends `JavaSoftwareProcessSshDriver`, the value returned by the `getLogFileLocation()` method will -automatically be published to the `log.location` sensor. Otherwise, you can publish the value yourself by calling -`entity.setAttribute(Attributes.LOG_FILE_LOCATION, getLogFileLocation());` in your ssh driver. - -**Note:** The log file will be on the server to which you have deployed Tomcat, and not on the Brooklyn server. -Let's take a look in the log file: - -{% highlight console %} - -$ less /tmp/brooklyn-martin/apps/c3bmrlC3/entities/TomcatServer_C1TAjYia/logs/catalina.out - -Jul 21, 2015 4:12:12 PM org.apache.tomcat.util.digester.Digester fatalError -SEVERE: Parse Fatal Error at line 143 column 3: The element type "unmatched-element" must be terminated by the matching end-tag "</unmatched-element>". - org.xml.sax.SAXParseException; systemId: file:/tmp/brooklyn-martin/apps/c3bmrlC3/entities/TomcatServer_C1TAjYia/conf/server.xml; lineNumber: 143; columnNumber: 3; The element type "unmatched-element" must be terminated by the matching end-tag "</unmatched-element>". 
- at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203) - at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177) - at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:441) - at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:368) - at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1437) - at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1749) - at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2973) - at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606) - at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510) - at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848) - at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777) - at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141) - at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213) - at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:649) - at org.apache.tomcat.util.digester.Digester.parse(Digester.java:1561) - at org.apache.catalina.startup.Catalina.load(Catalina.java:615) - at org.apache.catalina.startup.Catalina.start(Catalina.java:677) - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) - at 
java.lang.reflect.Method.invoke(Method.java:497) - at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:321) - at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:455) -Jul 21, 2015 4:12:12 PM org.apache.catalina.startup.Catalina load -WARNING: Catalina.start using conf/server.xml: The element type "unmatched-element" must be terminated by the matching end-tag "</unmatched-element>". -Jul 21, 2015 4:12:12 PM org.apache.catalina.startup.Catalina start -SEVERE: Cannot start server. Server instance is not configured. - -{% endhighlight %} - -As expected, we can see here that the `unmatched-element` element has not been terminated in the `server.xml` file. http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/58bb3aa0/guide/ops/troubleshooting/images/external-error-large.png ---------------------------------------------------------------------- diff --git a/guide/ops/troubleshooting/images/external-error-large.png b/guide/ops/troubleshooting/images/external-error-large.png deleted file mode 100644 index abea32d..0000000 Binary files a/guide/ops/troubleshooting/images/external-error-large.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/58bb3aa0/guide/ops/troubleshooting/images/external-error.png ---------------------------------------------------------------------- diff --git a/guide/ops/troubleshooting/images/external-error.png b/guide/ops/troubleshooting/images/external-error.png deleted file mode 100644 index 0b5deff..0000000 Binary files a/guide/ops/troubleshooting/images/external-error.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/58bb3aa0/guide/ops/troubleshooting/images/failed-task-large.png ---------------------------------------------------------------------- diff --git a/guide/ops/troubleshooting/images/failed-task-large.png b/guide/ops/troubleshooting/images/failed-task-large.png deleted file mode 100644 index 1c264c4..0000000 Binary files 
a/guide/ops/troubleshooting/images/failed-task-large.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/58bb3aa0/guide/ops/troubleshooting/images/failed-task.png ---------------------------------------------------------------------- diff --git a/guide/ops/troubleshooting/images/failed-task.png b/guide/ops/troubleshooting/images/failed-task.png deleted file mode 100644 index 94a368e..0000000 Binary files a/guide/ops/troubleshooting/images/failed-task.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/58bb3aa0/guide/ops/troubleshooting/images/jmx-sensors-all-large.png ---------------------------------------------------------------------- diff --git a/guide/ops/troubleshooting/images/jmx-sensors-all-large.png b/guide/ops/troubleshooting/images/jmx-sensors-all-large.png deleted file mode 100644 index d5d6b97..0000000 Binary files a/guide/ops/troubleshooting/images/jmx-sensors-all-large.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/58bb3aa0/guide/ops/troubleshooting/images/jmx-sensors-all.png ---------------------------------------------------------------------- diff --git a/guide/ops/troubleshooting/images/jmx-sensors-all.png b/guide/ops/troubleshooting/images/jmx-sensors-all.png deleted file mode 100644 index 52c3191..0000000 Binary files a/guide/ops/troubleshooting/images/jmx-sensors-all.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/58bb3aa0/guide/ops/troubleshooting/images/jmx-sensors-large.png ---------------------------------------------------------------------- diff --git a/guide/ops/troubleshooting/images/jmx-sensors-large.png b/guide/ops/troubleshooting/images/jmx-sensors-large.png deleted file mode 100644 index d9322c6..0000000 Binary files a/guide/ops/troubleshooting/images/jmx-sensors-large.png and /dev/null differ 
http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/58bb3aa0/guide/ops/troubleshooting/images/jmx-sensors.png ---------------------------------------------------------------------- diff --git a/guide/ops/troubleshooting/images/jmx-sensors.png b/guide/ops/troubleshooting/images/jmx-sensors.png deleted file mode 100644 index ff81806..0000000 Binary files a/guide/ops/troubleshooting/images/jmx-sensors.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/58bb3aa0/guide/ops/troubleshooting/images/resource-exception-large.png ---------------------------------------------------------------------- diff --git a/guide/ops/troubleshooting/images/resource-exception-large.png b/guide/ops/troubleshooting/images/resource-exception-large.png deleted file mode 100644 index 9cff4ce..0000000 Binary files a/guide/ops/troubleshooting/images/resource-exception-large.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/58bb3aa0/guide/ops/troubleshooting/images/resource-exception.png ---------------------------------------------------------------------- diff --git a/guide/ops/troubleshooting/images/resource-exception.png b/guide/ops/troubleshooting/images/resource-exception.png deleted file mode 100644 index 2fb792c..0000000 Binary files a/guide/ops/troubleshooting/images/resource-exception.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/58bb3aa0/guide/ops/troubleshooting/images/script-failure-large.png ---------------------------------------------------------------------- diff --git a/guide/ops/troubleshooting/images/script-failure-large.png b/guide/ops/troubleshooting/images/script-failure-large.png deleted file mode 100644 index b36517c..0000000 Binary files a/guide/ops/troubleshooting/images/script-failure-large.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/58bb3aa0/guide/ops/troubleshooting/images/script-failure.png 
---------------------------------------------------------------------- diff --git a/guide/ops/troubleshooting/images/script-failure.png b/guide/ops/troubleshooting/images/script-failure.png deleted file mode 100644 index 2d59b6d..0000000 Binary files a/guide/ops/troubleshooting/images/script-failure.png and /dev/null differ http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/58bb3aa0/guide/ops/troubleshooting/increase-entropy.md ---------------------------------------------------------------------- diff --git a/guide/ops/troubleshooting/increase-entropy.md b/guide/ops/troubleshooting/increase-entropy.md deleted file mode 100644 index 6fc6f8d..0000000 --- a/guide/ops/troubleshooting/increase-entropy.md +++ /dev/null @@ -1,81 +0,0 @@ ---- -layout: website-normal -title: Increase Entropy -toc: /guide/toc.json ---- - -### Checking entropy level - -A lack of entropy can cause random number generation to be extremely slow. -This in turn can make tasks such as ssh extremely slow. -You can check the available entropy on a machine by running the command: - -{% highlight bash %} -cat /proc/sys/kernel/random/entropy_avail -{% endhighlight %} - -The value should be above 2000. - -If you are installing Apache Brooklyn on a virtual machine, you may find that it has insufficient -entropy. You may need to increase the Linux kernel entropy in order to speed up the ssh connections -to the managed entities. You can install and configure `rng-tools`, or just use `/dev/urandom`.
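The entropy check described above can be wrapped in a small script, for example to run it periodically on the Brooklyn host. This is only a sketch: the function name `entropy_status` is hypothetical, and the default threshold of 2000 is taken from the guidance above. The threshold is a parameter because the appropriate value may differ between kernels.

```shell
# Hypothetical helper (not part of Brooklyn): classify an entropy reading
# against a threshold. The default of 2000 follows the guidance above.
entropy_status() {
  local available="$1"
  local threshold="${2:-2000}"
  if [ "$available" -gt "$threshold" ]; then
    echo "ok"
  else
    echo "low"
  fi
}

# Read the kernel's current estimate (Linux only) and report on it.
if [ -r /proc/sys/kernel/random/entropy_avail ]; then
  avail=$(cat /proc/sys/kernel/random/entropy_avail)
  echo "entropy_avail=${avail} status=$(entropy_status "$avail")"
fi
```

A "low" result suggests trying one of the remedies in the following sections.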
- - -### Installing rng-tools - -If you are using a RHEL 6-based OS: - -{% highlight bash %} -sudo -i -yum -y -q install rng-tools -echo "EXTRAOPTIONS=\"-r /dev/urandom\"" >> /etc/sysconfig/rngd -/etc/init.d/rngd start -{% endhighlight %} - -If you are using RHEL 7 or another systemd-based system: - -{% highlight bash %} -sudo yum -y -q install rng-tools - -# Configure rng to use /dev/urandom -# Change the "ExecStart" line to: -# ExecStart=/sbin/rngd -f -r /dev/urandom -sudo vi /etc/systemd/system/multi-user.target.wants/rngd.service - -sudo systemctl daemon-reload -sudo systemctl start rngd -{% endhighlight %} - -If you are using a Debian-based OS: - -{% highlight bash %} -sudo -i -apt-get -y install rng-tools -echo "HRNGDEVICE=/dev/urandom" >> /etc/default/rng-tools -/etc/init.d/rng-tools start -{% endhighlight %} - - -### Using /dev/urandom - -You can also move `/dev/random` aside and recreate it as a link to `/dev/urandom`: - -{% highlight bash %} -sudo mv /dev/random /dev/random-real -sudo ln -s /dev/urandom /dev/random -{% endhighlight %} - -Note: if you map `/dev/random` to `/dev/urandom`, you will need to restart the Apache Brooklyn Java process for the change to take effect.
- - -### More Information - -The following links contain further information: - -* [haveged (another solution) and general info from Digital Ocean](https://www.digitalocean.com/community/tutorials/how-to-setup-additional-entropy-for-cloud-servers-using-haveged) -* for specific OSs: - * [for RHEL or CentOS](http://my.itwnik.com/how-to-increase-linux-kernel-entropy/) - * [for Ubuntu](http://www.howtoforge.com/helping-the-random-number-generator-to-gain-enough-entropy-with-rng-tools-debian-lenny) - * [for Alpine](https://wiki.alpinelinux.org/wiki/Entropy_and_randomness) - - http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/58bb3aa0/guide/ops/troubleshooting/increase-system-resource-limits.md ---------------------------------------------------------------------- diff --git a/guide/ops/troubleshooting/increase-system-resource-limits.md b/guide/ops/troubleshooting/increase-system-resource-limits.md deleted file mode 100644 index 8ef721b..0000000 --- a/guide/ops/troubleshooting/increase-system-resource-limits.md +++ /dev/null @@ -1,39 +0,0 @@ ---- -layout: website-normal -title: Increase System Resource Limits -toc: /guide/toc.json ---- - -If you encounter the following error: - - Caused by: java.io.IOException: Too many open files - at java.io.UnixFileSystem.createFileExclusively(Native Method)[:1.8.0 - -Please check the limit for open files with `cat /proc/sys/fs/file-max` and increase it. -You can increase the maximum number of open files by setting `fs.file-max` in `/etc/sysctl.conf` -and then running `sudo sysctl -p` to apply the changes. - - -If you encounter the error below, e.g. when running with many entities, please consider **increasing the ulimit**: - - java.lang.OutOfMemoryError: unable to create new native thread - -On the VM running Apache Brooklyn, it is recommended that nproc and nofile are reasonably high -(e.g. 16384 or higher; a value of 1024 is often the default). - -If you want to check the current limits, run `ulimit -a`.
 Alternatively, if Brooklyn is run as a -different user (e.g. with user name "brooklyn"), then instead run `sudo -u brooklyn sh -c 'ulimit -a'`. - -For RHEL (and CentOS) distributions, you can increase the limits by running -`sudo vi /etc/security/limits.conf` and adding (if the "brooklyn" user is running Apache Brooklyn): - - brooklyn soft nproc 16384 - brooklyn hard nproc 16384 - brooklyn soft nofile 16384 - brooklyn hard nofile 16384 - -Generally you do not have to reboot to apply ulimit values. They are set per session. -So after you have the correct values, quit the ssh session and log back in. - -For more details, see one of the many posts such as -[http://tuxgen.blogspot.co.uk/2014/01/centosrhel-ulimit-and-maximum-number-of.html](http://tuxgen.blogspot.co.uk/2014/01/centosrhel-ulimit-and-maximum-number-of.html). http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/58bb3aa0/guide/ops/troubleshooting/index.md ---------------------------------------------------------------------- diff --git a/guide/ops/troubleshooting/index.md b/guide/ops/troubleshooting/index.md deleted file mode 100644 index 331e267..0000000 --- a/guide/ops/troubleshooting/index.md +++ /dev/null @@ -1,18 +0,0 @@ ---- -title: Troubleshooting -layout: website-normal -children: -- { path: overview.md, title: Overview } -- { path: web-console-issues.md, title: Web Console Issues } -- { path: deployment.md, title: Deployment } -- { path: connectivity.md, title: Server Connectivity } -- { path: slow-unresponsive.md, title: Brooklyn Slow or Unresponsive } -- { path: increase-entropy.md, title: Increase Entropy } -- { path: increase-system-resource-limits.md, title: Increase System Resource Limits } -- { path: detailed-support-report.md, title: Detailed Support Report } -- { path: softwareprocess.md, title: SoftwareProcess Entities } -- { path: going-deep-in-java-and-logs.md, title: Going Deep in Java and Logs } -- { path: memory-usage.md, title: Monitoring Memory Usage } ---- - -{% include list-children.html 
%} http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/58bb3aa0/guide/ops/troubleshooting/memory-usage.md ---------------------------------------------------------------------- diff --git a/guide/ops/troubleshooting/memory-usage.md b/guide/ops/troubleshooting/memory-usage.md deleted file mode 100644 index b7baf77..0000000 --- a/guide/ops/troubleshooting/memory-usage.md +++ /dev/null @@ -1,138 +0,0 @@ ---- -layout: website-normal -title: "Troubleshooting: Monitoring Memory Usage" -toc: /guide/toc.json ---- - -## Memory Usage - -Brooklyn tries to keep in memory as much history of its activity as possible, -for displaying through the UI, so it is normal for it to consume as much memory -as it can. It uses "soft references" so these objects will be cleared if needed, -but **it is not a sign of anything unusual if Brooklyn is using all its available memory**. - -The number of active tasks, CPU usage, thread counts, and -retention of soft reference objects are a much better indication of load. -This information can be found by looking in the log for lines containing -`brooklyn gc`, such as: - - 2016-09-16 16:19:43,337 DEBUG o.a.b.c.m.i.BrooklynGarbageCollector [brooklyn-gc]: brooklyn gc (before) - using 910 MB / 3.76 GB memory; 98% soft-reference maybe retention (of 362); 35 threads; tasks: 0 active, 2 unfinished; 31 remembered, 1013 total submitted) - -The soft-reference figure is indicative, but the lower this is, the more -the JVM has decided to get rid of items that were desired to be kept but optional. -It only tracks some soft-references (those wrapped in `Maybe`), -and of course if there are many many such items the JVM will have to get rid -of some, so a lower figure does not necessarily mean a problem. -Typically however if there's no `OutOfMemoryError` (OOME) reported, -there's no problem. 
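To follow these figures over time, the `brooklyn gc` lines described above can be condensed into one short record per line. The sketch below assumes the log format matches the sample shown in this section; the function name `gc_summary` and the output layout are illustrative only, and lines with a different shape are silently skipped.

```shell
# Hypothetical helper: condense "brooklyn gc" log lines into
# "<timestamp> <before|after> mem=[used / max] threads=N active=N".
# Assumes the line format shown in the sample above.
gc_summary() {
  grep 'brooklyn gc' "$1" |
    sed -n -E 's/^([0-9-]+ [0-9:,]+) .*brooklyn gc \((before|after)\) - using ([^;]*) memory.*; ([0-9]+) threads;.*tasks: ([0-9]+) active.*/\1 \2 mem=[\3] threads=\4 active=\5/p'
}

# Example usage (the log file name follows the out-of-the-box logging setup):
#   gc_summary brooklyn.debug.log | tail -n 20
```

A steadily climbing thread or active-task count in this output is usually a more useful signal than the memory figure alone.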
- - -## Problem Indicators and Resolutions - -Two things that *do* normally indicate a problem with memory are: - -* `OutOfMemoryError` exceptions being thrown -* Memory usage high *and* CPU high, where the CPU is spent doing full garbage collection - -One possible cause is the JVM doing a poorly-selected GC strategy, -as described in [Oracle Java bug 6912889](http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6912889). -This can be confirmed by running the "analyzing soft reference usage" technique below; -memory should shrink dramatically then increase until the problem recurs. -This can be fixed by passing `-XX:SoftRefLRUPolicyMSPerMB=1` to the JVM, -as described in [Brooklyn issue 375](https://issues.apache.org/jira/browse/BROOKLYN-375). - -Other common JVM options include `-Xms256m -Xmx1g` -(depending on JVM provider and version) to set the right balance of memory allocation. -In some cases a larger `-Xmx` value may simply be the fix -(but this should not be the case unless many or large blueprints are being used). - -If the problem is not with soft references but with real memory usage, -the culprit is likely a memory leak, typically in blueprint design. -An early warning of this situation is the "soft-reference maybe retention" level decreasing. -In these situations, follow the steps as described below for "Investigating Leaks". - - -## Analyzing Soft Reference Usage - -If you are concerned about memory usage, or doing evaluation on test environments, -the following method (in the Groovy console) can be invoked to force the system to -reclaim as much memory as possible, including *all* soft references: - - org.apache.brooklyn.util.javalang.MemoryUsageTracker.forceClearSoftReferences() - -In good situations, memory usage should return to a small level. -This call can be disruptive to the system however so use with care. - -The above method can also be configured to run automatically when memory usage -is detected to hit a certain level. 
 That can be useful if external policies are -being used to warn on high memory usage, and you want to keep some headroom. -Many JVM authorities discourage interfering with the garbage collector, however, -so use with care and study the particular JVM you are using. -See the class `BrooklynGarbageCollector` for more information. - - -## Investigating Leaks - -If a memory leak is found, the first place to look should be the WARN/ERROR logs. -Many common causes of leaks, including runaway tasks and cyclic dependent configuration, -will show their own log errors prior to the memory error. - -You should also note the task counts in the `brooklyn gc` messages described above, -and if there is an exceptional number of tasks or tasks are not clearing, -other log messages will describe what is happening, and the in-product task -view can indicate issues. - -Sometimes slow leaks can occur if blueprints do not clean up entities or locations. -These can be diagnosed by noting the number of files written to the persistence location, -if persistence is being used. Deploying then destroying a blueprint should not leave -anything behind in the persistence directory. - -Where problems have been encountered in the past, we have resolved them and/or -worked to improve logging and early identification. -Please report any issues so that we can improve this further. -In many cases we can also give advice on what other log `grep` patterns can be useful. - - -### Standard Java Techniques - -Useful standard Java techniques for tracking memory leaks include: - -* `jstack <pid>` to see what tasks are running -* `jmap -histo:live <pid>` to see what objects are using memory (see below) -* Memory profilers such as VisualVM or Eclipse MAT, either connected to a running system or - against a heap dump generated on an OOME - -More information is available on [the Oracle Java web site](https://docs.oracle.com/javase/7/docs/webnotes/tsg/TSG-VM/html/memleaks.html).
- -Note that some of the above techniques will often include soft and weak references that are irrelevant -to the problem (and will be cleared on an OOME). Objects that may be cached in that way include: - -* `BasicConfigKey` (used for the web server and many blueprints) -* `DslComponent` and `*Task` (used for Brooklyn activities and dependent configuration) -* `jclouds` items including `ImageImpl` (to cache data on cloud service providers) - -On the other hand any of the above may also indicate a leak. -Taking snapshots after a `forceClearSoftReferences()` (above) invocation and comparing those -is one technique to filter out noise. Another is to wait until there is an OOME -and look just after, because that will clear all non-essential data from memory. -(The `forceClearSoftReferences()` actually works by triggering an OOME, in as safe -a way as possible.) - -If leaked items are found, a profiler will normally let you see their content -and walk backwards along their references to find out why they are being retained. 
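The snapshot comparison mentioned above can be done with standard tools before reaching for a profiler. The sketch below assumes two files containing saved `jmap -histo:live` output, with the usual `num: #instances #bytes class name` rows; the function name `histo_growth` and the file names in the usage comment are hypothetical.

```shell
# Hypothetical helper: list classes whose byte count grew between two saved
# "jmap -histo" snapshots, largest growth first. Output is "<bytes grown> <class>".
histo_growth() {
  awk '
    # Data rows look like "  1:  <instances>  <bytes>  <class name>".
    $1 ~ /^[0-9]+:$/ {
      if (FNR == NR) before[$4] = $3           # first file: remember bytes
      else if ($3 - before[$4] > 0)            # second file: report growth
        print $3 - before[$4], $4
    }
  ' "$1" "$2" | sort -rn
}

# Example usage, with snapshots taken some time apart:
#   histo_growth histo-before.txt histo-after.txt | head -n 20
```

Classes that keep growing across several snapshots, even after soft references have been cleared, are reasonable candidates for closer inspection in a profiler.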
- - -### Summary of Techniques - -The following sequence of techniques is a common approach to investigating and fixing memory issues: - -* Note the log lines about `brooklyn gc`, including memory and tasks -* Do not assume high memory usage alone is an error, as soft reference caches are deliberate; - use `forceClearSoftReferences()` to clear these -* Note any WARN/ERROR messages in the log -* Tune JVM memory allocation and GC -* Look for leaking locations or references by creating then destroying a blueprint -* Use standard JVM profilers -* Inform the Apache Brooklyn community - - http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/58bb3aa0/guide/ops/troubleshooting/overview.md ---------------------------------------------------------------------- diff --git a/guide/ops/troubleshooting/overview.md b/guide/ops/troubleshooting/overview.md deleted file mode 100644 index ac00aeb..0000000 --- a/guide/ops/troubleshooting/overview.md +++ /dev/null @@ -1,116 +0,0 @@ ---- -layout: website-normal -title: Troubleshooting Overview -toc: /guide/toc.json ---- - -This guide describes sources of information for understanding when things go wrong. - -Whether you're customizing out-of-the-box blueprints, or developing your own custom blueprints, you will -inevitably have to deal with entity failure. Thankfully Brooklyn provides plenty of information to help -you locate and resolve any issues you may encounter. - - -## Web-console Runtime Error Information - -### Entity Hierarchy - -The Brooklyn web-console includes a tree view of the entities within an application. Errors within the -application are represented visually, showing a "fire" image on the entity. - -When an error causes an entire application to be unexpectedly down, the error is generally propagated to the -top-level entity - i.e. marking it as "on fire". To find the underlying error, one should expand the entity -hierarchy tree to find the specific entities that have actually failed. 
- - -### Entity's Error Status - -Many entities have some common sensors (i.e. attributes) that give details of the error status: - -* `service.isUp` (often referred to as "service up") is a boolean, saying whether the service is up. For many - software processes, this is inferred from whether the `service.notUp.indicators` map is empty. It is also - possible for some entities to set this attribute directly. -* `service.notUp.indicators` is a map of errors. This often gives much more information than the single - `service.isUp` attribute. For example, there may be many health-check indicators for a component: - is the root URL reachable, is the management API reporting healthy, is the process running, etc. -* `service.problems` is a map of namespaced indicators of problems with a service. -* `service.state` is the actual state of the service - e.g. CREATED, STARTING, RUNNING, STOPPING, STOPPED, - DESTROYED and ON_FIRE. -* `service.state.expected` indicates the state the service is expected to be in (and when it transitioned to that). - For example, is the service expected to be starting, running, stopping, etc. - -These sensor values are shown in the "sensors" tab - see below. - - -### Sensors View - -The "Sensors" tab in the Brooklyn web-console shows the attribute values of a particular entity. -This gives lots of runtime information, including about the health of the entity - the -set of attributes will vary between different entity types. - -[![Sensors view in the Brooklyn debug console.](images/jmx-sensors.png)](images/jmx-sensors-large.png) - -Note that null (or not set) sensors are hidden by default. You can click on the `Show/hide empty records` -icon (highlighted in yellow above) to see these sensors as well. - -The sensors view is also tabulated. You can configure the number of sensors shown per page -(at the bottom). There is also a search bar (at the top) to filter the sensors shown.
- - -### Activity View - -The activity view shows the tasks executed by a given entity. The top-level tasks are the effectors -(i.e. operations) invoked on that entity. This view allows one to drill into the task, to -see details of errors. - -Select the entity, and then click on the `Activities` tab. - -In the table showing the tasks, each row is a link - clicking on the row will drill into the details of that task, -including sub-tasks: - -[![Task failure error in the Brooklyn debug console.](images/failed-task.png)](images/failed-task-large.png) - -For ssh tasks, this allows one to drill down to see the env, stdin, stdout and stderr. That is, you can see the -commands executed (stdin) and environment variables (env), and the output from executing that (stdout and stderr). - -For tasks that did not fail, one can still drill into the tasks to see what was done. - -It's always worth looking at the Detailed Status section as sometimes that will give you the information you need. -For example, it can show the exception stack trace in the thread that was executing the task that failed. - - -## Log Files - -Brooklyn's logging is configurable, for the files created, the logging levels, etc. -See [Logging docs]({{ site.path.guide }}/ops/logging.html). - -With out-of-the-box logging, `brooklyn.info.log` and `brooklyn.debug.log` files are created. These are by default -rolling log files: when the log reaches a given size, it is compressed and a new log file is started. -Therefore check the timestamps of the log files to ensure you are looking in the correct file for the -time of your error. - -With out-of-the-box logging, info, warnings and errors are written to the `brooklyn.info.log` file. This gives -a summary of the important actions and errors. However, it does not contain full stacktraces for errors. - -To find the exception, we'll need to look in Brooklyn's debug log file. By default, the debug log file -is named `brooklyn.debug.log`. 
You can use your favourite tools for viewing large text files. - -One possible tool is `less`, e.g. `less brooklyn.debug.log`. We can quickly find the last exception -by navigating to the end of the log file (using `Shift-G`), then performing a reverse-lookup by typing `?Exception` -and pressing `Enter`. Sometimes an error results in multiple exceptions being logged (e.g. first for the -entity, then for the cluster, then for the app). If you know the text of the error message (e.g. copy-pasted -from the Activities view of the web-console) then one can search explicitly for that text. - -The `grep` command is also extremely helpful. Useful things to grep for include: - -* The entity id (see the "summary" tab of the entity in the web-console for the id). -* The entity type name (if there are only a small number of entities of that type). -* The VM IP address. -* A particular error message (e.g. copy-pasted from the Activities view of the web-console). -* The word WARN etc, such as `grep -E "WARN|ERROR" brooklyn.info.log`. - -Grep'ing for particular log messages is also useful. Some examples are shown below: - -* INFO: "Started application", "Stopping application" and "Stopped application" -* INFO: "Creating VM " -* DEBUG: "Finished VM " http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/58bb3aa0/guide/ops/troubleshooting/slow-unresponsive.md ---------------------------------------------------------------------- diff --git a/guide/ops/troubleshooting/slow-unresponsive.md b/guide/ops/troubleshooting/slow-unresponsive.md deleted file mode 100644 index 6551759..0000000 --- a/guide/ops/troubleshooting/slow-unresponsive.md +++ /dev/null @@ -1,240 +0,0 @@ ---- -layout: website-normal -title: Brooklyn Slow or Unresponsive -toc: /guide/toc.json ---- - -There are many possible causes for a Brooklyn server becoming slow or unresponsive. This guide -describes some possible reasons, and some commands and tools that can help diagnose the problem. 
- -Possible reasons include: - -* CPU is max'ed out -* Memory usage is extremely high -* SSH'ing is very slow (e.g. due to lack of entropy) -* Out of disk space - -See [Brooklyn Requirements]({{ site.path.guide }}/ops/requirements.html) for details of server -requirements. - - -## Machine Diagnostics - -The following commands will collect OS-level diagnostics about the machine, and about the Brooklyn -process. The commands below assume use of CentOS 6.x. Minor adjustments may be required for -other platforms. - - -#### OS and Machine Details - -To display system information, run: - -{% highlight bash %} -uname -a -{% endhighlight %} - -To show details of the CPU and memory available to the machine, run: - -{% highlight bash %} -cat /proc/cpuinfo -cat /proc/meminfo -{% endhighlight %} - - -#### User Limits - -To display information about user limits, run the command below (while logged in as the same user -who runs Brooklyn): - -{% highlight bash %} -ulimit -a -{% endhighlight %} - -If Brooklyn is run as a different user (e.g. with user name "adalovelace"), then instead run: - -{% highlight bash %} -sudo -u adalovelace sh -c 'ulimit -a' -{% endhighlight %} - -Of particular interest is the limit for "open files". - -See [Increase System Resource Limits]({{ site.path.guide }}/ops/troubleshooting/increase-system-resource-limits.html) -for more information. - - -#### Disk Space - -The command below will list the disk size for each partition, including the amount used and -available. If the Brooklyn base directory, persistence directory or logging directory are close -to 0% available, this can cause serious problems: - -{% highlight bash %} -df -h -{% endhighlight %} - - -#### CPU and Memory Usage - -To view the CPU and memory usage of all processes, and of the machine as a whole, one can use the -`top` command. This runs interactively, updating every few seconds. To collect the output once -(e.g.
 to share diagnostic information in a bug report), run: - -{% highlight bash %} -top -n 1 -b > top.txt -{% endhighlight %} - - -#### File and Network Usage - -To count the number of open files for the Brooklyn process (which includes open socket connections), run: - -{% highlight bash %} -BROOKLYN_HOME=/home/users/brooklyn/apache-brooklyn-0.9.0-bin -BROOKLYN_PID=$(cat $BROOKLYN_HOME/pid_java) -lsof -p $BROOKLYN_PID | wc -l -{% endhighlight %} - -To count the number of "established" internet connections, run: - -{% highlight bash %} -netstat -an | grep ESTABLISHED | wc -l -{% endhighlight %} - - -#### Linux Kernel Entropy - -A lack of entropy can cause random number generation to be extremely slow. This can cause -tasks like ssh to also be extremely slow. See -[Linux kernel entropy]({{ site.path.guide }}/ops/troubleshooting/increase-entropy.html) -for details of how to work around this. - - -## Process Diagnostics - -#### Thread and Memory Usage - -To get memory and thread usage for the Brooklyn (Java) process, two useful tools are `jstack` -and `jmap`. These require the "development kit" to also be installed -(e.g. `yum install java-1.8.0-openjdk-devel`). Some useful commands are shown below: - -{% highlight bash %} -BROOKLYN_HOME=/home/users/brooklyn/apache-brooklyn-0.9.0-bin -BROOKLYN_PID=$(cat $BROOKLYN_HOME/pid_java) - -jstack $BROOKLYN_PID -jmap -histo:live $BROOKLYN_PID -jmap -heap $BROOKLYN_PID -{% endhighlight %} - - -#### Runnable Threads - -The [jstack-active](https://github.com/apache/brooklyn-dist/blob/master/scripts/jstack-active.sh) -script is a convenient light-weight way to quickly see which threads of a running Brooklyn -server are attempting to consume the CPU. It filters the output of `jstack`, to show only the -"really-runnable" threads (as opposed to those that are blocked).
- -{% highlight bash %} -BROOKLYN_HOME=/home/users/brooklyn/apache-brooklyn-0.9.0-bin -BROOKLYN_PID=$(cat $BROOKLYN_HOME/pid_java) - -curl -O https://raw.githubusercontent.com/apache/brooklyn-dist/master/scripts/jstack-active.sh - -# Run the downloaded script directly, as it is not on the PATH -bash jstack-active.sh $BROOKLYN_PID -{% endhighlight %} - - -#### Profiling - -If an in-depth investigation of the CPU usage (and/or object creation) of a Brooklyn Server is -required, there are many profiling tools designed specifically for this purpose. These generally -require that the process be launched in such a way that a profiler can attach, which may not be -appropriate for a production server. - - -#### Remote Debugging - -If the Brooklyn Server was originally run to allow a remote debugger to connect (strongly -discouraged in production!), then this provides a convenient way to investigate why Brooklyn -is being slow or unresponsive. See the -tip [Debugging Remote Brooklyn]({{ site.path.guide }}/dev/tips/debugging-remote-brooklyn.html) -and the [IDE docs]({{ site.path.guide }}/dev/env/ide/) for more information. - - -## Log Files - -Apache Brooklyn will by default create `brooklyn.info.log` and `brooklyn.debug.log` files. See the -[Logging]({{ site.path.guide }}/ops/logging.html) docs for more information. - -The following are useful log messages to search for (e.g. using `grep`). Note the wording of -these messages (or their very presence) may change in future versions of Brooklyn. - - -#### Normal Logging - -The lines below are commonly logged, and can be useful to search for when finding the start of a section of logging.
- -{% highlight text %} -2016-05-30 17:05:51,458 INFO o.a.b.l.BrooklynWebServer [main]: Started Brooklyn console at http://127.0.0.1:8081/, running classpath://brooklyn.war -2016-05-30 17:06:04,098 INFO o.a.b.c.m.h.HighAvailabilityManagerImpl [main]: Management node tF3GPvQ5 running as HA MASTER autodetected -2016-05-30 17:06:08,982 INFO o.a.b.c.m.r.InitialFullRebindIteration [brooklyn-execmanager-rvpnFTeL-0]: Rebinding from /home/compose/compose-amp-state/brooklyn-persisted-state/data for master rvpnFTeL... -2016-05-30 17:06:11,105 INFO o.a.b.c.m.r.RebindIteration [brooklyn-execmanager-rvpnFTeL-0]: Rebind complete (MASTER) in 2s: 19 apps, 54 entities, 50 locations, 46 policies, 704 enrichers, 0 feeds, 160 catalog items -{% endhighlight %} - - -#### Memory Usage - -The debug log includes (every minute) a log statement about the memory usage and task activity. For example: - -{% highlight text %} -2016-05-27 12:20:19,395 DEBUG o.a.b.c.m.i.BrooklynGarbageCollector [brooklyn-gc]: brooklyn gc (before) - using 328 MB / 496 MB memory (5.58 kB soft); 69 threads; storage: {datagrid={size=7, createCount=7}, refsMapSize=0, listsMapSize=0}; tasks: 10 active, 33 unfinished; 78 remembered, 1696906 total submitted) -2016-05-27 12:20:19,395 DEBUG o.a.b.c.m.i.BrooklynGarbageCollector [brooklyn-gc]: brooklyn gc (after) - using 328 MB / 496 MB memory (5.58 kB soft); 69 threads; storage: {datagrid={size=7, createCount=7}, refsMapSize=0, listsMapSize=0}; tasks: 10 active, 33 unfinished; 78 remembered, 1696906 total submitted) -{% endhighlight %} - -These can be extremely useful if investigating a memory or thread leak, or to determine whether a -surprisingly high number of tasks are being executed. - - -#### Subscriptions - -One source of high CPU in Brooklyn is when a subscription (e.g. for a policy or enricher) is being -triggered many times (i.e. handling many events). A log message like that below will be logged on -every 1000 events handled by a given single subscription. 
- -{% highlight text %} -2016-05-30 17:29:09,125 DEBUG o.a.b.c.m.i.LocalSubscriptionManager [brooklyn-execmanager-rvpnFTeL-8]: 1000 events for subscriber Subscription[SCFnav9g;CanopyComposeApp{id=gIeTwhU2}@gIeTwhU2:webapp.url] -{% endhighlight %} - -If a subscription is handling a huge number of events, there are a couple of common reasons: -* first, it could be subscribing to too much activity - e.g. a wildcard subscription for all - events from all entities. -* second it could be an infinite loop (e.g. where an enricher responds to a sensor-changed event - by setting that same sensor, thus triggering another sensor-changed event). - - -#### User Activity - -All activity triggered by the REST API or web-console will be logged. Some examples are shown below: - -{% highlight text %} -2016-05-19 17:52:30,150 INFO o.a.b.r.r.ApplicationResource [brooklyn-jetty-server-8081-qtp1058726153-17473]: Launched from YAML: name: My Example App -location: aws-ec2:us-east-1 -services: -- type: org.apache.brooklyn.entity.webapp.tomcat.TomcatServer - -2016-05-30 14:46:19,516 DEBUG brooklyn.REST [brooklyn-jetty-server-8081-qtp1104967201-20881]: Request Tisj14 starting: POST /v1/applications/NiBy0v8Q/entities/NiBy0v8Q/expunge from 77.70.102.66 -{% endhighlight %} - - -#### Entity Activity - -If investigating the behaviour of a particular entity (e.g. on failure), it can be very useful to -`grep` the info and debug log for the entity's id. For a software process, the debug log will -include the stdout and stderr of all the commands executed by that entity. 
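The entity-id search described above can be sketched as a small helper that merges matches from several log files into one time-ordered view. The function name `entity_activity` is hypothetical, and the sketch assumes every log line starts with a sortable timestamp, as in the samples in this guide. Continuation lines such as stack traces do not contain the id and will not appear in the output.

```shell
# Hypothetical helper: show every log line mentioning an entity id, merged
# across the given files and ordered by the leading timestamp.
entity_activity() {
  local entity_id="$1"
  shift
  # -h drops file name prefixes, -F treats the id as a fixed string;
  # sort -u orders by timestamp and drops exact duplicates, in case the
  # same message was written to more than one file.
  grep -hF -- "$entity_id" "$@" | sort -u
}

# Example usage, with an entity id copied from the web-console:
#   entity_activity mPujYmPd brooklyn.info.log brooklyn.debug.log | less
```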
- -It can also be very useful to search for all effector invocations, to see where the behaviour -has been triggered: - -{% highlight text %} -2016-05-27 12:45:43,529 DEBUG o.a.b.c.m.i.EffectorUtils [brooklyn-execmanager-gvP7MuZF-14364]: Invoking effector stop on TomcatServerImpl{id=mPujYmPd} -{% endhighlight %} http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/58bb3aa0/guide/ops/troubleshooting/softwareprocess.md ---------------------------------------------------------------------- diff --git a/guide/ops/troubleshooting/softwareprocess.md b/guide/ops/troubleshooting/softwareprocess.md deleted file mode 100644 index 85ab2c0..0000000 --- a/guide/ops/troubleshooting/softwareprocess.md +++ /dev/null @@ -1,51 +0,0 @@ ---- -layout: website-normal -title: Troubleshooting SoftwareProcess Entities -toc: /guide/toc.json ---- - -The [troubleshooting overview](overview.html) in Brooklyn explains -how to find more information about errors. - -If that doesn't give enough information to diagnose, fix or work around the problem, it may be necessary -to log in to the machine to investigate further. This guide applies to entities that are types -of "SoftwareProcess" in Brooklyn, or that follow those conventions. - - -## VM connection details - -The ssh connection details for an entity are published to a sensor `host.sshAddress`. The login -credentials will depend on the Brooklyn configuration. The default is to use the `~/.ssh/id_rsa` -or `~/.ssh/id_dsa` on the Brooklyn host (uploading the associated `~/.ssh/id_rsa.pub` to the machine's -authorized_keys). However, this can be overridden (e.g. with specific passwords etc.) in the -location's configuration. - -For Windows, there is a similar sensor with the name `host.winrmAddress`. - - - -## Install and Run Directories - -For ssh-based software processes, the install directory and the run directory are published as sensors -`install.dir` and `run.dir` respectively. 
- -For some entities, files are unpacked into the install dir; configuration files are written to the -run dir along with log files. For some other entities, these directories may be mostly empty - -e.g. if installing RPMs, and that software writes its logs to a different standard location. - -Most entities have a sensor `log.location`. It is generally worth checking this, along with other files -in the run directory (such as console output). - - -## Process and OS Health - -It is worth checking that the process is running, e.g. using `ps aux` to look for the desired process. -Some entities also write the pid of the process to `pid.txt` in the run directory. - -It is also worth checking whether the required port is accessible. This is discussed in the guide -"Troubleshooting Server Connectivity Issues in the Cloud", including listing the ports in use: -execute `netstat -antp` (or on OS X `netstat -antp TCP`) to list the TCP ports in use (or use -`-anup` for UDP). - -It is also worth checking the disk space on the server, e.g. using `df -m`, to check that there -is sufficient space on each of the required partitions. http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/58bb3aa0/guide/ops/troubleshooting/web-console-issues.md ---------------------------------------------------------------------- diff --git a/guide/ops/troubleshooting/web-console-issues.md b/guide/ops/troubleshooting/web-console-issues.md deleted file mode 100644 index 56c0ddc..0000000 --- a/guide/ops/troubleshooting/web-console-issues.md +++ /dev/null @@ -1,23 +0,0 @@ ---- -layout: website-normal -title: Web Console Issues -toc: /guide/toc.json ---- - -## Page Does Not Load in Chrome, Saying "Waiting for available socket..." - -If you find that the Web Console does not load in Chrome (giving a message "Waiting for available -socket..."), there are two possible explanations. 
- -The first possible reason is that another tab for the same host:port has a login dialog that is prompting -for a username and password. This will block other tabs that are also trying to connect. The -solution is to log in on the first tab, or to close that tab. - -A second possible reason is that there are too many open connections in Chrome to that domain. -There is a limit in Chrome for the number of open socket connections to a given domain. If this -is exceeded, subsequent tabs that try to connect will wait for an available socket. - -For more information, see -[http://stackoverflow.com/questions/23679968/chrome-hangs-after-certain-amount-of-data-transfered-waiting-for-available-soc](http://stackoverflow.com/questions/23679968/chrome-hangs-after-certain-amount-of-data-transfered-waiting-for-available-soc). - -[chrome://net-internals/#sockets](chrome://net-internals/#sockets) is also a useful diagnostic tool. http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/58bb3aa0/guide/ops/upgrade.md ---------------------------------------------------------------------- diff --git a/guide/ops/upgrade.md b/guide/ops/upgrade.md deleted file mode 100644 index d24465b..0000000 --- a/guide/ops/upgrade.md +++ /dev/null @@ -1,355 +0,0 @@ ---- -title: Upgrade -layout: website-normal ---- - -This guide provides all necessary information to upgrade Apache Brooklyn for both the RPM/DEB and Tarball packages. - -## Backwards Compatibility - -Apache Brooklyn version 0.12.0 onward runs primarily inside a Karaf container. When upgrading from 0.11.0 or below, -this update changes the mechanisms for launching Brooklyn. -This will impact any custom scripting around the launching of Brooklyn, and the supplying of command line arguments. - -Use of the `lib/dropins` and `lib/patch` folders will no longer work (because Karaf does not support that kind of classloading). -Instead, code must be built and installed as [OSGi bundles](https://en.wikipedia.org/wiki/OSGi#Bundles). 
- -## Upgrading - -* Use of RPM and DEB is now recommended where possible, rather than the tar.gz. This entirely replaces the previous install. - -* CentOS 7.x is recommended over CentOS 6.x (note: the RPM **will not work** on CentOS 6.x) - -### Upgrade from Apache Brooklyn 0.12.0 onward - -{::options parse_block_html="true" /} - - - -
-
- -1. **Important!** Backup persisted state and custom configuration, in case you need to rollback to a previous version. - - 1. By default, persisted state is located at `/var/lib/brooklyn`. - The `persistenceDir` and `persistenceLocation` are configured in the file `/etc/brooklyn/org.apache.brooklyn.osgilauncher.cfg`. - The persistence details will be logged in `/var/log/brooklyn/brooklyn.info.log` at startup time. - - 2. Configuration files are in `/etc/brooklyn`. - -2. Upgrade Apache Brooklyn: - - 1. [Download](../misc/download.html) the new RPM/DEB package - - 2. Upgrade Apache Brooklyn: - - # CentOS / RHEL - sudo yum upgrade apache-brooklyn-xxxx.noarch.rpm - - # Ubuntu / Debian - sudo dpkg -i apache-brooklyn-xxxx.deb - - If there are conflicts in configuration files (located in `/etc/brooklyn`), the upgrade will behave differently based - on the package you are using: - - * RPM: the upgrade will keep the previously installed one and save the new version, with the suffix `.rpmsave`. - You will then need to check and manually resolve those. - * DEB: the upgrade will ask you what to do. - -3. Start Apache Brooklyn: - - # CentOS 7 / RHEL - sudo systemctl start brooklyn - # CentOS 6 and older - sudo initctl start brooklyn - - # Ubuntu / Debian - start brooklyn - - Wait for Brooklyn to be running (i.e. its web-console is responsive) - -
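The "wait for Brooklyn to be running" check in step 3 can be scripted. The sketch below is a hypothetical helper, not part of Brooklyn: it polls a URL with `curl` until the server answers at all. The function name `wait_for_console`, the default retry count and the default delay are assumptions chosen for illustration.

```shell
# Hypothetical helper (not part of Brooklyn): poll a URL until the server
# responds, or give up after a fixed number of attempts.
#   $1 = URL to probe
#   $2 = maximum number of attempts (assumed default: 30)
#   $3 = seconds to sleep between attempts (assumed default: 5)
wait_for_console() {
    url="$1"
    attempts="${2:-30}"
    delay="${3:-5}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Without --fail, curl exits 0 for any HTTP response, including a
        # login challenge, so "responsive" here means "the server answered".
        if curl --silent --output /dev/null --max-time 5 "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, `wait_for_console http://localhost:8081 && echo "console is up"` would wait up to a few minutes, assuming the console listens on that address as in the `br login` examples in this guide.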
- -
- -1. Stop Apache Brooklyn: - - ./bin/stop brooklyn - - If this does not stop it within a few seconds (as checked with `sudo ps aux | grep karaf`), then use `sudo kill ` - -2. **Important!** Backup persisted state and custom configuration. - - 1. By default, persisted state is located at `~/.brooklyn/brooklyn-persisted-state`. - The `persistenceDir` and `persistenceLocation` are configured in the file `./etc/org.apache.brooklyn.osgilauncher.cfg`. - The persistence details will be logged in `./log/brooklyn.info.log` at startup time. - - 2. Configuration files are in `./etc/`. - Any changes to these configuration files will need to be re-applied after reinstalling Brooklyn. - -3. Install new version of Apache Brooklyn: - - 1. [Download](../misc/download.html) the new tarball zip package. - - 2. Install Brooklyn: - - tar -zxf apache-brooklyn-xxxx.tar.gz - cd apache-brooklyn-xxxx - -4. Restore any changes to the configuration files (see step 2). - -5. Validate that the new release works, by starting in "HOT_BACKUP" mode. - - 1. Before starting Brooklyn, reconfigure `./etc/org.apache.brooklyn.osgilauncher.cfg` and set `highAvailabilityMode=HOT_BACKUP`. - This way when Brooklyn is started, it will only read and validate the persisted state and will not write into it. - - 2. Start Apache Brooklyn: - - ./bin/start brooklyn - - 3. Check whether you have rebind ERROR messages in `./log/brooklyn.info.log`, e.g. `sudo grep -E "WARN|ERROR" /opt/brooklyn/log/brooklyn.debug.log`. - If you do not have such errors you can proceed. - - 4. Stop Apache Brooklyn: - - ./bin/stop brooklyn - - 5. Change the `highAvailabilityMode` to the default (AUTO) by commenting it out in `./etc/org.apache.brooklyn.osgilauncher.cfg`. - -6. Start Apache Brooklyn: - - ./bin/start brooklyn - - Wait for Brooklyn to be running (i.e. its web-console is responsive). - -7. Update the catalog, using the br command: - - 1. 
[Download](https://brooklyn.apache.org/download/index.html#command-line-client) the br tool. - - 2. Login with br: `br login http://localhost:8081 `. - - 3. Update the catalog: `br catalog add /opt/brooklyn/catalog/catalog.bom`. - -
-
- -### Upgrade from Apache Brooklyn 0.11.0 and below - - - -
-
- -1. Stop Apache Brooklyn: - - # CentOS 7 / RHEL - sudo systemctl stop brooklyn - # CentOS 6 and older - sudo initctl stop brooklyn - - # Ubuntu / Debian - stop brooklyn - - If this does not stop it within a few seconds (as checked with `sudo ps aux | grep brooklyn`), then use `sudo kill `. - -2. **Important!** Back up persisted state and custom configuration. - - 1. By default, persisted state is located at `/opt/brooklyn/.brooklyn/`. - The `persistenceDir` and `persistenceLocation` are configured in the file `./etc/org.apache.brooklyn.osgilauncher.cfg`. - The persistence details will be logged in `./log/brooklyn.info.log` at startup time. - - 2. Configuration files are in `./etc/`. - Any changes to these configuration files will need to be re-applied after reinstalling Brooklyn. - -3. Delete the existing Apache Brooklyn install: - - 1. Remove the Brooklyn package: - - # CentOS / RHEL - sudo yum erase apache-brooklyn - - # Ubuntu / Debian - sudo dpkg -r apache-brooklyn - - 2. On CentOS 7 run `sudo systemctl daemon-reload`. - - 3. Confirm that Brooklyn is definitely not running (see step 1 above). - - 4. Delete the Brooklyn install directory: `sudo rm -r /opt/brooklyn` as well as the Brooklyn log directory: - `sudo rm -r /var/log/brooklyn/` - -4. Make sure you have Java 8. - By default CentOS images come with JRE 6, which is incompatible with Brooklyn. - If the CentOS version is earlier than 6.8, upgrade nss: `yum -y upgrade nss` - -5. Install the new version of Apache Brooklyn: - - 1. [Download](../misc/download.html) the new RPM/DEB package. - - 2. Install Apache Brooklyn: - - # CentOS / RHEL - sudo yum install apache-brooklyn-xxxx.noarch.rpm - - # Ubuntu / Debian - sudo dpkg -i apache-brooklyn-xxxx.deb - -6. Restore the persisted state and configuration. - - 1. 
Stop the Brooklyn service: - - # CentOS 7 / RHEL - sudo systemctl stop brooklyn - # CentOS 6 and older - sudo initctl stop brooklyn - - # Ubuntu / Debian - stop brooklyn - - Confirm that Brooklyn is no longer running (see step 1). - - 2. Restore the persisted state directory into `/var/lib/brooklyn` and any changes to the configuration files (see step 2). - Ensure owner/permissions are correct for the persisted state directory, e.g.: - `sudo chown -R brooklyn:brooklyn /var/lib/brooklyn` - -7. Validate that the new release works, by starting in "HOT_BACKUP" mode. - - 1. Before starting Brooklyn, reconfigure `/etc/brooklyn/org.apache.brooklyn.osgilauncher.cfg` and set `highAvailabilityMode=HOT_BACKUP`. - This way when Brooklyn is started, it will only read and validate the persisted state and will not write into it. - - 2. Start Apache Brooklyn: - - # CentOS 7 / RHEL - sudo systemctl start brooklyn - # CentOS 6 and older - sudo initctl start brooklyn - - # Ubuntu / Debian - start brooklyn - - 3. Check whether you have rebind ERROR messages in the Brooklyn logs, e.g. `sudo grep -E "Rebind|WARN|ERROR" /var/log/brooklyn/brooklyn.debug.log`. - If you do not have such errors you can proceed. - - 4. Stop Brooklyn: - - # CentOS 7 / RHEL - sudo systemctl stop brooklyn - # CentOS 6 and older - sudo initctl stop brooklyn - - # Ubuntu / Debian - stop brooklyn - - 5. Change the `highAvailabilityMode` to the default (AUTO) by commenting it out in `/etc/brooklyn/org.apache.brooklyn.osgilauncher.cfg`. - -8. Start Apache Brooklyn: - - # CentOS 7 / RHEL - sudo systemctl start brooklyn - # CentOS 6 and older - sudo initctl start brooklyn - - # Ubuntu / Debian - start brooklyn - - Wait for Brooklyn to be running (i.e. its web-console is responsive). - -9. Update the catalog, using the br command: - - 1. Download the br tool (i.e. from the "CLI Download" link in the web-console). - - 2. Log in with br: `br login http://localhost:8081 `. - - 3. 
Update the catalog: `br catalog add /opt/brooklyn/catalog/catalog.bom`. - -
- -
- -Same instructions as above. - -
-
- -## Rollback - -This section applies only if you are using the RPM/DEB packages. To perform a rollback, please follow these instructions: - -{% highlight bash %} -# CentOS / RHEL -yum downgrade apache-brooklyn.noarch - -# Ubuntu / Debian -dpkg -i apache-brooklyn-xxxx.deb -{% endhighlight %} - -*Note that downgrading a DEB package essentially means installing a previous version, so you need to [download](../misc/download.html) -the version you want to downgrade to beforehand.* - -## How to stop your service - -On systemd: -{% highlight bash %} -systemctl stop brooklyn -{% endhighlight %} - -On upstart: -{% highlight bash %} -stop brooklyn -{% endhighlight %} - -## Web login credentials - -* User credentials should now be recorded in [`brooklyn.cfg`](paths.html). - -* Brooklyn will still read them from both [`brooklyn.cfg`](paths.html) and `~/.brooklyn/brooklyn.properties`. - -* Configure a username/password by modifying [`brooklyn.cfg`](paths.html). An example entry is: - -{% highlight bash %} -brooklyn.webconsole.security.users=admin -brooklyn.webconsole.security.user.admin.password=password2 -{% endhighlight %} - -## Persistence - -If you have persisted state you wish to rebind to, persistence is now configured in the following files: - -* [`brooklyn.cfg`](paths.html) -* [`org.apache.brooklyn.osgilauncher.cfg`](paths.html) - -For example, to use S3 for the persisted state, add the following to [`brooklyn.cfg`](paths.html): - -{% highlight bash %} -brooklyn.location.named.aws-s3-eu-west-1:aws-s3:eu-west-1 -brooklyn.location.named.aws-s3-eu-west-1.identity= -brooklyn.location.named.aws-s3-eu-west-1.credential= -{% endhighlight %} - -To continue the S3 example, for the persisted state, add the following to [`org.apache.brooklyn.osgilauncher.cfg`](paths.html): - -{% highlight bash %} -persistenceLocation=aws-s3-eu-west-1 -persistenceDir= -{% endhighlight %} - -Apache Brooklyn should be stopped before this file is modified, and then restarted with the new 
configuration. - -***Note that you can not store the credentials (for e.g. aws-s3-eu-west-1) in the catalog because that catalog is stored -in the persisted state. Apache Brooklyn needs to know it in order to read the persisted state at startup time.*** - -If binding to existing persisted state, an additional command is required to update the existing catalog with the Brooklyn -0.12.0 versions. Assuming Brooklyn has been installed to [`/opt/brooklyn`](paths.html) (as is done by the RPM and DEB): - - {% highlight bash %} - br catalog add /opt/brooklyn/catalog/catalog.bom - {% endhighlight %} - -All existing custom jars previously added to lib/plugins (e.g. for Java-based entities) need to be converted to OSGi bundles, -and installed in Karaf. The use of the "brooklyn.libraries" section in catalog.bom files will continue to work. http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/58bb3aa0/guide/start/_my-web-cluster.yaml ---------------------------------------------------------------------- diff --git a/guide/start/_my-web-cluster.yaml b/guide/start/_my-web-cluster.yaml deleted file mode 100644 index a00949e..0000000 --- a/guide/start/_my-web-cluster.yaml +++ /dev/null @@ -1,24 +0,0 @@ -name: My Web Cluster - -location: - jclouds:aws-ec2: - identity: ABCDEFGHIJKLMNOPQRST - credential: s3cr3tsq1rr3ls3cr3tsq1rr3ls3cr3tsq1rr3l - -services: -- type: org.apache.brooklyn.entity.webapp.ControlledDynamicWebAppCluster - name: My Web - id: webappcluster - brooklyn.config: - wars.root: http://search.maven.org/remotecontent?filepath=org/apache/brooklyn/example/brooklyn-example-hello-world-sql-webapp/0.8.0-incubating/brooklyn-example-hello-world-sql-webapp-0.8.0-incubating.war - java.sysprops: - brooklyn.example.db.url: > - $brooklyn:formatString("jdbc:%s%s?user=%s&password=%s", - component("db").attributeWhenReady("datastore.url"), - "visitors", "brooklyn", $brooklyn:external("brooklyn-demo-sample", "hidden-brooklyn-password")) -- type: 
org.apache.brooklyn.entity.database.mysql.MySqlNode - name: My DB - id: db - brooklyn.config: - creation.script.password: $brooklyn:external("brooklyn-demo-sample", "hidden-brooklyn-password") - datastore.creation.script.url: https://bit.ly/brooklyn-visitors-creation-script \ No newline at end of file http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/58bb3aa0/guide/start/_my-web-cluster2.yaml ---------------------------------------------------------------------- diff --git a/guide/start/_my-web-cluster2.yaml b/guide/start/_my-web-cluster2.yaml deleted file mode 100644 index bef8d1a..0000000 --- a/guide/start/_my-web-cluster2.yaml +++ /dev/null @@ -1,32 +0,0 @@ -name: My Web Cluster - -location: localhost - -services: - -- type: org.apache.brooklyn.entity.webapp.ControlledDynamicWebAppCluster - name: My Web - brooklyn.config: - wars.root: http://search.maven.org/remotecontent?filepath=org/apache/brooklyn/example/brooklyn-example-hello-world-sql-webapp/0.8.0-incubating/brooklyn-example-hello-world-sql-webapp-0.8.0-incubating.war - java.sysprops: - brooklyn.example.db.url: > - $brooklyn:formatString("jdbc:%s%s?user=%s&password=%s", - component("db").attributeWhenReady("datastore.url"), - "visitors", "brooklyn", $brooklyn:external("brooklyn-demo-sample", "hidden-brooklyn-password")) - brooklyn.policies: - - type: org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy - brooklyn.config: - metric: webapp.reqs.perSec.windowed.perNode - metricLowerBound: 0.1 - metricUpperBound: 10 - minPoolSize: 1 - maxPoolSize: 4 - resizeUpStabilizationDelay: 10s - resizeDownStabilizationDelay: 1m - -- type: org.apache.brooklyn.entity.database.mysql.MySqlNode - id: db - name: My DB - brooklyn.config: - creation.script.password: $brooklyn:external("brooklyn-demo-sample", "hidden-brooklyn-password") - datastore.creation.script.url: https://bit.ly/brooklyn-visitors-creation-script