brooklyn-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From aleds...@apache.org
Subject [3/4] incubator-brooklyn git commit: rejig titles of troubleshooting sections
Date Tue, 28 Jul 2015 15:45:22 GMT
rejig titles of troubleshooting sections


Project: http://git-wip-us.apache.org/repos/asf/incubator-brooklyn/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-brooklyn/commit/3d930107
Tree: http://git-wip-us.apache.org/repos/asf/incubator-brooklyn/tree/3d930107
Diff: http://git-wip-us.apache.org/repos/asf/incubator-brooklyn/diff/3d930107

Branch: refs/heads/master
Commit: 3d930107c6c1dceeb1f41c00f56657d24590f1a7
Parents: 3d29c8b
Author: Alex Heneveld <alex.heneveld@cloudsoftcorp.com>
Authored: Fri Jul 24 15:18:51 2015 +0100
Committer: Alex Heneveld <alex.heneveld@cloudsoftcorp.com>
Committed: Tue Jul 28 12:33:56 2015 +0100

----------------------------------------------------------------------
 .../dev/tips/troubleshooting-exceptions.md      | 489 -------------------
 docs/guide/ops/troubleshooting/connectivity.md  | 143 ++++++
 docs/guide/ops/troubleshooting/deployment.md    |  88 ++++
 .../going-deep-in-java-and-logs.md              | 488 ++++++++++++++++++
 docs/guide/ops/troubleshooting/index.md         |  11 +-
 docs/guide/ops/troubleshooting/overview.md      | 116 +++++
 .../ops/troubleshooting/softwareprocess.md      |  50 ++
 .../troubleshooting-connectivity.md             | 143 ------
 .../troubleshooting-deployment.md               |  88 ----
 .../troubleshooting-runtime-errors.md           | 116 -----
 .../troubleshooting-softwareprocess.md          |  50 --
 11 files changed, 891 insertions(+), 891 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-brooklyn/blob/3d930107/docs/guide/dev/tips/troubleshooting-exceptions.md
----------------------------------------------------------------------
diff --git a/docs/guide/dev/tips/troubleshooting-exceptions.md b/docs/guide/dev/tips/troubleshooting-exceptions.md
deleted file mode 100644
index fb3d849..0000000
--- a/docs/guide/dev/tips/troubleshooting-exceptions.md
+++ /dev/null
@@ -1,489 +0,0 @@
----
-layout: website-normal
-title: Troubleshooting Exceptions and Node Failure
-toc: /guide/toc.json
----
-
-Whether you're customizing out-of-the-box blueprints, or developing your own custom blueprints, you will
-inevitably have to deal with node failure, or exceptions being thrown by your node. Thankfully Brooklyn
-provides plenty of information to help you locate and resolve any issues you may encounter.
-
-This guide looks at three common failure scenarios and describes the steps that can be taken to
-identify the issue.
-
-## Script failure
-Many blueprints run bash scripts as part of the installation. This section highlights how to identify a problem with
-a bash script.
-
-First let's take a look at the `customize()` method of the Tomcat server blueprint:
-
-{% highlight java %}
-  @Override
-  public void customize() {
-      newScript(CUSTOMIZING)
-          .body.append("mkdir -p conf logs webapps temp")
-          .failOnNonZeroResultCode()
-          .execute();
-
-      copyTemplate(entity.getConfig(TomcatServer.SERVER_XML_RESOURCE), Os.mergePaths(getRunDir(), "conf", "server.xml"));
-      copyTemplate(entity.getConfig(TomcatServer.WEB_XML_RESOURCE), Os.mergePaths(getRunDir(), "conf", "web.xml"));
-
-      if (isProtocolEnabled("HTTPS")) {
-          String keystoreUrl = Preconditions.checkNotNull(getSslKeystoreUrl(), "keystore URL must be specified if using HTTPS for " + entity);
-          String destinationSslKeystoreFile = getHttpsSslKeystoreFile();
-          InputStream keystoreStream = resource.getResourceFromUrl(keystoreUrl);
-          getMachine().copyTo(keystoreStream, destinationSslKeystoreFile);
-      }
-
-      getEntity().deployInitialWars();
-  }
-{% endhighlight %}
-
-Here we can see that it's running a script to create four directories before continuing with the customization. Let's
-introduce an error by changing `mkdir` to `mkrid`:
-
-{% highlight java %}
-      newScript(CUSTOMIZING)
-          .body.append("mkrid -p conf logs webapps temp") // `mkdir` changed to `mkrid`
-          .failOnNonZeroResultCode()
-          .execute();
-{% endhighlight %}
-
-Now let's try deploying this using the following YAML:
-
-{% highlight yaml %}
-
-name: Tomcat failure test
-location: localhost
-services:
-- type: brooklyn.entity.webapp.tomcat.TomcatServer
-
-{% endhighlight %}
-
-Shortly after deployment, the entity fails with the following error:
-
-`Failure running task ssh: customizing TomcatServerImpl{id=e1HP2s8x} (HmyPAozV): 
-Execution failed, invalid result 127 for customizing TomcatServerImpl{id=e1HP2s8x}`
-
-[![Script failure error in the Brooklyn debug console.](images/script-failure.png)](images/script-failure-large.png)
-
-By selecting the `Activities` tab, we can drill into the task that failed. The list of tasks shown (where the 
-effectors are shown as top-level tasks) are clickable links. Selecting that row will show the details of
-that particular task, including its sub-tasks. We can eventually get to the specific sub-task that failed:
-
-[![Task failure error in the Brooklyn debug console.](images/failed-task.png)](images/failed-task-large.png)
-
-By clicking on the `stderr` link, we can see the script failed with the following error:
-
-{% highlight console %}
-/tmp/brooklyn-20150721-132251052-l4b9-customizing_TomcatServerImpl_i.sh: line 10: mkrid: command not found
-{% endhighlight %}
-
-This tells us *what* went wrong, but doesn't tell us *where*. In order to find that, we'll need to look at the
-stack trace that was logged when the exception was thrown.
-
-It's always worth looking at the Detailed Status section as sometimes this will give you the information you need.
-In this case, the stack trace is limited to the thread that was used to execute the task that ran the script:
-
-{% highlight console %}
-Failed after 40ms
-
-STDERR
-/tmp/brooklyn-20150721-132251052-l4b9-customizing_TomcatServerImpl_i.sh: line 10: mkrid: command not found
-
-
-STDOUT
-Executed /tmp/brooklyn-20150721-132251052-l4b9-customizing_TomcatServerImpl_i.sh, result 127: Execution failed, invalid result 127 for customizing TomcatServerImpl{id=e1HP2s8x}
-
-java.lang.IllegalStateException: Execution failed, invalid result 127 for customizing TomcatServerImpl{id=e1HP2s8x}
-    at brooklyn.entity.basic.lifecycle.ScriptHelper.logWithDetailsAndThrow(ScriptHelper.java:390)
-    at brooklyn.entity.basic.lifecycle.ScriptHelper.executeInternal(ScriptHelper.java:379)
-    at brooklyn.entity.basic.lifecycle.ScriptHelper$8.call(ScriptHelper.java:289)
-    at brooklyn.entity.basic.lifecycle.ScriptHelper$8.call(ScriptHelper.java:287)
-    at brooklyn.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:343)
-    at brooklyn.util.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:469)
-    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
-    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
-    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
-at java.lang.Thread.run(Thread.java:745)
-{% endhighlight %}
-
-In order to find the exception, we'll need to look in Brooklyn's debug log file. By default, the debug log file
-is named `brooklyn.debug.log`. Usually the easiest way to navigate the log file is to use `less`, e.g.
-`less brooklyn.debug.log`. We can quickly find find the stack trace by first navigating to the end of the log file
-with `Shift-G`, then performing a reverse-lookup by typing `?Tomcat` and pressing `Enter`. If searching for the 
-blueprint type (in this case Tomcat) simply matches tasks unrelated to the exception, you can also search for 
-the text of the error message, in this case `? invalid result 127`. You can make the search case-insensitivity by
-typing `-i` before performing the search. To skip the current match and move to the next on (i.e. 'up' as we're
-performing a reverse-lookup), simply press `n`
-
-In this case, the `?Tomcat` search takes us directly to the full stack trace (Only the last part of the trace
-is shown here):
-
-{% highlight console %}
-
-at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63) ~[guava-17.0.jar:na]
-at brooklyn.util.task.BasicTask.get(BasicTask.java:343) ~[classes/:na]
-at brooklyn.util.task.BasicTask.getUnchecked(BasicTask.java:352) ~[classes/:na]
-... 9 common frames omitted
-Caused by: brooklyn.util.exceptions.PropagatedRuntimeException: 
-at brooklyn.util.exceptions.Exceptions.propagate(Exceptions.java:97) ~[classes/:na]
-at brooklyn.util.task.BasicTask.getUnchecked(BasicTask.java:354) ~[classes/:na]
-at brooklyn.entity.basic.lifecycle.ScriptHelper.execute(ScriptHelper.java:339) ~[classes/:na]
-at brooklyn.entity.webapp.tomcat.TomcatSshDriver.customize(TomcatSshDriver.java:72) ~[classes/:na]
-at brooklyn.entity.basic.AbstractSoftwareProcessDriver$8.run(AbstractSoftwareProcessDriver.java:150) ~[classes/:na]
-at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_71]
-at brooklyn.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:343) ~[classes/:na]
-... 5 common frames omitted
-Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Execution failed, invalid result 127 for customizing TomcatServerImpl{id=e1HP2s8x}
-at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.7.0_71]
-at java.util.concurrent.FutureTask.get(FutureTask.java:188) [na:1.7.0_71]
-at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63) ~[guava-17.0.jar:na]
-at brooklyn.util.task.BasicTask.get(BasicTask.java:343) ~[classes/:na]
-at brooklyn.util.task.BasicTask.getUnchecked(BasicTask.java:352) ~[classes/:na]
-... 10 common frames omitted
-Caused by: java.lang.IllegalStateException: Execution failed, invalid result 127 for customizing TomcatServerImpl{id=e1HP2s8x}
-at brooklyn.entity.basic.lifecycle.ScriptHelper.logWithDetailsAndThrow(ScriptHelper.java:390) ~[classes/:na]
-at brooklyn.entity.basic.lifecycle.ScriptHelper.executeInternal(ScriptHelper.java:379) ~[classes/:na]
-at brooklyn.entity.basic.lifecycle.ScriptHelper$8.call(ScriptHelper.java:289) ~[classes/:na]
-at brooklyn.entity.basic.lifecycle.ScriptHelper$8.call(ScriptHelper.java:287) ~[classes/:na]
-... 6 common frames omitted
-
-{% endhighlight %}
-
-Brooklyn's use of tasks and helper classes can make the stack trace a little harder than usual to follow, but a good
-place to start is to look through the stack trace for the node's implementation or ssh driver classes (usually
-named `FooNodeImpl` or `FooSshDriver`). In this case we can see the following:
-
-{% highlight console %}
-at brooklyn.entity.webapp.tomcat.TomcatSshDriver.customize(TomcatSshDriver.java:72) ~[classes/:na]
-{% endhighlight %}
-
-Combining this with the error message of `mkrid: command not found` we can see that indeed `mkdir` has been
-misspelled `mkrid` on line 72 of `TomcatSshDriver.java`.
-
-
-## Non-Script failure
-The section above gives an example of a failure that occurs when a script is run. In this section we will look at
-a failure in a non-script related part of the code. We'll use the `customize()` method of the Tomcat server again,
-but this time, we'll correct the spelling of 'mkdir' and add a line that attempts to copy a nonexistent resource 
-to the remote server:
-
-{% highlight java %}
-
-newScript(CUSTOMIZING)
-    .body.append("mkdir -p conf logs webapps temp")
-    .failOnNonZeroResultCode()
-    .execute();
-
-copyTemplate(entity.getConfig(TomcatServer.SERVER_XML_RESOURCE), Os.mergePaths(getRunDir(), "conf", "server.xml"));
-copyTemplate(entity.getConfig(TomcatServer.WEB_XML_RESOURCE), Os.mergePaths(getRunDir(), "conf", "web.xml"));
-copyTemplate("classpath://nonexistent.xml", Os.mergePaths(getRunDir(), "conf", "nonexistent.xml")); // Resource does not exist!
-
-{% endhighlight %}
-
-Let's deploy this using the same YAML from above. Here's the resulting error in the Brooklyn debug console:
-
-[![Resource exception in the Brooklyn debug console.](images/resource-exception.png)](images/resource-exception-large.png)
-
-Again, this tells us *what* the error is, but we need to find *where* the code is that attempts to copy this file. In
-this case it's shown in the Detailed Status section, and we don't need to go to the log file:
-
-{% highlight console %}
-
-Failed after 221ms: Error getting resource 'classpath://nonexistent.xml' for TomcatServerImpl{id=PVZxDKU1}: java.io.IOException: Error accessing classpath://nonexistent.xml: java.io.IOException: nonexistent.xml not found on classpath
-
-java.lang.RuntimeException: Error getting resource 'classpath://nonexistent.xml' for TomcatServerImpl{id=PVZxDKU1}: java.io.IOException: Error accessing classpath://nonexistent.xml: java.io.IOException: nonexistent.xml not found on classpath
-    at brooklyn.util.ResourceUtils.getResourceFromUrl(ResourceUtils.java:297)
-    at brooklyn.util.ResourceUtils.getResourceAsString(ResourceUtils.java:475)
-    at brooklyn.entity.basic.AbstractSoftwareProcessDriver.getResourceAsString(AbstractSoftwareProcessDriver.java:447)
-    at brooklyn.entity.basic.AbstractSoftwareProcessDriver.processTemplate(AbstractSoftwareProcessDriver.java:469)
-    at brooklyn.entity.basic.AbstractSoftwareProcessDriver.copyTemplate(AbstractSoftwareProcessDriver.java:390)
-    at brooklyn.entity.basic.AbstractSoftwareProcessDriver.copyTemplate(AbstractSoftwareProcessDriver.java:379)
-    at brooklyn.entity.webapp.tomcat.TomcatSshDriver.customize(TomcatSshDriver.java:79)
-    at brooklyn.entity.basic.AbstractSoftwareProcessDriver$8.run(AbstractSoftwareProcessDriver.java:150)
-    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
-    at brooklyn.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:343)
-    at brooklyn.util.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:469)
-    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
-    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
-    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
-at java.lang.Thread.run(Thread.java:745)
-    Caused by: java.io.IOException: Error accessing classpath://nonexistent.xml: java.io.IOException: nonexistent.xml not found on classpath
-at brooklyn.util.ResourceUtils.getResourceFromUrl(ResourceUtils.java:233)
-    ... 14 more
-    Caused by: java.io.IOException: nonexistent.xml not found on classpath
-    at brooklyn.util.ResourceUtils.getResourceViaClasspath(ResourceUtils.java:372)
-at brooklyn.util.ResourceUtils.getResourceFromUrl(ResourceUtils.java:230)
-    ... 14 more
-
-{% endhighlight %}
-
-Looking for `Tomcat` in the stack trace, we can see in this case the problem lies at line 79 of `TomcatSshDriver.java`
-
-
-## External failure
-
-Sometimes an entity will fail outside the direct commands issues by Brooklyn. When installing and launching an entity,
-Brooklyn will check the return code of scripts that were run to ensure that they completed successfully (i.e. the
-return code of the script is zero). It is possible, for example, that a launch script completes successfully, but
-the entity fails to start.
-
-We can simulate this type of failure by launching Tomcat with an invalid configuration file. As seen in the previous
-examples, Brooklyn copies two xml configuration files to the server: `server.xml` and `web.xml`
-
-The first few non-comment lines of `server.xml` are as follows (you can see the full file [here]
-(https://github.com/apache/incubator-brooklyn/blob/master/software/webapp/src/main/resources/brooklyn/entity/webapp/tomcat/server.xml)):
-
-{% highlight xml %}
-
-<Server port="${driver.shutdownPort?c}" shutdown="SHUTDOWN">
-     <Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on" />
-     <Listener className="org.apache.catalina.core.JasperListener" />
-
-{% endhighlight%}
-
-Let's add an unmatched XML element, which will make this XML file invalid:
-
-{% highlight xml %}
-
-<Server port="${driver.shutdownPort?c}" shutdown="SHUTDOWN">
-     <unmatched-element> <!-- This is invalid XML as we won't add </unmatched-element> -->
-     <Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on" />
-     <Listener className="org.apache.catalina.core.JasperListener" />
-
-{% endhighlight%}
-
-As Brooklyn doesn't know how these types of resources are used, they're not validated as they're copied to the remote machine.
-As far as Brooklyn is concerned, the file will have copied successfully.
-
-Let's deploy Tomcat again, using the same YAML as before. This time, the deployment runs for a few minutes before failing
-with `Timeout waiting for SERVICE_UP`:
-
-[![External error in the Brooklyn debug console.](images/external-error.png)](images/external-error-large.png)
-
-If we drill down into the tasks in the `Activities` tab, we can see that all of the installation and launch tasks
-completed successfully, and stdout of the `launch` script is as follows:
-
-{% highlight console %}
-
-Executed /tmp/brooklyn-20150721-153049139-fK2U-launching_TomcatServerImpl_id_.sh, result 0
-
-{% endhighlight %}
-
-The task that failed was the `post-start` task, and the stack trace from the Detailed Status section is as follows:
-
-{% highlight console %}
-
-Failed after 5m 1s: Timeout waiting for SERVICE_UP from TomcatServerImpl{id=BUHgQeOs}
-
-java.lang.IllegalStateException: Timeout waiting for SERVICE_UP from TomcatServerImpl{id=BUHgQeOs}
-    at brooklyn.entity.basic.Entities.waitForServiceUp(Entities.java:1073)
-    at brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:388)
-    at brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:385)
-    at brooklyn.entity.basic.SoftwareProcessDriverLifecycleEffectorTasks.postStartCustom(SoftwareProcessDriverLifecycleEffectorTasks.java:164)
-    at brooklyn.entity.software.MachineLifecycleEffectorTasks$7.run(MachineLifecycleEffectorTasks.java:433)
-    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
-    at brooklyn.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:343)
-    at brooklyn.util.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:469)
-    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
-    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
-    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
-at java.lang.Thread.run(Thread.java:745)
-
-{% endhighlight %}
-
-This doesn't really tell us what we need to know, and looking in the `brooklyn.debug.log` file yields no further
-clues. The key here is the error message `Timeout waiting for SERVICE_UP`. After running the installation and
-launch scripts, assuming all scripts completed successfully, Brooklyn will periodically check the health of the node
-and will set the node on fire if the health check does not pass within a pre-prescribed period (the default is
-two minutes, and can be configured using the `start.timeout` config key). The periodic health check also continues
-after the successful launch in order to check continued operation of the node, but in this case it fails to pass
-at all.
-
-The first thing we need to do is to find out how Brooklyn determines the health of the node. The health-check is 
-often implemented in the `isRunning()` method in the entity's ssh driver. Tomcat's implementation of `isRunning()`
-is as follows:
-
-{% highlight java %}
-@Override
-public boolean isRunning() {
-    return newScript(MutableMap.of(USE_PID_FILE, "pid.txt"), CHECK_RUNNING).execute() == 0;
-}
-{% endhighlight %}
-
-The `newScript` method has conveniences for default scripts to check if a process is running based on its PID. In this
-case, it will look for Tomcat's PID in the `pid.txt` file and check if the PID is the PID of a running process
-
-It's worth a quick sanity check at this point to check if the PID file exists, and if the process is running.
-By default, the pid file is located in the run directory of the entity. You can find the location of the entity's run
-directory by looking at the `run.dir` sensor. In this case it is `/tmp/brooklyn-martin/apps/jIzIHXtP/entities/TomcatServer_BUHgQeOs`.
-To find the pid, you simply cat the pid.txt file in this directory:
-
-{% highlight console %}
-$ cat /tmp/brooklyn-martin/apps/jIzIHXtP/entities/TomcatServer_BUHgQeOs/pid.txt
-73714
-{% endhighlight %}
-
-In this case, the PID in the file is 73714. You can then check if the process is running using `ps`. You can also
-pipe the output to `fold` so the full launch command is visible:
-
-{% highlight console %}
-
-$ ps -p 73714 | fold -w 120
-PID TTY           TIME CMD
-73714 ??         0:08.03 /Library/Java/JavaVirtualMachines/jdk1.8.0_51.jdk/Contents/Home/bin/java -Dnop -Djava.util.logg
-ing.manager=org.apache.juli.ClassLoaderLogManager -javaagent:/tmp/brooklyn-martin/apps/jIzIHXtP/entities/TomcatServer_BU
-HgQeOs/brooklyn-jmxmp-agent-shaded-0.8.0-SNAPSHOT.jar -Xms200m -Xmx800m -XX:MaxPermSize=400m -Dcom.sun.management.jmxrem
-ote -Dbrooklyn.jmxmp.rmi-port=1099 -Dbrooklyn.jmxmp.port=31001 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.manage
-ment.jmxremote.authenticate=false -Djava.endorsed.dirs=/tmp/brooklyn-martin/installs/TomcatServer_7.0.56/apache-tomcat-7
-.0.56/endorsed -classpath /tmp/brooklyn-martin/installs/TomcatServer_7.0.56/apache-tomcat-7.0.56/bin/bootstrap.jar:/tmp/
-brooklyn-martin/installs/TomcatServer_7.0.56/apache-tomcat-7.0.56/bin/tomcat-juli.jar -Dcatalina.base=/tmp/brooklyn-mart
-in/apps/jIzIHXtP/entities/TomcatServer_BUHgQeOs -Dcatalina.home=/tmp/brooklyn-martin/installs/TomcatServer_7.0.56/apache
--tomcat-7.0.56 -Djava.io.tmpdir=/tmp/brooklyn-martin/apps/jIzIHXtP/entities/TomcatServer_BUHgQeOs/temp org.apache.catali
-na.startup.Bootstrap start
-
-{% endhighlight %}
-
-This confirms that the process is running. The next thing we can look at is the `service.notUp.indicators` sensor. This
-reads as follows:
-
-{% highlight json %}
-
-{"service.process.isRunning":"The software process for this entity does not appear to be running"}
-
-{% endhighlight %}
-
-This confirms that the problem is indeed due to the `service.process.isRunning` sensor. We assumed earlier that this was
-set by the `isRunning()` method in `TomcatSshDriver.java`, but this isn't always the case. The `service.process.isRunning`
-sensor is wired up by the `connectSensors()` method in the node's implementation class, in this case 
-`TomcatServerImpl.java`. Tomcat's implementation of `connectSensors()` is as follows:
-
-{% highlight java %}
-
-@Override
-public void connectSensors() {
-    super.connectSensors();
-
-    if (getDriver().isJmxEnabled()) {
-        String requestProcessorMbeanName = "Catalina:type=GlobalRequestProcessor,name=\"http-*\"";
-
-        Integer port = isHttpsEnabled() ? getAttribute(HTTPS_PORT) : getAttribute(HTTP_PORT);
-        String connectorMbeanName = format("Catalina:type=Connector,port=%s", port);
-
-        jmxWebFeed = JmxFeed.builder()
-            .entity(this)
-            .period(3000, TimeUnit.MILLISECONDS)
-            .pollAttribute(new JmxAttributePollConfig<Integer>(ERROR_COUNT)
-                    .objectName(requestProcessorMbeanName)
-                    .attributeName("errorCount"))
-            .pollAttribute(new JmxAttributePollConfig<Integer>(REQUEST_COUNT)
-                    .objectName(requestProcessorMbeanName)
-                    .attributeName("requestCount"))
-            .pollAttribute(new JmxAttributePollConfig<Integer>(TOTAL_PROCESSING_TIME)
-                    .objectName(requestProcessorMbeanName)
-                    .attributeName("processingTime"))
-            .pollAttribute(new JmxAttributePollConfig<String>(CONNECTOR_STATUS)
-                    .objectName(connectorMbeanName)
-                    .attributeName("stateName"))
-            .pollAttribute(new JmxAttributePollConfig<Boolean>(SERVICE_PROCESS_IS_RUNNING)
-                    .objectName(connectorMbeanName)
-                    .attributeName("stateName")
-                    .onSuccess(Functions.forPredicate(Predicates.<Object>equalTo("STARTED")))
-                    .setOnFailureOrException(false))
-            .build();
-
-        jmxAppFeed = JavaAppUtils.connectMXBeanSensors(this);
-    } else {
-        // if not using JMX
-        LOG.warn("Tomcat running without JMX monitoring; limited visibility of service available");
-        connectServiceUpIsRunning();
-    }
-}
-
-{% endhighlight %}
-
-We can see here that if jmx is not enabled, the method will call `connectServiceUpIsRunning()` which will use the
-default PID-based method of determining if a process is running. However, as JMX *is* running, the `service.process.isRunning`
-sensor (denoted here by the `SERVICE_PROCESS_IS_RUNNING` variable) is set to true if and only if the
-`stateName` JMX attribute equals `STARTED`. We can see from the previous call to `.pollAttribute` that this
-attribute is also published to the `CONNECTOR_STATUS` sensor. The `CONNECTOR_STATUS` sensor is defined as follows:
-
-{% highlight java %}
-
-AttributeSensor<String> CONNECTOR_STATUS =
-    new BasicAttributeSensor<String>(String.class, "webapp.tomcat.connectorStatus", "Catalina connector state name");
-
-{% endhighlight %}
-
-Let's go back to the Brooklyn debug console and look for the `webapp.tomcat.connectorStatus`:
-
-[![Sensors view in the Brooklyn debug console.](images/jmx-sensors.png)](images/jmx-sensors-large.png)
-
-As the sensor is not shown, it's likely that it's simply null or not set. We can check this by clicking
-the `Show/hide empty records` icon (highlighted in yellow above):
-
-[![All sensors view in the Brooklyn debug console.](images/jmx-sensors-all.png)](images/jmx-sensors-all-large.png)
-
-We know from previous steps that the installation and launch scripts completed, and we know the procecess is running,
-but we can see here that the server is not responding to JMX requests. A good thing to check here would be that the
-JMX port is not being blocked by iptables, firewalls or security groups
- (see the (troubleshooting connectivity guide)[troubleshooting-connectivity.html]). 
-Let's assume that we've checked that and they're all open. There is still one more thing that Brooklyn can tell us.
-
-Still on the `Sensors` tab, let's take a look at the `log.location` sensor:
-
-{% highlight console %}
-
-/tmp/brooklyn-martin/apps/c3bmrlC3/entities/TomcatServer_C1TAjYia/logs/catalina.out
-
-{% endhighlight %}
-
-This is the location of Tomcat's own log file. The location of the log file will differ from process to process
-and when writing a custom entity you will need to check the software's own documentation. If your blueprint's
-ssh driver extends `JavaSoftwareProcessSshDriver`, the value returned by the `getLogFileLocation()` method will
-automatically be published to the `log.location` sensor. Otherwise, you can publish the value yourself by calling
-`entity.setAttribute(Attributes.LOG_FILE_LOCATION, getLogFileLocation());` in your ssh driver
-
-**Note:** The log file will be on the server to which you have deployed Tomcat, and not on the Brooklyn server.
-Let's take a look in the log file:
-
-{% highlight console %}
-
-less /tmp/brooklyn-martin/apps/c3bmrlC3/entities/TomcatServer_C1TAjYia/logs/catalina.out
-
-Jul 21, 2015 4:12:12 PM org.apache.tomcat.util.digester.Digester fatalError
-SEVERE: Parse Fatal Error at line 143 column 3: The element type "unmatched-element" must be terminated by the matching end-tag "</unmatched-element>".
-    org.xml.sax.SAXParseException; systemId: file:/tmp/brooklyn-martin/apps/c3bmrlC3/entities/TomcatServer_C1TAjYia/conf/server.xml; lineNumber: 143; columnNumber: 3; The element type "unmatched-element" must be terminated by the matching end-tag "</unmatched-element>".
-    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
-    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
-    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:441)
-    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:368)
-    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1437)
-    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1749)
-    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2973)
-    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
-    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
-    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
-    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
-    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
-    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
-    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:649)
-    at org.apache.tomcat.util.digester.Digester.parse(Digester.java:1561)
-    at org.apache.catalina.startup.Catalina.load(Catalina.java:615)
-    at org.apache.catalina.startup.Catalina.start(Catalina.java:677)
-    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
-    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
-    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
-    at java.lang.reflect.Method.invoke(Method.java:497)
-    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:321)
-at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:455)
-
-    Jul 21, 2015 4:12:12 PM org.apache.catalina.startup.Catalina load
-    WARNING: Catalina.start using conf/server.xml: The element type "unmatched-element" must be terminated by the matching end-tag "</unmatched-element>".
-    Jul 21, 2015 4:12:12 PM org.apache.catalina.startup.Catalina start
-    SEVERE: Cannot start server. Server instance is not configured.
-
-{% endhighlight %}
-
-As expected, we can see here that the `unmatched-element` element has not been terminated in the `server.xml` file

http://git-wip-us.apache.org/repos/asf/incubator-brooklyn/blob/3d930107/docs/guide/ops/troubleshooting/connectivity.md
----------------------------------------------------------------------
diff --git a/docs/guide/ops/troubleshooting/connectivity.md b/docs/guide/ops/troubleshooting/connectivity.md
new file mode 100644
index 0000000..07874c0
--- /dev/null
+++ b/docs/guide/ops/troubleshooting/connectivity.md
@@ -0,0 +1,143 @@
+---
+layout: website-normal
+title: Troubleshooting Server Connectivity Issues in the Cloud
+toc: /guide/toc.json
+---
+
+A common problem when setting up an application in the cloud is getting the basic connectivity right - how
+do I get my service (e.g. a TCP host:port) publicly accessible over the internet?
+
+This varies a lot - e.g. Is the VM public or in a private network? Is the service only accessible through
+a load balancer? Should the service be globally reachable or only to a particular CIDR?
+
+This guide gives some general tips for debugging connectivity issues, which are applicable to a 
+range of different service types. Choose those that are appropriate for your use-case.
+
+## VM reachable
+If the VM is supposed to be accessible directly (e.g. from the public internet, or if in a private network
+then from a jump host)...
+
+### ping
+Can you `ping` the VM from the machine you are trying to reach it from?
+
+However, ping is over ICMP. If the VM is unreachable, it could be that the firewall forbids ICMP but still
+lets TCP traffic through.
+
+### telnet to TCP port
+You can check if a given TCP port is reachable and listening using `telnet <host> <port>`, such as
+`telnet www.google.com 80`, which gives output like:
+
+```
+    Trying 31.55.163.219...
+    Connected to www.google.com.
+    Escape character is '^]'.
+```
+
+If this is very slow to respond, it can be caused by a firewall blocking access. If it is fast, it could
+be that the server is just not listening on that port.
+
+### DNS and routing
+If using a hostname rather than IP, then is it resolving to a sensible IP?
+
+Is the route to the server sensible? (e.g. one can hit problems with proxy servers in a corporate
+network, or ISPs returning a default result for unknown hosts).
+
+The following commands can be useful:
+
+* `host` is a DNS lookup utility. e.g. `host www.google.com`.
+* `dig` stands for "domain information groper". e.g. `dig www.google.com`.
+* `traceroute` prints the route that packets take to a network host. e.g. `traceroute www.google.com`.
+
+## Service is listening
+
+### Service responds
+Try connecting to the service from the VM itself. For example, `curl http://localhost:8080` for a
+web-service.
+
+On dev/test VMs, don't be afraid to install the utilities you need such as `curl`, `telnet`, `nc`,
+etc. Cloud VMs often have a very cut-down set of packages installed. For example, execute
+`sudo apt-get update; sudo apt-get install -y curl` or `sudo yum install -y curl`.
+
+### Listening on port
+Check that the service is listening on the port, and on the correct NIC(s).
+
+Execute `netstat -antp` (or on OS X `netstat -antp TCP`) to list the TCP ports in use (or use
+`-anup` for UDP). You should expect to see the something like the output below for a service.
+
+```
+Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name   
+tcp        0      0 :::8080                     :::*                        LISTEN      8276/java           
+```
+
+In this case a Java process with pid 8276 is listening on port 8080. The local address `:::8080`
+format means all NICs (in IPv6 address format). You may also see `0.0.0.0:8080` for IPv4 format.
+If it says 127.0.0.1:8080 then your service will most likely not be reachable externally.
+
+Use `ip addr show` (or the obsolete `ifconfig -a`) to see the network interfaces on your server.
+
+For `netstat`, run with `sudo` to see the pid for all listed ports.
+
+## Firewalls
+On Linux, check if `iptables` is preventing the remote connection. On Windows, check the Windows Firewall.
+
+If it is acceptable (e.g. it is not a server in production), try turning off the firewall temporarily,
+and testing connectivity again. Remember to re-enable it afterwards! On CentOS, this is `sudo service
+iptables stop`. On Ubuntu, use `sudo ufw disable`. On Windows, press the Windows key and type 'Windows
+Firewall with Advanced Security' to open the firewall tools, then click 'Windows Firewall Properties'
+and set the firewall state to 'Off' in the Domain, Public and Private profiles.
+
+If you cannot temporarily turn off the firewall, then look carefully at the firewall settings. For
+example, execute `sudo iptables -n --list` and `iptables -t nat -n --list`.
+
+## Cloud firewalls
+Some clouds offer a firewall service, where ports need to be explicitly listed to be reachable.
+
+For example, [security groups for EC2-classic]
+(http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html#ec2-classic-security-groups)
+have rules for the protocols and ports to be reachable from specific CIDRs.
+
+Check these settings via the cloud provider's web-console (or API).
+
+## Quick test of a listener port
+It can be useful to start listening on a given port, and to then check if that port is reachable.
+This is useful for testing basic connectivity when your service is not yet running, or to a
+different port to compare behaviour, or to compare with another VM in the network.
+
+The `nc` netcat tool is useful for this. For example, `nc -l 0.0.0.0 8080` will listen on port
+TCP 8080 on all network interfaces. On another server, you can then run `echo hello from client
+| nc <hostname> 8080`. If all works well, this will send "hello from client" over the TCP port 8080,
+which will be written out by the `nc -l` process before exiting.
+
+Similarly for UDP, you use `-lU`.
+
+You may first have to install `nc`, e.g. with `sudo yum install -y nc` or `sudo apt-get install netcat`.
+
+### Cloud load balancers
+For some use-cases, it is good practice to use the load balancer service offered by the cloud provider
+(e.g. [ELB in AWS](http://aws.amazon.com/elasticloadbalancing/) or the [Cloudstack Load Balancer]
+(http://docs.cloudstack.apache.org/projects/cloudstack-installation/en/latest/network_setup.html#management-server-load-balancing))
+
+The VMs can all be isolated within a private network, with access only through the load balancer service.
+
+Debugging techniques here include ensuring connectivity from another jump server within the private
+network, and careful checking of the load-balancer configuration from the Cloud Provider's web-console.
+
+### DNAT
+Use of DNAT is appropriate for some use-cases, where a particular port on a particular VM is to be
+made available.
+
+Debugging connectivity issues here is similar to the steps for a cloud load balancer. Ensure
+connectivity from another jump server within the private network. Carefully check the NAT rules from
+the Cloud Provider's web-console.
+
+### Guest wifi
+It is common for guest wifi to restrict access to only specific ports (e.g. 80 and 443, restricting
+ssh over port 22 etc).
+
+Normally your best bet is then to abandon the guest wifi (e.g. to tether to a mobile phone instead).
+
+There are some unconventional workarounds such as [configuring sshd to listen on port 80 so you can
+use an ssh tunnel](http://askubuntu.com/questions/107173/is-it-possible-to-ssh-through-port-80).
+However, the firewall may well inspect traffic so sending non-http traffic over port 80 may still fail.
+
+  

http://git-wip-us.apache.org/repos/asf/incubator-brooklyn/blob/3d930107/docs/guide/ops/troubleshooting/deployment.md
----------------------------------------------------------------------
diff --git a/docs/guide/ops/troubleshooting/deployment.md b/docs/guide/ops/troubleshooting/deployment.md
new file mode 100644
index 0000000..c343762
--- /dev/null
+++ b/docs/guide/ops/troubleshooting/deployment.md
@@ -0,0 +1,88 @@
+---
+layout: website-normal
+title: Troubleshooting Deployment
+toc: /guide/toc.json
+---
+
+This guide describes common problems encountered when deploying applications.
+
+
+## YAML deployment errors
+
+The error `Invalid YAML: Plan not in acceptable format: Cannot convert ...` means that the text is not 
+valid YAML. Common reasons include that the indentation is incorrect, or that there are non-matching
+brackets.
+
+The error `Unrecognized application blueprint format: no services defined` means that the `services:`
+section is missing.
+
+An error like `Deployment plan item io.brooklyn.camp.spi.pdp.Service@23c159e2[name=<null>,description=<null>,serviceType=com.acme.Foo,characteristics=[],customAttributes={}] cannot be matched` means that the given entity type (in this case com.acme.Foo) is not in the catalog or on the classpath.
+
+An error like `Illegal parameter for 'location' (aws-ec3); not resolvable: java.util.NoSuchElementException: Unknown location 'aws-ec3': either this location is not recognised or there is a problem with location resolver configuration` means that the given location (in this case aws-ec3) 
+was unknown. This means it does not match any of the named locations in brooklyn.properties, nor any of the
+clouds enabled in the jclouds support, nor any of the locations added dynamically through the catalog API.
+
+
+## VM Provisioning Failures
+
+There are many stages at which VM provisioning can fail! An error `Failure running task provisioning` 
+means there was some problem obtaining or connecting to the machine.
+
+An error like `... Not authorized to access cloud ...` usually means the wrong identity/credential was used.
+
+An error like `Unable to match required VM template constraints` means that a matching image (e.g. AMI in AWS terminology) could not be found. This 
+could be because an incorrect explicit image id was supplied, or because the match-criteria could not
+be satisfied using the given images available in the given cloud. The first time this error is 
+encountered, a listing of all images in that cloud/region will be written to the debug log.
+
+Failure to form an ssh connection to the newly provisioned VM can be reported in several different ways, 
+depending on the nature of the error. This breaks down into failures at different points:
+
+* Failure to reach the ssh port (e.g. `... could not connect to any ip address port 22 on node ...`).
+* Failure to do the very initial ssh login (e.g. `... Exhausted available authentication methods ...`).
+* Failure to ssh using the newly created user.
+
+There are many possible reasons for this ssh failure, which include:
+
+* The VM was "dead on arrival" (DOA) - sometimes a cloud will return an unusable VM. One can work around
+  this using the `machineCreateAttempts` configuration option, to automatically retry with a new VM.
+* Local network restrictions. On some guest wifis, external access to port 22 is forbidden.
+  Check by manually trying to reach port 22 on a different machine that you have access it.
+* NAT rules not set up correctly. On some clouds that have only private IPs, Brooklyn can automatically
+  create NAT rules to provide access to port 22. If this NAT rule creation fails for some reason,
+  then Brooklyn will not be able to reach the VM. If NAT rules are being created for your cloud, then
+  check the logs for warnings or errors about the NAT rule creation.
+* ssh credentials incorrectly configured. The Brooklyn configuration is very flexible in how ssh
+  credentials can be configured. However, if a more advanced configuration is used incorrectly (e.g. 
+  the wrong login user, or invalid ssh keys) then this will fail.
+* Wrong login user. The initial login user to use when first logging into the new VM is inferred from 
+  the metadata provided by the cloud provider about that image. This can sometimes be incomplete, so
+  the wrong user may be used. This can be explicitly set using the `loginUser` configuration option.
+  An example of this is with some Ubuntu VMs, where the "ubuntu" user should be used. However, on some clouds
+  it defaults to trying to ssh as "root".
+* Bad choice of user. By default, Brooklyn will create a user with the same name as the user running the
+  Brooklyn process; the choice of user name is configurable. If this user already exists on the machine, 
+  then the user setup will not behave as expected. Subsequent attempts to ssh using this user could then fail.
+* Custom credentials on the VM. Most clouds will automatically set the ssh login details (e.g. in AWS using  
+  the key-pair, or in CloudStack by auto-generating a password). However, with some custom images the VM
+  will have hard-coded credentials that must be used. If Brooklyn's configuration does not match that,
+  then it will fail.
+* Guest customisation by the cloud. On some clouds (e.g. vCloud Air), the VM can be configured to do
+  guest customisation immediately after the VM starts. This can include changing the root password.
+  If Brooklyn is not configured with the expected changed password, then the VM provisioning may fail
+  (depending if Brooklyn connects before or after the password is changed!).
+ 
+A very useful debug configuration is to set `destroyOnFailure` to false. This will allow ssh failures to
+be more easily investigated.
+
+
+## Timeout Waiting For Service-Up
+
+A common generic error message is that there was a timeout waiting for service-up.
+
+This just means that the entity did not get to service-up in the pre-defined time period (the default is 
+two minutes, and can be configured using the `start.timeout` config key; the timer begins after the 
+start tasks are completed).
+
+See the guide on [runtime errors](troubleshooting-runtime-errors.html) for where to find additional information, especially the section on
+"Entity's Error Status".

http://git-wip-us.apache.org/repos/asf/incubator-brooklyn/blob/3d930107/docs/guide/ops/troubleshooting/going-deep-in-java-and-logs.md
----------------------------------------------------------------------
diff --git a/docs/guide/ops/troubleshooting/going-deep-in-java-and-logs.md b/docs/guide/ops/troubleshooting/going-deep-in-java-and-logs.md
new file mode 100644
index 0000000..613a15a
--- /dev/null
+++ b/docs/guide/ops/troubleshooting/going-deep-in-java-and-logs.md
@@ -0,0 +1,488 @@
+---
+layout: website-normal
+title: "Troubleshooting: Going Deep in Java and Logs"
+toc: /guide/toc.json
+---
+
+This guide takes a deep look at the Java and log messages for some failure scenarios,
+giving common steps used to identify the issues.
+
+## Script Failure
+
+Many blueprints run bash scripts as part of the installation. This section highlights how to identify a problem with
+a bash script.
+
+First let's take a look at the `customize()` method of the Tomcat server blueprint:
+
+{% highlight java %}
+  @Override
+  public void customize() {
+      newScript(CUSTOMIZING)
+          .body.append("mkdir -p conf logs webapps temp")
+          .failOnNonZeroResultCode()
+          .execute();
+
+      copyTemplate(entity.getConfig(TomcatServer.SERVER_XML_RESOURCE), Os.mergePaths(getRunDir(), "conf", "server.xml"));
+      copyTemplate(entity.getConfig(TomcatServer.WEB_XML_RESOURCE), Os.mergePaths(getRunDir(), "conf", "web.xml"));
+
+      if (isProtocolEnabled("HTTPS")) {
+          String keystoreUrl = Preconditions.checkNotNull(getSslKeystoreUrl(), "keystore URL must be specified if using HTTPS for " + entity);
+          String destinationSslKeystoreFile = getHttpsSslKeystoreFile();
+          InputStream keystoreStream = resource.getResourceFromUrl(keystoreUrl);
+          getMachine().copyTo(keystoreStream, destinationSslKeystoreFile);
+      }
+
+      getEntity().deployInitialWars();
+  }
+{% endhighlight %}
+
+Here we can see that it's running a script to create four directories before continuing with the customization. Let's
+introduce an error by changing `mkdir` to `mkrid`:
+
+{% highlight java %}
+      newScript(CUSTOMIZING)
+          .body.append("mkrid -p conf logs webapps temp") // `mkdir` changed to `mkrid`
+          .failOnNonZeroResultCode()
+          .execute();
+{% endhighlight %}
+
+Now let's try deploying this using the following YAML:
+
+{% highlight yaml %}
+
+name: Tomcat failure test
+location: localhost
+services:
+- type: brooklyn.entity.webapp.tomcat.TomcatServer
+
+{% endhighlight %}
+
+Shortly after deployment, the entity fails with the following error:
+
+`Failure running task ssh: customizing TomcatServerImpl{id=e1HP2s8x} (HmyPAozV): 
+Execution failed, invalid result 127 for customizing TomcatServerImpl{id=e1HP2s8x}`
+
+[![Script failure error in the Brooklyn debug console.](images/script-failure.png)](images/script-failure-large.png)
+
+By selecting the `Activities` tab, we can drill into the task that failed. The list of tasks shown (where the 
+effectors are shown as top-level tasks) are clickable links. Selecting that row will show the details of
+that particular task, including its sub-tasks. We can eventually get to the specific sub-task that failed:
+
+[![Task failure error in the Brooklyn debug console.](images/failed-task.png)](images/failed-task-large.png)
+
+By clicking on the `stderr` link, we can see the script failed with the following error:
+
+{% highlight console %}
+/tmp/brooklyn-20150721-132251052-l4b9-customizing_TomcatServerImpl_i.sh: line 10: mkrid: command not found
+{% endhighlight %}
+
+This tells us *what* went wrong, but doesn't tell us *where*. In order to find that, we'll need to look at the
+stack trace that was logged when the exception was thrown.
+
+It's always worth looking at the Detailed Status section as sometimes this will give you the information you need.
+In this case, the stack trace is limited to the thread that was used to execute the task that ran the script:
+
+{% highlight console %}
+Failed after 40ms
+
+STDERR
+/tmp/brooklyn-20150721-132251052-l4b9-customizing_TomcatServerImpl_i.sh: line 10: mkrid: command not found
+
+
+STDOUT
+Executed /tmp/brooklyn-20150721-132251052-l4b9-customizing_TomcatServerImpl_i.sh, result 127: Execution failed, invalid result 127 for customizing TomcatServerImpl{id=e1HP2s8x}
+
+java.lang.IllegalStateException: Execution failed, invalid result 127 for customizing TomcatServerImpl{id=e1HP2s8x}
+    at brooklyn.entity.basic.lifecycle.ScriptHelper.logWithDetailsAndThrow(ScriptHelper.java:390)
+    at brooklyn.entity.basic.lifecycle.ScriptHelper.executeInternal(ScriptHelper.java:379)
+    at brooklyn.entity.basic.lifecycle.ScriptHelper$8.call(ScriptHelper.java:289)
+    at brooklyn.entity.basic.lifecycle.ScriptHelper$8.call(ScriptHelper.java:287)
+    at brooklyn.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:343)
+    at brooklyn.util.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:469)
+    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
+    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
+    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
+at java.lang.Thread.run(Thread.java:745)
+{% endhighlight %}
+
+In order to find the exception, we'll need to look in Brooklyn's debug log file. By default, the debug log file
+is named `brooklyn.debug.log`. Usually the easiest way to navigate the log file is to use `less`, e.g.
+`less brooklyn.debug.log`. We can quickly find find the stack trace by first navigating to the end of the log file
+with `Shift-G`, then performing a reverse-lookup by typing `?Tomcat` and pressing `Enter`. If searching for the 
+blueprint type (in this case Tomcat) simply matches tasks unrelated to the exception, you can also search for 
+the text of the error message, in this case `? invalid result 127`. You can make the search case-insensitivity by
+typing `-i` before performing the search. To skip the current match and move to the next on (i.e. 'up' as we're
+performing a reverse-lookup), simply press `n`
+
+In this case, the `?Tomcat` search takes us directly to the full stack trace (Only the last part of the trace
+is shown here):
+
+{% highlight console %}
+
+at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63) ~[guava-17.0.jar:na]
+at brooklyn.util.task.BasicTask.get(BasicTask.java:343) ~[classes/:na]
+at brooklyn.util.task.BasicTask.getUnchecked(BasicTask.java:352) ~[classes/:na]
+... 9 common frames omitted
+Caused by: brooklyn.util.exceptions.PropagatedRuntimeException: 
+at brooklyn.util.exceptions.Exceptions.propagate(Exceptions.java:97) ~[classes/:na]
+at brooklyn.util.task.BasicTask.getUnchecked(BasicTask.java:354) ~[classes/:na]
+at brooklyn.entity.basic.lifecycle.ScriptHelper.execute(ScriptHelper.java:339) ~[classes/:na]
+at brooklyn.entity.webapp.tomcat.TomcatSshDriver.customize(TomcatSshDriver.java:72) ~[classes/:na]
+at brooklyn.entity.basic.AbstractSoftwareProcessDriver$8.run(AbstractSoftwareProcessDriver.java:150) ~[classes/:na]
+at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_71]
+at brooklyn.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:343) ~[classes/:na]
+... 5 common frames omitted
+Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Execution failed, invalid result 127 for customizing TomcatServerImpl{id=e1HP2s8x}
+at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.7.0_71]
+at java.util.concurrent.FutureTask.get(FutureTask.java:188) [na:1.7.0_71]
+at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63) ~[guava-17.0.jar:na]
+at brooklyn.util.task.BasicTask.get(BasicTask.java:343) ~[classes/:na]
+at brooklyn.util.task.BasicTask.getUnchecked(BasicTask.java:352) ~[classes/:na]
+... 10 common frames omitted
+Caused by: java.lang.IllegalStateException: Execution failed, invalid result 127 for customizing TomcatServerImpl{id=e1HP2s8x}
+at brooklyn.entity.basic.lifecycle.ScriptHelper.logWithDetailsAndThrow(ScriptHelper.java:390) ~[classes/:na]
+at brooklyn.entity.basic.lifecycle.ScriptHelper.executeInternal(ScriptHelper.java:379) ~[classes/:na]
+at brooklyn.entity.basic.lifecycle.ScriptHelper$8.call(ScriptHelper.java:289) ~[classes/:na]
+at brooklyn.entity.basic.lifecycle.ScriptHelper$8.call(ScriptHelper.java:287) ~[classes/:na]
+... 6 common frames omitted
+
+{% endhighlight %}
+
+Brooklyn's use of tasks and helper classes can make the stack trace a little harder than usual to follow, but a good
+place to start is to look through the stack trace for the node's implementation or ssh driver classes (usually
+named `FooNodeImpl` or `FooSshDriver`). In this case we can see the following:
+
+{% highlight console %}
+at brooklyn.entity.webapp.tomcat.TomcatSshDriver.customize(TomcatSshDriver.java:72) ~[classes/:na]
+{% endhighlight %}
+
+Combining this with the error message of `mkrid: command not found` we can see that indeed `mkdir` has been
+misspelled `mkrid` on line 72 of `TomcatSshDriver.java`.
+
+
+## Non-Script Failure
+
+The section above gives an example of a failure that occurs when a script is run. In this section we will look at
+a failure in a non-script related part of the code. We'll use the `customize()` method of the Tomcat server again,
+but this time, we'll correct the spelling of 'mkdir' and add a line that attempts to copy a nonexistent resource 
+to the remote server:
+
+{% highlight java %}
+
+newScript(CUSTOMIZING)
+    .body.append("mkdir -p conf logs webapps temp")
+    .failOnNonZeroResultCode()
+    .execute();
+
+copyTemplate(entity.getConfig(TomcatServer.SERVER_XML_RESOURCE), Os.mergePaths(getRunDir(), "conf", "server.xml"));
+copyTemplate(entity.getConfig(TomcatServer.WEB_XML_RESOURCE), Os.mergePaths(getRunDir(), "conf", "web.xml"));
+copyTemplate("classpath://nonexistent.xml", Os.mergePaths(getRunDir(), "conf", "nonexistent.xml")); // Resource does not exist!
+
+{% endhighlight %}
+
+Let's deploy this using the same YAML from above. Here's the resulting error in the Brooklyn debug console:
+
+[![Resource exception in the Brooklyn debug console.](images/resource-exception.png)](images/resource-exception-large.png)
+
+Again, this tells us *what* the error is, but we need to find *where* the code is that attempts to copy this file. In
+this case it's shown in the Detailed Status section, and we don't need to go to the log file:
+
+{% highlight console %}
+
+Failed after 221ms: Error getting resource 'classpath://nonexistent.xml' for TomcatServerImpl{id=PVZxDKU1}: java.io.IOException: Error accessing classpath://nonexistent.xml: java.io.IOException: nonexistent.xml not found on classpath
+
+java.lang.RuntimeException: Error getting resource 'classpath://nonexistent.xml' for TomcatServerImpl{id=PVZxDKU1}: java.io.IOException: Error accessing classpath://nonexistent.xml: java.io.IOException: nonexistent.xml not found on classpath
+    at brooklyn.util.ResourceUtils.getResourceFromUrl(ResourceUtils.java:297)
+    at brooklyn.util.ResourceUtils.getResourceAsString(ResourceUtils.java:475)
+    at brooklyn.entity.basic.AbstractSoftwareProcessDriver.getResourceAsString(AbstractSoftwareProcessDriver.java:447)
+    at brooklyn.entity.basic.AbstractSoftwareProcessDriver.processTemplate(AbstractSoftwareProcessDriver.java:469)
+    at brooklyn.entity.basic.AbstractSoftwareProcessDriver.copyTemplate(AbstractSoftwareProcessDriver.java:390)
+    at brooklyn.entity.basic.AbstractSoftwareProcessDriver.copyTemplate(AbstractSoftwareProcessDriver.java:379)
+    at brooklyn.entity.webapp.tomcat.TomcatSshDriver.customize(TomcatSshDriver.java:79)
+    at brooklyn.entity.basic.AbstractSoftwareProcessDriver$8.run(AbstractSoftwareProcessDriver.java:150)
+    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
+    at brooklyn.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:343)
+    at brooklyn.util.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:469)
+    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
+    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
+    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
+at java.lang.Thread.run(Thread.java:745)
+    Caused by: java.io.IOException: Error accessing classpath://nonexistent.xml: java.io.IOException: nonexistent.xml not found on classpath
+at brooklyn.util.ResourceUtils.getResourceFromUrl(ResourceUtils.java:233)
+    ... 14 more
+    Caused by: java.io.IOException: nonexistent.xml not found on classpath
+    at brooklyn.util.ResourceUtils.getResourceViaClasspath(ResourceUtils.java:372)
+at brooklyn.util.ResourceUtils.getResourceFromUrl(ResourceUtils.java:230)
+    ... 14 more
+
+{% endhighlight %}
+
+Looking for `Tomcat` in the stack trace, we can see in this case the problem lies at line 79 of `TomcatSshDriver.java`
+
+
+## External Failure
+
+Sometimes an entity will fail outside the direct commands issues by Brooklyn. When installing and launching an entity,
+Brooklyn will check the return code of scripts that were run to ensure that they completed successfully (i.e. the
+return code of the script is zero). It is possible, for example, that a launch script completes successfully, but
+the entity fails to start.
+
+We can simulate this type of failure by launching Tomcat with an invalid configuration file. As seen in the previous
+examples, Brooklyn copies two xml configuration files to the server: `server.xml` and `web.xml`
+
+The first few non-comment lines of `server.xml` are as follows (you can see the full file [here]
+(https://github.com/apache/incubator-brooklyn/blob/master/software/webapp/src/main/resources/brooklyn/entity/webapp/tomcat/server.xml)):
+
+{% highlight xml %}
+
+<Server port="${driver.shutdownPort?c}" shutdown="SHUTDOWN">
+     <Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on" />
+     <Listener className="org.apache.catalina.core.JasperListener" />
+
+{% endhighlight%}
+
+Let's add an unmatched XML element, which will make this XML file invalid:
+
+{% highlight xml %}
+
+<Server port="${driver.shutdownPort?c}" shutdown="SHUTDOWN">
+     <unmatched-element> <!-- This is invalid XML as we won't add </unmatched-element> -->
+     <Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on" />
+     <Listener className="org.apache.catalina.core.JasperListener" />
+
+{% endhighlight%}
+
+As Brooklyn doesn't know how these types of resources are used, they're not validated as they're copied to the remote machine.
+As far as Brooklyn is concerned, the file will have copied successfully.
+
+Let's deploy Tomcat again, using the same YAML as before. This time, the deployment runs for a few minutes before failing
+with `Timeout waiting for SERVICE_UP`:
+
+[![External error in the Brooklyn debug console.](images/external-error.png)](images/external-error-large.png)
+
+If we drill down into the tasks in the `Activities` tab, we can see that all of the installation and launch tasks
+completed successfully, and stdout of the `launch` script is as follows:
+
+{% highlight console %}
+
+Executed /tmp/brooklyn-20150721-153049139-fK2U-launching_TomcatServerImpl_id_.sh, result 0
+
+{% endhighlight %}
+
+The task that failed was the `post-start` task, and the stack trace from the Detailed Status section is as follows:
+
+{% highlight console %}
+
+Failed after 5m 1s: Timeout waiting for SERVICE_UP from TomcatServerImpl{id=BUHgQeOs}
+
+java.lang.IllegalStateException: Timeout waiting for SERVICE_UP from TomcatServerImpl{id=BUHgQeOs}
+    at brooklyn.entity.basic.Entities.waitForServiceUp(Entities.java:1073)
+    at brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:388)
+    at brooklyn.entity.basic.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:385)
+    at brooklyn.entity.basic.SoftwareProcessDriverLifecycleEffectorTasks.postStartCustom(SoftwareProcessDriverLifecycleEffectorTasks.java:164)
+    at brooklyn.entity.software.MachineLifecycleEffectorTasks$7.run(MachineLifecycleEffectorTasks.java:433)
+    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
+    at brooklyn.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:343)
+    at brooklyn.util.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:469)
+    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
+    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
+    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
+at java.lang.Thread.run(Thread.java:745)
+
+{% endhighlight %}
+
+This doesn't really tell us what we need to know, and looking in the `brooklyn.debug.log` file yields no further
+clues. The key here is the error message `Timeout waiting for SERVICE_UP`. After running the installation and
+launch scripts, assuming all scripts completed successfully, Brooklyn will periodically check the health of the node
+and will set the node on fire if the health check does not pass within a pre-prescribed period (the default is
+two minutes, and can be configured using the `start.timeout` config key). The periodic health check also continues
+after the successful launch in order to check continued operation of the node, but in this case it fails to pass
+at all.
+
+The first thing we need to do is to find out how Brooklyn determines the health of the node. The health-check is 
+often implemented in the `isRunning()` method in the entity's ssh driver. Tomcat's implementation of `isRunning()`
+is as follows:
+
+{% highlight java %}
+@Override
+public boolean isRunning() {
+    return newScript(MutableMap.of(USE_PID_FILE, "pid.txt"), CHECK_RUNNING).execute() == 0;
+}
+{% endhighlight %}
+
+The `newScript` method has conveniences for default scripts to check if a process is running based on its PID. In this
+case, it will look for Tomcat's PID in the `pid.txt` file and check if the PID is the PID of a running process
+
+It's worth a quick sanity check at this point to check if the PID file exists, and if the process is running.
+By default, the pid file is located in the run directory of the entity. You can find the location of the entity's run
+directory by looking at the `run.dir` sensor. In this case it is `/tmp/brooklyn-martin/apps/jIzIHXtP/entities/TomcatServer_BUHgQeOs`.
+To find the pid, you simply cat the pid.txt file in this directory:
+
+{% highlight console %}
+$ cat /tmp/brooklyn-martin/apps/jIzIHXtP/entities/TomcatServer_BUHgQeOs/pid.txt
+73714
+{% endhighlight %}
+
+In this case, the PID in the file is 73714. You can then check if the process is running using `ps`. You can also
+pipe the output to `fold` so the full launch command is visible:
+
+{% highlight console %}
+
+$ ps -p 73714 | fold -w 120
+PID TTY           TIME CMD
+73714 ??         0:08.03 /Library/Java/JavaVirtualMachines/jdk1.8.0_51.jdk/Contents/Home/bin/java -Dnop -Djava.util.logg
+ing.manager=org.apache.juli.ClassLoaderLogManager -javaagent:/tmp/brooklyn-martin/apps/jIzIHXtP/entities/TomcatServer_BU
+HgQeOs/brooklyn-jmxmp-agent-shaded-0.8.0-SNAPSHOT.jar -Xms200m -Xmx800m -XX:MaxPermSize=400m -Dcom.sun.management.jmxrem
+ote -Dbrooklyn.jmxmp.rmi-port=1099 -Dbrooklyn.jmxmp.port=31001 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.manage
+ment.jmxremote.authenticate=false -Djava.endorsed.dirs=/tmp/brooklyn-martin/installs/TomcatServer_7.0.56/apache-tomcat-7
+.0.56/endorsed -classpath /tmp/brooklyn-martin/installs/TomcatServer_7.0.56/apache-tomcat-7.0.56/bin/bootstrap.jar:/tmp/
+brooklyn-martin/installs/TomcatServer_7.0.56/apache-tomcat-7.0.56/bin/tomcat-juli.jar -Dcatalina.base=/tmp/brooklyn-mart
+in/apps/jIzIHXtP/entities/TomcatServer_BUHgQeOs -Dcatalina.home=/tmp/brooklyn-martin/installs/TomcatServer_7.0.56/apache
+-tomcat-7.0.56 -Djava.io.tmpdir=/tmp/brooklyn-martin/apps/jIzIHXtP/entities/TomcatServer_BUHgQeOs/temp org.apache.catali
+na.startup.Bootstrap start
+
+{% endhighlight %}
+
+This confirms that the process is running. The next thing we can look at is the `service.notUp.indicators` sensor. This
+reads as follows:
+
+{% highlight json %}
+
+{"service.process.isRunning":"The software process for this entity does not appear to be running"}
+
+{% endhighlight %}
+
+This confirms that the problem is indeed due to the `service.process.isRunning` sensor. We assumed earlier that this was
+set by the `isRunning()` method in `TomcatSshDriver.java`, but this isn't always the case. The `service.process.isRunning`
+sensor is wired up by the `connectSensors()` method in the node's implementation class, in this case 
+`TomcatServerImpl.java`. Tomcat's implementation of `connectSensors()` is as follows:
+
+{% highlight java %}
+
+@Override
+public void connectSensors() {
+    super.connectSensors();
+
+    if (getDriver().isJmxEnabled()) {
+        String requestProcessorMbeanName = "Catalina:type=GlobalRequestProcessor,name=\"http-*\"";
+
+        Integer port = isHttpsEnabled() ? getAttribute(HTTPS_PORT) : getAttribute(HTTP_PORT);
+        String connectorMbeanName = format("Catalina:type=Connector,port=%s", port);
+
+        jmxWebFeed = JmxFeed.builder()
+            .entity(this)
+            .period(3000, TimeUnit.MILLISECONDS)
+            .pollAttribute(new JmxAttributePollConfig<Integer>(ERROR_COUNT)
+                    .objectName(requestProcessorMbeanName)
+                    .attributeName("errorCount"))
+            .pollAttribute(new JmxAttributePollConfig<Integer>(REQUEST_COUNT)
+                    .objectName(requestProcessorMbeanName)
+                    .attributeName("requestCount"))
+            .pollAttribute(new JmxAttributePollConfig<Integer>(TOTAL_PROCESSING_TIME)
+                    .objectName(requestProcessorMbeanName)
+                    .attributeName("processingTime"))
+            .pollAttribute(new JmxAttributePollConfig<String>(CONNECTOR_STATUS)
+                    .objectName(connectorMbeanName)
+                    .attributeName("stateName"))
+            .pollAttribute(new JmxAttributePollConfig<Boolean>(SERVICE_PROCESS_IS_RUNNING)
+                    .objectName(connectorMbeanName)
+                    .attributeName("stateName")
+                    .onSuccess(Functions.forPredicate(Predicates.<Object>equalTo("STARTED")))
+                    .setOnFailureOrException(false))
+            .build();
+
+        jmxAppFeed = JavaAppUtils.connectMXBeanSensors(this);
+    } else {
+        // if not using JMX
+        LOG.warn("Tomcat running without JMX monitoring; limited visibility of service available");
+        connectServiceUpIsRunning();
+    }
+}
+
+{% endhighlight %}
+
+We can see here that if jmx is not enabled, the method will call `connectServiceUpIsRunning()` which will use the
+default PID-based method of determining if a process is running. However, as JMX *is* running, the `service.process.isRunning`
+sensor (denoted here by the `SERVICE_PROCESS_IS_RUNNING` variable) is set to true if and only if the
+`stateName` JMX attribute equals `STARTED`. We can see from the previous call to `.pollAttribute` that this
+attribute is also published to the `CONNECTOR_STATUS` sensor. The `CONNECTOR_STATUS` sensor is defined as follows:
+
+{% highlight java %}
+
+AttributeSensor<String> CONNECTOR_STATUS =
+    new BasicAttributeSensor<String>(String.class, "webapp.tomcat.connectorStatus", "Catalina connector state name");
+
+{% endhighlight %}
+
+Let's go back to the Brooklyn debug console and look for the `webapp.tomcat.connectorStatus`:
+
+[![Sensors view in the Brooklyn debug console.](images/jmx-sensors.png)](images/jmx-sensors-large.png)
+
+As the sensor is not shown, it's likely that it's simply null or not set. We can check this by clicking
+the `Show/hide empty records` icon (highlighted in yellow above):
+
+[![All sensors view in the Brooklyn debug console.](images/jmx-sensors-all.png)](images/jmx-sensors-all-large.png)
+
+We know from previous steps that the installation and launch scripts completed, and we know the procecess is running,
+but we can see here that the server is not responding to JMX requests. A good thing to check here would be that the
+JMX port is not being blocked by iptables, firewalls or security groups
+(see the (troubleshooting connectivity guide)[troubleshooting-connectivity.html]). 
+Let's assume that we've checked that and they're all open. There is still one more thing that Brooklyn can tell us.
+
+
+Still on the `Sensors` tab, let's take a look at the `log.location` sensor:
+
+{% highlight console %}
+
+/tmp/brooklyn-martin/apps/c3bmrlC3/entities/TomcatServer_C1TAjYia/logs/catalina.out
+
+{% endhighlight %}
+
+This is the location of Tomcat's own log file. The location of the log file will differ from process to process
+and when writing a custom entity you will need to check the software's own documentation. If your blueprint's
+ssh driver extends `JavaSoftwareProcessSshDriver`, the value returned by the `getLogFileLocation()` method will
+automatically be published to the `log.location` sensor. Otherwise, you can publish the value yourself by calling
+`entity.setAttribute(Attributes.LOG_FILE_LOCATION, getLogFileLocation());` in your ssh driver
+
+**Note:** The log file will be on the server to which you have deployed Tomcat, and not on the Brooklyn server.
+Let's take a look in the log file:
+
+{% highlight console %}
+
+less /tmp/brooklyn-martin/apps/c3bmrlC3/entities/TomcatServer_C1TAjYia/logs/catalina.out
+
+Jul 21, 2015 4:12:12 PM org.apache.tomcat.util.digester.Digester fatalError
+SEVERE: Parse Fatal Error at line 143 column 3: The element type "unmatched-element" must be terminated by the matching end-tag "</unmatched-element>".
+    org.xml.sax.SAXParseException; systemId: file:/tmp/brooklyn-martin/apps/c3bmrlC3/entities/TomcatServer_C1TAjYia/conf/server.xml; lineNumber: 143; columnNumber: 3; The element type "unmatched-element" must be terminated by the matching end-tag "</unmatched-element>".
+    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
+    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
+    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:441)
+    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:368)
+    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1437)
+    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1749)
+    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2973)
+    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
+    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
+    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
+    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
+    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
+    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
+    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:649)
+    at org.apache.tomcat.util.digester.Digester.parse(Digester.java:1561)
+    at org.apache.catalina.startup.Catalina.load(Catalina.java:615)
+    at org.apache.catalina.startup.Catalina.start(Catalina.java:677)
+    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
+    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
+    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
+    at java.lang.reflect.Method.invoke(Method.java:497)
+    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:321)
+at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:455)
+
+    Jul 21, 2015 4:12:12 PM org.apache.catalina.startup.Catalina load
+    WARNING: Catalina.start using conf/server.xml: The element type "unmatched-element" must be terminated by the matching end-tag "</unmatched-element>".
+    Jul 21, 2015 4:12:12 PM org.apache.catalina.startup.Catalina start
+    SEVERE: Cannot start server. Server instance is not configured.
+
+{% endhighlight %}
+
+As expected, we can see here that the `unmatched-element` element has not been terminated in the `server.xml` file

http://git-wip-us.apache.org/repos/asf/incubator-brooklyn/blob/3d930107/docs/guide/ops/troubleshooting/index.md
----------------------------------------------------------------------
diff --git a/docs/guide/ops/troubleshooting/index.md b/docs/guide/ops/troubleshooting/index.md
index ca0b8a9..ee8dfd7 100644
--- a/docs/guide/ops/troubleshooting/index.md
+++ b/docs/guide/ops/troubleshooting/index.md
@@ -2,10 +2,11 @@
 title: Troubleshooting
 layout: website-normal
 children:
-- troubleshooting-runtime-errors.md
-- troubleshooting-deployment.md
-- troubleshooting-softwareprocess.md
-- troubleshooting-connectivity.md
+- { path: overview.md, title: Overview }
+- { path: deployment.md, title: Deployment }
+- { path: connectivity.md, title: Server Connectivity }
+- { path: softwareprocess.md, title: SoftwareProcess Entities }
+- { path: going-deep-in-java-and-logs.md, title: Going Deep in Java and Logs }
 ---
 
-{% include list-children.html %}
\ No newline at end of file
+{% include list-children.html %}

http://git-wip-us.apache.org/repos/asf/incubator-brooklyn/blob/3d930107/docs/guide/ops/troubleshooting/overview.md
----------------------------------------------------------------------
diff --git a/docs/guide/ops/troubleshooting/overview.md b/docs/guide/ops/troubleshooting/overview.md
new file mode 100644
index 0000000..815d33c
--- /dev/null
+++ b/docs/guide/ops/troubleshooting/overview.md
@@ -0,0 +1,116 @@
+---
+layout: website-normal
+title: Troubleshooting Overview
+toc: /guide/toc.json
+---
+
+This guide describes sources of information for understanding when things go wrong.
+
+Whether you're customizing out-of-the-box blueprints, or developing your own custom blueprints, you will
+inevitably have to deal with entity failure. Thankfully Brooklyn provides plenty of information to help 
+you locate and resolve any issues you may encounter.
+
+
+## Web-console Runtime Error Information
+ 
+### Entity Hierarchy
+
+The Brooklyn web-console includes a tree view of the entities within an application. Errors within the
+application are represented visually, showing a "fire" image on the entity.
+
+When an error causes an entire application to be unexpectedly down, the error is generally propagated to the
+top-level entity - i.e. marking it as "on fire". To find the underlying error, one should expand the entity
+hierarchy tree to find the specific entities that have actually failed.
+
+
+### Entity's Error Status
+
+Many entities have some common sensors (i.e. attributes) that give details of the error status:
+
+* `service.isUp` (often referred to as "service up") is a boolean, saying whether the service is up. For many 
+  software processes, this is inferred from whether the "service.notUp.indicators" is empty. It is also
+  possible for some entities to set this attribute directly.
+* `service.notUp.indicators` is a map of errors. This often gives much more information than the single 
+  `service.isUp` attribute. For example, there may be many health-check indicators for a component: 
+  is the root URL reachable, it the management api reporting healthy, is the process running, etc.
+* `service.problems` is a map of namespaced indicators of problems with a service.
+* `service.state` is the actual state of the service - e.g. CREATED, STARTING, RUNNING, STOPPING, STOPPED, 
+  DESTROYED and ON_FIRE.
+* `service.state.expected` indicates the state the service is expected to be in (and when it transitioned to that).
+  For example, is the service expected to be starting, running, stopping, etc.
+
+These sensor values are shown in the "sensors" tab - see below.
+
+
+### Sensors View
+
+The "Sensors" tab in the Brooklyn web-console shows the attribute values of a particular entity.
+This gives lots of runtime information, including about the health of the entity - the 
+set of attributes will vary between different entity types.
+
+[![Sensors view in the Brooklyn debug console.](images/jmx-sensors.png)](images/jmx-sensors-large.png)
+
+Note that null (or not set) sensors are hidden by default. You can click on the `Show/hide empty records` 
+icon (highlighted in yellow above) to see these sensors as well.
+
+The sensors view is also tabulated. You can configure the numbers of sensors shown per page 
+(at the bottom). There is also a search bar (at the top) to filter the sensors shown.
+
+
+### Activity View
+
+The activity view shows the tasks executed by a given entity. The top-level tasks are the effectors
+(i.e. operations) invoked on that entity. This view allows one to drill into the task, to 
+see details of errors.
+
+Select the entity, and then click on the `Activities` tab.
+
+In the table showing the tasks, each row is a link - clicking on the row will drill into the details of that task, 
+including sub-tasks:
+
+[![Task failure error in the Brooklyn debug console.](images/failed-task.png)](images/failed-task-large.png)
+
+For ssh tasks, this allows one to drill down to see the env, stdin, stdout and stderr. That is, you can see the
+commands executed (stdin) and environment variables (env), and the output from executing that (stdout and stderr). 
+
+For tasks that did not fail, one can still drill into the tasks to see what was done.
+
+It's always worth looking at the Detailed Status section as sometimes that will give you the information you need.
+For example, it can show the exception stack trace in the thread that was executing the task that failed.
+
+
+## Log Files
+
+Brooklyn's logging is configurable, for the files created, the logging levels, etc. 
+See [Logging docs](../logging.html).
+
+With out-of-the-box logging, `brooklyn.info.log` and `brooklyn.debug.log` files are created. These are by default 
+rolling log files: when the log reaches a given size, it is compressed and a new log file is started.
+Therefore check the timestamps of the log files to ensure you are looking in the correct file for the 
+time of your error.
+
+With out-of-the-box logging, info, warnings and errors are written to the `brooklyn.info.log` file. This gives
+a summary of the important actions and errors. However, it does not contain full stacktraces for errors.
+
+To find the exception, we'll need to look in Brooklyn's debug log file. By default, the debug log file
+is named `brooklyn.debug.log`. You can use your favourite tools for viewing large text files. 
+
+One possible tool is `less`, e.g. `less brooklyn.debug.log`. We can quickly find the last exception 
+by navigating to the end of the log file (using `Shift-G`), then performing a reverse-lookup by typing `?Exception` 
+and pressing `Enter`. Sometimes an error results in multiple exceptions being logged (e.g. first for the
+entity, then for the cluster, then for the app). If you know the text of the error message (e.g. copy-pasted
+from the Activities view of the web-console) then one can search explicitly for that text.
+
+The `grep` command is also extremely helpful. Useful things to grep for include:
+
+* The entity id (see the "summary" tab of the entity in the web-console for the id).
+* The entity type name (if there are only a small number of entities of that type). 
+* The VM IP address.
+* A particular error message (e.g. copy-pasted from the Activities view of the web-console).
+* The word WARN etc, such as `grep -E "WARN|ERROR" brooklyn.info.log`.
+
+Grep'ing for particular log messages is also useful. Some examples are shown below:
+
+* INFO: "Started application", "Stopping application" and "Stopped application"
+* INFO: "Creating VM "
+* DEBUG: "Finished VM "

http://git-wip-us.apache.org/repos/asf/incubator-brooklyn/blob/3d930107/docs/guide/ops/troubleshooting/softwareprocess.md
----------------------------------------------------------------------
diff --git a/docs/guide/ops/troubleshooting/softwareprocess.md b/docs/guide/ops/troubleshooting/softwareprocess.md
new file mode 100644
index 0000000..5e7f883
--- /dev/null
+++ b/docs/guide/ops/troubleshooting/softwareprocess.md
@@ -0,0 +1,50 @@
+---
+layout: website-normal
+title: Troubleshooting SoftwareProcess Entities
+toc: /guide/toc.json
+---
+
+The [troubleshooting overview](overview.html) in Brooklyn gives 
+information for how to find more information about errors.
+
+If that doesn't give enough information to diagnose, fix or workaround the problem, then it can be required
+to login to the machine, to investigate further. This guide applies to entities that are types
+of "SoftwareProcess" in Brooklyn, or that follows those conventions.
+
+
+## VM connection details
+
+The ssh connection details for an entity is published to a sensor `host.sshAddress`. The login 
+credentials will depend on the Brooklyn configuration. The default is to use the `~/.ssh/id_rsa` 
+or `~/.ssh/id_dsa` on the Brooklyn host (uploading the associated `~/.ssh/id_rsa.pub` to the machine's 
+authorised_keys). However, this can be overridden (e.g. with specific passwords etc) in the 
+location's configuration.
+
+For Windows, there is a similar sensor with the name `host.winrmAddress`. (TODO sensor for password?) 
+
+
+## Install and Run Directories
+
+For ssh-based software processes, the install directory and the run directory are published as sensors
+`install.dir` and `run.dir` respectively.
+
+For some entities, files are unpacked into the install dir; configuration files are written to the
+run dir along with log files. For some other entities, these directories may be mostly empty - 
+e.g. if installing RPMs, and that software writes its logs to a different standard location.
+
+Most entities have a sensor `log.location`. It is generally worth checking this, along with other files
+in the run directory (such as console output).
+
+
+## Process and OS Health
+
+It is worth checking that the process is running, e.g. using `ps aux` to look for the desired process.
+Some entities also write the pid of the process to `pid.txt` in the run directory.
+
+It is also worth checking if the required port is accessible. This is discussed in the guide 
+"Troubleshooting Server Connectivity Issues in the Cloud", including listing the ports in use:
+execute `netstat -antp` (or on OS X `netstat -antp TCP`) to list the TCP ports in use (or use
+`-anup` for UDP).
+
+It is also worth checking the disk space on the server, e.g. using `df -m`, to check that there
+is sufficient space on each of the required partitions.


Mime
View raw message