From: aledsage@apache.org
To: commits@brooklyn.incubator.apache.org
Date: Tue, 28 Jul 2015 15:45:21 -0000
Subject: [2/4] incubator-brooklyn git commit: rejig titles of troubleshooting sections

http://git-wip-us.apache.org/repos/asf/incubator-brooklyn/blob/3d930107/docs/guide/ops/troubleshooting/troubleshooting-connectivity.md
----------------------------------------------------------------------
diff --git a/docs/guide/ops/troubleshooting/troubleshooting-connectivity.md b/docs/guide/ops/troubleshooting/troubleshooting-connectivity.md
deleted file mode 100644
index 07874c0..0000000
--- a/docs/guide/ops/troubleshooting/troubleshooting-connectivity.md
+++ /dev/null
@@ -1,143 +0,0 @@
----
-layout: website-normal
-title: Troubleshooting Server Connectivity Issues in the Cloud
-toc: /guide/toc.json
----
-
-A common problem when setting up an application in the cloud is getting the basic connectivity right - how
-do I get my service (e.g. a TCP host:port) publicly accessible over the internet?
-
-This varies a lot - e.g. Is the VM public or in a private network? Is the service only accessible through
-a load balancer? Should the service be globally reachable or only to a particular CIDR?
-
-This guide gives some general tips for debugging connectivity issues, which are applicable to a
-range of different service types. Choose those that are appropriate for your use-case.
-
-## VM reachable
-If the VM is supposed to be accessible directly (e.g. from the public internet, or if in a private network
-then from a jump host)...
-
-### ping
-Can you `ping` the VM from the machine you are trying to reach it from?
-
-However, ping is over ICMP. If the VM is unreachable, it could be that the firewall forbids ICMP but still
-lets TCP traffic through.
-
-### telnet to TCP port
-You can check if a given TCP port is reachable and listening using `telnet HOST PORT`, such as
-`telnet www.google.com 80`, which gives output like:
-
-```
-    Trying 31.55.163.219...
-    Connected to www.google.com.
-    Escape character is '^]'.
-```
-
-If this is very slow to respond, it can be caused by a firewall blocking access. If it is fast, it could
-be that the server is just not listening on that port.
-
-### DNS and routing
-If using a hostname rather than IP, then is it resolving to a sensible IP?
-
-Is the route to the server sensible? (e.g. one can hit problems with proxy servers in a corporate
-network, or ISPs returning a default result for unknown hosts).
-
-The following commands can be useful:
-
-* `host` is a DNS lookup utility. e.g. `host www.google.com`.
-* `dig` stands for "domain information groper". e.g. `dig www.google.com`.
-* `traceroute` prints the route that packets take to a network host. e.g. `traceroute www.google.com`.
-
-## Service is listening
-
-### Service responds
-Try connecting to the service from the VM itself. For example, `curl http://localhost:8080` for a
-web-service.
-
-On dev/test VMs, don't be afraid to install the utilities you need such as `curl`, `telnet`, `nc`,
-etc. Cloud VMs often have a very cut-down set of packages installed. For example, execute
-`sudo apt-get update; sudo apt-get install -y curl` or `sudo yum install -y curl`.
-
-### Listening on port
-Check that the service is listening on the port, and on the correct NIC(s).
-
-Execute `netstat -antp` (or on OS X `netstat -antp TCP`) to list the TCP ports in use (or use
-`-anup` for UDP). You should expect to see something like the output below for a service.
-
-```
-Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name
-tcp        0      0 :::8080                     :::*                        LISTEN      8276/java
-```
-
-In this case a Java process with pid 8276 is listening on port 8080. The local address `:::8080`
-format means all NICs (in IPv6 address format). You may also see `0.0.0.0:8080` for IPv4 format.
-If it says 127.0.0.1:8080 then your service will most likely not be reachable externally.
-
-Use `ip addr show` (or the obsolete `ifconfig -a`) to see the network interfaces on your server.
-
-For `netstat`, run with `sudo` to see the pid for all listed ports.
-
-## Firewalls
-On Linux, check if `iptables` is preventing the remote connection. On Windows, check the Windows Firewall.
-
-If it is acceptable (e.g. it is not a server in production), try turning off the firewall temporarily,
-and testing connectivity again. Remember to re-enable it afterwards! On CentOS, this is `sudo service
-iptables stop`. On Ubuntu, use `sudo ufw disable`. On Windows, press the Windows key and type 'Windows
-Firewall with Advanced Security' to open the firewall tools, then click 'Windows Firewall Properties'
-and set the firewall state to 'Off' in the Domain, Public and Private profiles.
-
-If you cannot temporarily turn off the firewall, then look carefully at the firewall settings. For
-example, execute `sudo iptables -n --list` and `iptables -t nat -n --list`.
-
-## Cloud firewalls
-Some clouds offer a firewall service, where ports need to be explicitly listed to be reachable.
-
-For example, [security groups for EC2-classic]
-(http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html#ec2-classic-security-groups)
-have rules for the protocols and ports to be reachable from specific CIDRs.
-
-Check these settings via the cloud provider's web-console (or API).
-
-## Quick test of a listener port
-It can be useful to start listening on a given port, and to then check if that port is reachable.
-This is useful for testing basic connectivity when your service is not yet running, or to a
-different port to compare behaviour, or to compare with another VM in the network.
-
-The `nc` netcat tool is useful for this. For example, `nc -l 0.0.0.0 8080` will listen on TCP
-port 8080 on all network interfaces. On another server, you can then run `echo hello from client
-| nc HOSTNAME 8080`. If all works well, this will send "hello from client" over the TCP port 8080,
-which will be written out by the `nc -l` process before exiting.
-
-Similarly for UDP, you use `-lu`.
-
-You may first have to install `nc`, e.g. with `sudo yum install -y nc` or `sudo apt-get install netcat`.
-
-### Cloud load balancers
-For some use-cases, it is good practice to use the load balancer service offered by the cloud provider
-(e.g. [ELB in AWS](http://aws.amazon.com/elasticloadbalancing/) or the [Cloudstack Load Balancer]
-(http://docs.cloudstack.apache.org/projects/cloudstack-installation/en/latest/network_setup.html#management-server-load-balancing))
-
-The VMs can all be isolated within a private network, with access only through the load balancer service.
-
-Debugging techniques here include ensuring connectivity from another jump server within the private
-network, and careful checking of the load-balancer configuration from the Cloud Provider's web-console.
-
-### DNAT
-Use of DNAT is appropriate for some use-cases, where a particular port on a particular VM is to be
-made available.
-
-Debugging connectivity issues here is similar to the steps for a cloud load balancer. Ensure
-connectivity from another jump server within the private network. Carefully check the NAT rules from
-the Cloud Provider's web-console.
-
-### Guest wifi
-It is common for guest wifi to restrict access to only specific ports (e.g. 80 and 443, restricting
-ssh over port 22 etc).
-
-Normally your best bet is then to abandon the guest wifi (e.g. to tether to a mobile phone instead).
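
To confirm which ports a restrictive network actually lets through, one option is to probe a host you control on each port of interest. The following is a minimal sketch, not part of the Brooklyn docs: the function names are hypothetical, and it assumes an `nc` that supports `-z` (connect without sending data) and `-w` (timeout in seconds).

```shell
# Hypothetical helpers; the names are illustrative only. Assumes an nc that
# supports -z (connect without sending data) and -w (timeout in seconds).

# Succeeds if a TCP connection to host:port opens within 3 seconds.
port_open() {
    nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# Prints one line per port saying whether it could be reached on the host.
probe_ports() {
    host="$1"
    shift
    for port in "$@"; do
        if port_open "$host" "$port"; then
            echo "$port reachable"
        else
            echo "$port blocked or closed"
        fi
    done
}
```

For example, `probe_ports my.server.example 22 80 443` (with a placeholder host name) would show whether only the web ports get through. A port reported as blocked may equally mean that nothing is listening on it, so compare against a network where the service is known to be reachable.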
-
-There are some unconventional workarounds such as [configuring sshd to listen on port 80 so you can
-use an ssh tunnel](http://askubuntu.com/questions/107173/is-it-possible-to-ssh-through-port-80).
-However, the firewall may well inspect traffic so sending non-http traffic over port 80 may still fail.
-
-

http://git-wip-us.apache.org/repos/asf/incubator-brooklyn/blob/3d930107/docs/guide/ops/troubleshooting/troubleshooting-deployment.md
----------------------------------------------------------------------
diff --git a/docs/guide/ops/troubleshooting/troubleshooting-deployment.md b/docs/guide/ops/troubleshooting/troubleshooting-deployment.md
deleted file mode 100644
index c343762..0000000
--- a/docs/guide/ops/troubleshooting/troubleshooting-deployment.md
+++ /dev/null
@@ -1,88 +0,0 @@
----
-layout: website-normal
-title: Troubleshooting Deployment
-toc: /guide/toc.json
----
-
-This guide describes common problems encountered when deploying applications.
-
-
-## YAML deployment errors
-
-The error `Invalid YAML: Plan not in acceptable format: Cannot convert ...` means that the text is not
-valid YAML. Common reasons include that the indentation is incorrect, or that there are non-matching
-brackets.
-
-The error `Unrecognized application blueprint format: no services defined` means that the `services:`
-section is missing.
-
-An error like `Deployment plan item io.brooklyn.camp.spi.pdp.Service@23c159e2[name=,description=,serviceType=com.acme.Foo,characteristics=[],customAttributes={}] cannot be matched` means that the given entity type (in this case com.acme.Foo) is not in the catalog or on the classpath.
-
-An error like `Illegal parameter for 'location' (aws-ec3); not resolvable: java.util.NoSuchElementException: Unknown location 'aws-ec3': either this location is not recognised or there is a problem with location resolver configuration` means that the given location (in this case aws-ec3)
-was unknown.
This means it does not match any of the named locations in brooklyn.properties, nor any of the
-clouds enabled in the jclouds support, nor any of the locations added dynamically through the catalog API.
-
-
-## VM Provisioning Failures
-
-There are many stages at which VM provisioning can fail! An error `Failure running task provisioning`
-means there was some problem obtaining or connecting to the machine.
-
-An error like `... Not authorized to access cloud ...` usually means the wrong identity/credential was used.
-
-An error like `Unable to match required VM template constraints` means that a matching image (e.g. AMI in AWS terminology) could not be found. This
-could be because an incorrect explicit image id was supplied, or because the match-criteria could not
-be satisfied using the given images available in the given cloud. The first time this error is
-encountered, a listing of all images in that cloud/region will be written to the debug log.
-
-Failure to form an ssh connection to the newly provisioned VM can be reported in several different ways,
-depending on the nature of the error. This breaks down into failures at different points:
-
-* Failure to reach the ssh port (e.g. `... could not connect to any ip address port 22 on node ...`).
-* Failure to do the very initial ssh login (e.g. `... Exhausted available authentication methods ...`).
-* Failure to ssh using the newly created user.
-
-There are many possible reasons for this ssh failure, which include:
-
-* The VM was "dead on arrival" (DOA) - sometimes a cloud will return an unusable VM. One can work around
-  this using the `machineCreateAttempts` configuration option, to automatically retry with a new VM.
-* Local network restrictions. On some guest wifis, external access to port 22 is forbidden.
-  Check by manually trying to reach port 22 on a different machine that you have access to.
-* NAT rules not set up correctly.
On some clouds that have only private IPs, Brooklyn can automatically
-  create NAT rules to provide access to port 22. If this NAT rule creation fails for some reason,
-  then Brooklyn will not be able to reach the VM. If NAT rules are being created for your cloud, then
-  check the logs for warnings or errors about the NAT rule creation.
-* ssh credentials incorrectly configured. The Brooklyn configuration is very flexible in how ssh
-  credentials can be configured. However, if a more advanced configuration is used incorrectly (e.g.
-  the wrong login user, or invalid ssh keys) then this will fail.
-* Wrong login user. The initial login user to use when first logging into the new VM is inferred from
-  the metadata provided by the cloud provider about that image. This can sometimes be incomplete, so
-  the wrong user may be used. This can be explicitly set using the `loginUser` configuration option.
-  An example of this is with some Ubuntu VMs, where the "ubuntu" user should be used. However, on some clouds
-  it defaults to trying to ssh as "root".
-* Bad choice of user. By default, Brooklyn will create a user with the same name as the user running the
-  Brooklyn process; the choice of user name is configurable. If this user already exists on the machine,
-  then the user setup will not behave as expected. Subsequent attempts to ssh using this user could then fail.
-* Custom credentials on the VM. Most clouds will automatically set the ssh login details (e.g. in AWS using
-  the key-pair, or in CloudStack by auto-generating a password). However, with some custom images the VM
-  will have hard-coded credentials that must be used. If Brooklyn's configuration does not match that,
-  then it will fail.
-* Guest customisation by the cloud. On some clouds (e.g. vCloud Air), the VM can be configured to do
-  guest customisation immediately after the VM starts. This can include changing the root password.
-  If Brooklyn is not configured with the expected changed password, then the VM provisioning may fail
-  (depending on whether Brooklyn connects before or after the password is changed!).
-
-A very useful debug configuration is to set `destroyOnFailure` to false. This will allow ssh failures to
-be more easily investigated.
-
-
-## Timeout Waiting For Service-Up
-
-A common generic error message is that there was a timeout waiting for service-up.
-
-This just means that the entity did not get to service-up in the pre-defined time period (the default is
-two minutes, and can be configured using the `start.timeout` config key; the timer begins after the
-start tasks are completed).
-
-See the guide on [runtime errors](troubleshooting-runtime-errors.html) for where to find additional information, especially the section on
-"Entity's Error Status".

http://git-wip-us.apache.org/repos/asf/incubator-brooklyn/blob/3d930107/docs/guide/ops/troubleshooting/troubleshooting-runtime-errors.md
----------------------------------------------------------------------
diff --git a/docs/guide/ops/troubleshooting/troubleshooting-runtime-errors.md b/docs/guide/ops/troubleshooting/troubleshooting-runtime-errors.md
deleted file mode 100644
index 8b657fc..0000000
--- a/docs/guide/ops/troubleshooting/troubleshooting-runtime-errors.md
+++ /dev/null
@@ -1,116 +0,0 @@
----
-layout: website-normal
-title: Troubleshooting Runtime Errors
-toc: /guide/toc.json
----
-
-This guide describes sources of information for runtime errors.
-
-Whether you're customizing out-of-the-box blueprints, or developing your own custom blueprints, you will
-inevitably have to deal with entity failure. Thankfully Brooklyn provides plenty of information to help
-you locate and resolve any issues you may encounter.
-
-
-## Web-console Runtime Error Information
-
-### Entity Hierarchy
-
-The Brooklyn web-console includes a tree view of the entities within an application.
Errors within the
-application are represented visually, showing a "fire" image on the entity.
-
-When an error causes an entire application to be unexpectedly down, the error is generally propagated to the
-top-level entity - i.e. marking it as "on fire". To find the underlying error, one should expand the entity
-hierarchy tree to find the specific entities that have actually failed.
-
-
-### Entity's Error Status
-
-Many entities have some common sensors (i.e. attributes) that give details of the error status:
-
-* `service.isUp` (often referred to as "service up") is a boolean, saying whether the service is up. For many
-  software processes, this is inferred from whether the "service.notUp.indicators" is empty. It is also
-  possible for some entities to set this attribute directly.
-* `service.notUp.indicators` is a map of errors. This often gives much more information than the single
-  `service.isUp` attribute. For example, there may be many health-check indicators for a component:
-  is the root URL reachable, is the management API reporting healthy, is the process running, etc.
-* `service.problems` is a map of namespaced indicators of problems with a service.
-* `service.state` is the actual state of the service - e.g. CREATED, STARTING, RUNNING, STOPPING, STOPPED,
-  DESTROYED and ON_FIRE.
-* `service.state.expected` indicates the state the service is expected to be in (and when it transitioned to that).
-  For example, is the service expected to be starting, running, stopping, etc.
-
-These sensor values are shown in the "sensors" tab - see below.
-
-
-### Sensors View
-
-The "Sensors" tab in the Brooklyn web-console shows the attribute values of a particular entity.
-This gives lots of runtime information, including about the health of the entity - the
-set of attributes will vary between different entity types.
-
-[![Sensors view in the Brooklyn debug console.](images/jmx-sensors.png)](images/jmx-sensors-large.png)
-
-Note that null (or not set) sensors are hidden by default. You can click on the `Show/hide empty records`
-icon (highlighted in yellow above) to see these sensors as well.
-
-The sensors view is also tabulated. You can configure the number of sensors shown per page
-(at the bottom). There is also a search bar (at the top) to filter the sensors shown.
-
-
-### Activity View
-
-The activity view shows the tasks executed by a given entity. The top-level tasks are the effectors
-(i.e. operations) invoked on that entity. This view allows one to drill into the task, to
-see details of errors.
-
-Select the entity, and then click on the `Activities` tab.
-
-In the table showing the tasks, each row is a link - clicking on the row will drill into the details of that task,
-including sub-tasks:
-
-[![Task failure error in the Brooklyn debug console.](images/failed-task.png)](images/failed-task-large.png)
-
-For ssh tasks, this allows one to drill down to see the env, stdin, stdout and stderr. That is, you can see the
-commands executed (stdin) and environment variables (env), and the output from executing that (stdout and stderr).
-
-For tasks that did not fail, one can still drill into the tasks to see what was done.
-
-It's always worth looking at the Detailed Status section as sometimes that will give you the information you need.
-For example, it can show the exception stack trace in the thread that was executing the task that failed.
-
-
-## Log Files
-
-Brooklyn's logging is configurable, for the files created, the logging levels, etc.
-See [Logging docs](/guide/ops/logging.html).
-
-With out-of-the-box logging, `brooklyn.info.log` and `brooklyn.debug.log` files are created. These are by default
-rolling log files: when the log reaches a given size, it is compressed and a new log file is started.
-Therefore check the timestamps of the log files to ensure you are looking in the correct file for the
-time of your error.
-
-With out-of-the-box logging, info, warnings and errors are written to the `brooklyn.info.log` file. This gives
-a summary of the important actions and errors. However, it does not contain full stacktraces for errors.
-
-To find the exception, we'll need to look in Brooklyn's debug log file. By default, the debug log file
-is named `brooklyn.debug.log`. You can use your favourite tools for viewing large text files.
-
-One possible tool is `less`, e.g. `less brooklyn.debug.log`. We can quickly find the last exception
-by navigating to the end of the log file (using `Shift-G`), then performing a reverse-lookup by typing `?Exception`
-and pressing `Enter`. Sometimes an error results in multiple exceptions being logged (e.g. first for the
-entity, then for the cluster, then for the app). If you know the text of the error message (e.g. copy-pasted
-from the Activities view of the web-console) then you can search explicitly for that text.
-
-The `grep` command is also extremely helpful. Useful things to grep for include:
-
-* The entity id (see the "summary" tab of the entity in the web-console for the id).
-* The entity type name (if there are only a small number of entities of that type).
-* The VM IP address.
-* A particular error message (e.g. copy-pasted from the Activities view of the web-console).
-* The word WARN etc, such as `grep -E "WARN|ERROR" brooklyn.info.log`.
-
-Grep'ing for particular log messages is also useful.
Some examples are shown below:
-
-* INFO: "Started application", "Stopping application" and "Stopped application"
-* INFO: "Creating VM "
-* DEBUG: "Finished VM "

http://git-wip-us.apache.org/repos/asf/incubator-brooklyn/blob/3d930107/docs/guide/ops/troubleshooting/troubleshooting-softwareprocess.md
----------------------------------------------------------------------
diff --git a/docs/guide/ops/troubleshooting/troubleshooting-softwareprocess.md b/docs/guide/ops/troubleshooting/troubleshooting-softwareprocess.md
deleted file mode 100644
index a09f902..0000000
--- a/docs/guide/ops/troubleshooting/troubleshooting-softwareprocess.md
+++ /dev/null
@@ -1,50 +0,0 @@
----
-layout: website-normal
-title: Troubleshooting SoftwareProcess Entities
-toc: /guide/toc.json
----
-
-The [guide for troubleshooting runtime errors](troubleshooting-runtime-errors.html) in Brooklyn gives
-information for how to find more information about errors.
-
-If that doesn't give enough information to diagnose, fix or work around the problem, then it may be necessary
-to log in to the machine, to investigate further. This guide applies to entities that are types
-of "SoftwareProcess" in Brooklyn, or that follow those conventions.
-
-
-## VM connection details
-
-The ssh connection details for an entity are published to a sensor `host.sshAddress`. The login
-credentials will depend on the Brooklyn configuration. The default is to use the `~/.ssh/id_rsa`
-or `~/.ssh/id_dsa` on the Brooklyn host (uploading the associated `~/.ssh/id_rsa.pub` to the machine's
-authorised_keys). However, this can be overridden (e.g. with specific passwords etc) in the
-location's configuration.
-
-For Windows, there is a similar sensor with the name `host.winrmAddress`. (TODO sensor for password?)
-
-
-## Install and Run Directories
-
-For ssh-based software processes, the install directory and the run directory are published as sensors
-`install.dir` and `run.dir` respectively.
-
-For some entities, files are unpacked into the install dir; configuration files are written to the
-run dir along with log files. For some other entities, these directories may be mostly empty -
-e.g. if installing RPMs, and that software writes its logs to a different standard location.
-
-Most entities have a sensor `log.location`. It is generally worth checking this, along with other files
-in the run directory (such as console output).
-
-
-## Process and OS Health
-
-It is worth checking that the process is running, e.g. using `ps aux` to look for the desired process.
-Some entities also write the pid of the process to `pid.txt` in the run directory.
-
-It is also worth checking if the required port is accessible. This is discussed in the guide
-"Troubleshooting Server Connectivity Issues in the Cloud", including listing the ports in use:
-execute `netstat -antp` (or on OS X `netstat -antp TCP`) to list the TCP ports in use (or use
-`-anup` for UDP).
-
-It is also worth checking the disk space on the server, e.g. using `df -m`, to check that there
-is sufficient space on each of the required partitions.
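
The process and disk-space checks described under "Process and OS Health" can be sketched as two small shell helpers. This is an illustrative sketch only and not part of Brooklyn: the function names are hypothetical, and it uses `df -P` (the POSIX single-line output format) in place of `df -m` so that the columns are predictable to parse.

```shell
# Hypothetical helpers; the names below are illustrative and not part of Brooklyn.

# Succeeds if the pid recorded in <run dir>/pid.txt belongs to a live process.
# kill -0 sends no signal; it only checks that the pid can be signalled, so run
# this as the process owner (or with sudo) to avoid a false negative.
pid_file_alive() {
    pid_file="$1/pid.txt"
    [ -r "$pid_file" ] || return 1
    pid=$(cat "$pid_file")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Reads `df -P` output on stdin and prints "mount-point capacity" for every
# partition whose usage is at or above the given percentage.
filter_full() {
    awk -v limit="$1" 'NR > 1 {
        used = $5
        sub(/%/, "", used)
        if (used + 0 >= limit + 0) print $6, $5
    }'
}
```

Typical use would be `pid_file_alive "$RUN_DIR" || echo "process not running"`, where `RUN_DIR` holds the value of the entity's `run.dir` sensor, and `df -P | filter_full 90`, where the 90% threshold is an arbitrary choice.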