---
title: Overview
category: administration
order: 1
---

## Hardware

Because we are essentially running two or three systems simultaneously layered
across the cluster -- HDFS, Accumulo, and MapReduce -- it is typical for hardware to
consist of 4 to 8 cores and 8 to 32 GB of RAM, so that each running process can have
at least one core and 2 to 4 GB of memory.

One core running HDFS can typically keep 2 to 4 disks busy, so each machine may
have as few as 2 x 300GB disks or as many as 4 x 1TB or 2TB disks.

It is possible to do with less than this, such as 1U servers with 2 cores and 4GB
each, but in that case it is recommended to run at most two processes per
machine -- i.e. DataNode and TabletServer or DataNode and MapReduce worker, but
not all three. The constraint here is having enough available heap space for all the
processes on a machine.

## Network

Accumulo communicates via remote procedure calls over TCP/IP for both passing
data and control messages. In addition, Accumulo uses HDFS clients to
communicate with HDFS. To achieve good ingest and query performance, sufficient
network bandwidth must be available between any two machines.

In addition to needing access to ports associated with HDFS and ZooKeeper, Accumulo
uses the following default ports. Please make sure that they are open, or change
their values in `accumulo-site.xml`.

|Port | Description | Property Name
|-----|-------------|--------------
|4445 | Shutdown Port (Accumulo MiniCluster) | n/a
|4560 | Accumulo monitor (for centralized log display) | monitor.port.log4j
|9995 | Accumulo HTTP monitor | monitor.port.client
|9997 | Tablet Server | tserver.port.client
|9998 | Accumulo GC | gc.port.client
|9999 | Master Server | master.port.client
|12234 | Accumulo Tracer | trace.port.client
|42424 | Accumulo Proxy Server | n/a
|10001 | Master Replication service | master.replication.coordinator.port
|10002 | TabletServer Replication service | replication.receipt.service.port

In addition, the user can provide `0` and an ephemeral port will be chosen instead. This
ephemeral port is likely to be unique and not already bound. Thus, configuring ports to
use `0` instead of an explicit value should, in most cases, work around any issues of
running multiple distinct Accumulo instances (or any other process that tries to use the
same default ports) on the same hardware. Finally, the `*.port.client` properties accept
the port-range syntax (M-N), allowing the user to specify a range of ports for the
service to attempt to bind. The ports in the range are tried in a 1-up manner, starting
at the low end of the range up to, and including, the high end.

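A minimal sketch of this 1-up strategy, using only JDK sockets, is shown below. It illustrates the semantics rather than Accumulo's actual implementation, and the example range is an assumption.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortRangeExample {
    // Try each port in [low, high] in ascending order; return the first that binds.
    static ServerSocket bindFirstAvailable(int low, int high) throws IOException {
        for (int port = low; port <= high; port++) {
            try {
                return new ServerSocket(port); // success: this port was free
            } catch (IOException alreadyBound) {
                // port in use; fall through and try the next one
            }
        }
        throw new IOException("No free port in range " + low + "-" + high);
    }

    public static void main(String[] args) throws IOException {
        // e.g. tserver.port.client=9997-9999
        try (ServerSocket socket = bindFirstAvailable(9997, 9999)) {
            System.out.println("Bound to port " + socket.getLocalPort());
        }
    }
}
```
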
## Installation

Download a binary distribution of Accumulo and install it to a directory on a disk with
sufficient space:

    cd <install-directory>
    tar xzf accumulo-X.Y.Z-bin.tar.gz    # Replace 'X.Y.Z' with your Accumulo version
    cd accumulo-X.Y.Z

Repeat this step on each machine in your cluster. Typically, the same `<install-directory>`
is chosen for all machines in the cluster.

There are four scripts in the `bin/` directory that are used to manage Accumulo:

1. `accumulo` - Runs Accumulo command-line tools and starts Accumulo processes
2. `accumulo-service` - Runs Accumulo processes as services
3. `accumulo-cluster` - Manages an Accumulo cluster on a single node or several nodes
4. `accumulo-util` - Accumulo utilities for creating configuration, building native libraries, etc.

These scripts are used in the remaining instructions to configure and run Accumulo.

## Dependencies

Accumulo requires HDFS and ZooKeeper to be configured and running before it is
started. Password-less SSH should be configured between at least the
Accumulo master and TabletServer machines. It is also a good idea to run Network
Time Protocol (NTP) within the cluster to ensure that nodes' clocks don't get too out of
sync, which can cause problems with automatically timestamped data.

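Because a missing HDFS or ZooKeeper dependency tends to surface later as confusing startup failures, it can be worth verifying HDFS connectivity first. Below is a hedged sketch using the standard Hadoop client API; the namenode URL is an assumption for your environment and is normally picked up from `core-site.xml` on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed namenode address for illustration only.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
        try (FileSystem fs = FileSystem.get(conf)) {
            // A successful round trip confirms the namenode is up and reachable.
            System.out.println("HDFS reachable, root exists: " + fs.exists(new Path("/")));
        }
    }
}
```
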
## Configuration

The Accumulo tarball contains a `conf/` directory where Accumulo looks for configuration. If you
installed Accumulo using downstream packaging, `conf/` could be somewhere else, such as
`/etc/accumulo/`.

Before starting Accumulo, the configuration files `accumulo-env.sh` and `accumulo-site.xml` must
exist in `conf/` and be properly configured. If you are using `accumulo-cluster` to launch
a cluster, the `conf/` directory must also contain host files for the Accumulo services (i.e. `gc`,
`masters`, `monitor`, `tservers`, `tracers`). You can either create these files manually or run
`accumulo-cluster create-config`.

Logging is configured in `accumulo-env.sh` to use three log4j configuration files in `conf/`. The
file used depends on the Accumulo command or service being run. Logging for most Accumulo services
(i.e. Master, TabletServer, Garbage Collector) is configured by `log4j-service.properties`, except for
the Monitor, which is configured by `log4j-monitor.properties`. All Accumulo commands (i.e. `init`,
`shell`, etc.) are configured by `log4j.properties`.

### Configure accumulo-env.sh

Accumulo needs to know where to find the software it depends on. Edit `accumulo-env.sh`
and specify the following:

1. Enter the location of Hadoop for `$HADOOP_PREFIX`
2. Enter the location of ZooKeeper for `$ZOOKEEPER_HOME`
3. Optionally, choose a different location for Accumulo logs using `$ACCUMULO_LOG_DIR`

Accumulo uses `HADOOP_PREFIX` and `ZOOKEEPER_HOME` to locate the Hadoop and ZooKeeper jars
and add them to the `CLASSPATH` variable. If you are running a vendor-specific release of Hadoop
or ZooKeeper, you may need to change how your `CLASSPATH` is built in `accumulo-env.sh`. If
Accumulo has problems finding jars later on, run `accumulo classpath -d` to debug and print
Accumulo's classpath.

You may want to change the default memory settings for Accumulo's TabletServer, which are
set in the `JAVA_OPTS` settings for 'tservers' in `accumulo-env.sh`. Note that the
syntax is that of Java JVM command-line options. This value should be less than the
physical memory of the machines running TabletServers.

There are similar options for the master's memory usage and the garbage collector
process. Reduce these if they exceed the physical RAM of your hardware, and
increase them, within the bounds of the physical RAM, if a process fails because of
insufficient memory.

Note that you will be specifying the Java heap space in `accumulo-env.sh`. You should
make sure that the total heap space used by the Accumulo tserver, the Hadoop
DataNode, and the TaskTracker is less than the available memory on each worker node in
the cluster. On large clusters, it is recommended that the Accumulo master, Hadoop
NameNode, secondary NameNode, and Hadoop JobTracker all be run on separate
machines to allow them to use more heap space. If you are running these on the
same machine on a small cluster, likewise make sure their heap space settings fit
within the available memory.

### Native Map

The tablet server uses a data structure called a MemTable to store sorted key/value
pairs in memory when they are first received from the client. When a minor compaction
occurs, this data structure is written to HDFS. The MemTable defaults to using
memory in the JVM, but a JNI version, called the native map, can be used to significantly
speed up performance by utilizing the memory space of the native operating system. The
native map also avoids the performance impact of JVM garbage collection by causing
the JVM to pause much less frequently.

#### Building

32-bit and 64-bit Linux and Mac OS X versions of the native map can be built by executing
`accumulo-util build-native`. If your system's default compiler options are insufficient,
you can add additional compiler options to the command line, such as options for the
architecture. These will be passed to the Makefile in the environment variable `USERFLAGS`.

Examples:

    accumulo-util build-native
    accumulo-util build-native -m32

After building the native map from source, you will find the artifact in
`lib/native`. Upon starting up, the tablet server will look
in this directory for the map library. If the file is renamed or moved from its
target directory, the tablet server may not be able to find it. The system can
also locate the native maps shared library by setting `LD_LIBRARY_PATH`
(or `DYLD_LIBRARY_PATH` on Mac OS X) in `accumulo-env.sh`.

#### Native Maps Configuration

As mentioned, Accumulo will use the native libraries if they are found in the expected
location and `tserver.memory.maps.native.enabled` is set to `true` (which is the default).
Using the native maps instead of JVM maps nets a noticeable improvement in ingest rates; however,
certain configuration variables are important to modify when increasing the size of the
native map.

To adjust the size of the native map, increase the value of `tserver.memory.maps.max`.
By default, the maximum size of the native map is 1GB. When increasing this value, it is
also important to adjust the values of `table.compaction.minor.logs.threshold` and
`tserver.walog.max.size`. `table.compaction.minor.logs.threshold` is the maximum
number of write-ahead log files that a tablet can reference before it will be automatically
minor compacted. `tserver.walog.max.size` is the maximum size of a write-ahead log.

The maximum size of the native maps for a server should be no larger than the product
of the write-ahead log maximum size and the minor compaction threshold for log files:

`$table.compaction.minor.logs.threshold * $tserver.walog.max.size >= $tserver.memory.maps.max`

This formula ensures that minor compactions won't be automatically triggered before the native
maps can be completely saturated.

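As a concrete illustration, the short sketch below plugs example values into that inequality; the specific sizes are assumptions for the example, not recommendations.

```java
public class NativeMapSizing {
    public static void main(String[] args) {
        long walogMaxSize = 1L * 1024 * 1024 * 1024;   // tserver.walog.max.size = 1G
        int minorLogsThreshold = 3;                    // table.compaction.minor.logs.threshold
        long nativeMapMax = 2L * 1024 * 1024 * 1024;   // tserver.memory.maps.max = 2G

        // Safe when threshold * walog max size >= native map max:
        // 3 * 1G = 3G >= 2G, so this configuration passes the check.
        boolean safe = (long) minorLogsThreshold * walogMaxSize >= nativeMapMax;
        System.out.println("Configuration is " + (safe ? "safe" : "undersized"));
    }
}
```
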
Subsequently, when increasing the size of the write-ahead logs, it can also be important
to increase the HDFS block size that Accumulo uses when creating the files for the write-ahead log.
This is controlled via `tserver.wal.blocksize`. A basic recommendation is that when
`tserver.walog.max.size` is larger than 2GB, set `tserver.wal.blocksize` to 2GB.
Increasing the block size to a value larger than 2GB can result in decreased write
performance to the write-ahead log file, which will slow ingest.

### Cluster Specification

If you are using `accumulo-cluster` to start a cluster, configure the following on the
machine that will serve as the Accumulo master:

1. Write the IP address or domain name of the Accumulo Master to the `conf/masters` file.
2. Write the IP addresses or domain names of the machines that will be TabletServers to `conf/tservers`, one per line.

Note that if you are using domain names rather than IP addresses, DNS must be configured
properly for all machines participating in the cluster. DNS can be a confusing source
of errors.

### Configure accumulo-site.xml

Specify appropriate values for the following settings in `accumulo-site.xml`:

```xml
<property>
    <name>instance.zookeeper.host</name>
    <value>zooserver-one:2181,zooserver-two:2181</value>
    <description>list of zookeeper servers</description>
</property>
```

This enables Accumulo to find ZooKeeper. Accumulo uses ZooKeeper to coordinate
settings between processes and to help finalize TabletServer failure.

```xml
<property>
    <name>instance.secret</name>
    <value>DEFAULT</value>
</property>
```

The instance needs a secret to enable secure communication between servers. Configure your
secret and make sure that the `accumulo-site.xml` file is not readable by other users.
For alternatives to storing the `instance.secret` in plaintext, please read the
`Sensitive Configuration Values` section.

Some settings can be modified via the Accumulo shell and take effect immediately, but
some settings require a process restart to take effect. See the [configuration management][config-mgmt]
documentation for details.

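Clients use these same ZooKeeper settings to locate an Accumulo instance. A minimal connection sketch follows; the instance name, ZooKeeper hosts, and credentials are placeholders for your own values.

```java
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Instance;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;

public class ConnectExample {
    public static void main(String[] args) throws Exception {
        // The instance name is the one chosen during 'accumulo init'.
        Instance instance = new ZooKeeperInstance("myInstance",
            "zooserver-one:2181,zooserver-two:2181");
        Connector conn = instance.getConnector("root", new PasswordToken("secret"));
        System.out.println("Connected to " + conn.getInstance().getInstanceName());
    }
}
```
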
### Hostnames in configuration files

Accumulo has a number of configuration files which can contain references to other hosts in your
network. All of the "host" configuration files for Accumulo (`gc`, `masters`, `tservers`, `monitor`,
`tracers`), as well as `instance.volumes` in `accumulo-site.xml`, must contain some host reference.

While IP addresses, short hostnames, and fully qualified domain names (FQDNs) are all technically valid, it
is good practice to always use FQDNs for both Accumulo and the other processes in your Hadoop cluster.
Failing to use FQDNs consistently can have unexpected consequences in how Accumulo uses the FileSystem.

A common way this problem manifests is via applications that use bulk ingest. The Accumulo
Master coordinates moving the bulk-ingest input files into an Accumulo-managed directory. However,
Accumulo cannot safely move files across different Hadoop FileSystems, and it cannot reliably
determine that the same FileSystem specified with different names is in fact the same. For example,
while `127.0.0.1:8020` might be a valid identifier for an HDFS instance,
Accumulo identifies `localhost:8020` as a different HDFS instance than `127.0.0.1:8020`.

### Deploy Configuration

Copy `accumulo-env.sh` and `accumulo-site.xml` from the `conf/` directory on the master to all Accumulo
tablet servers. The "host" configuration files used by `accumulo-cluster` only need to be on servers
where that command is run.

### Sensitive Configuration Values

Accumulo has a number of properties that can be specified via the `accumulo-site.xml`
file which are sensitive in nature; `instance.secret` and `trace.token.property.password`
are two common examples. If compromised, both of these properties can
result in data being leaked to users who should not have access to that data.

In Hadoop 2.6.0, a new CredentialProvider class was introduced which serves as a common
implementation to abstract the storage and retrieval of passwords away from plaintext
storage in configuration files. Any property marked with the `Sensitive` annotation
is a candidate for use with these CredentialProviders. For versions of Hadoop which lack
these classes, the feature is simply unavailable.

A comma-separated list of CredentialProviders can be configured using the Accumulo property
`general.security.credential.provider.paths`. Each configured URL will be consulted
when the Configuration object for `accumulo-site.xml` is accessed.

### Using a JavaKeyStoreCredentialProvider for storage

One of the implementations provided in Hadoop 2.6.0 is a Java KeyStore CredentialProvider.
Each entry in the KeyStore uses the Accumulo property name as its key. For example, to store
`instance.secret`, the following command can be used:

    hadoop credential create instance.secret --provider jceks://file/etc/accumulo/conf/accumulo.jceks

The command will then prompt you to enter the secret to use and create a keystore in:

    /etc/accumulo/conf/accumulo.jceks

Then, `accumulo-site.xml` must be configured to use this KeyStore as a CredentialProvider:

```xml
<property>
    <name>general.security.credential.provider.paths</name>
    <value>jceks://file/etc/accumulo/conf/accumulo.jceks</value>
</property>
```

This configuration will then transparently extract the `instance.secret` from
the configured KeyStore and avoids storing the sensitive property in
human-readable form.

A KeyStore can also be stored in HDFS, which will make the KeyStore readily available to
all Accumulo servers. If the local filesystem is used, be aware that each Accumulo server
will expect the KeyStore in the same location.

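Under the hood, consumers resolve such aliases through Hadoop's Configuration API. The hedged sketch below shows how a credential stored this way can be read back; the keystore path is the one assumed above.

```java
import org.apache.hadoop.conf.Configuration;

public class CredentialLookup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point Hadoop at the same keystore configured for Accumulo.
        conf.set("hadoop.security.credential.provider.path",
            "jceks://file/etc/accumulo/conf/accumulo.jceks");
        // getPassword() consults the CredentialProviders before falling back
        // to any plaintext value in the configuration itself.
        char[] secret = conf.getPassword("instance.secret");
        System.out.println("Retrieved "
            + (secret == null ? "nothing" : secret.length + " characters"));
    }
}
```
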
### Client Configuration

In version 1.6.0, Accumulo introduced a new type of configuration file known as a client
configuration file. One problem with the traditional "site.xml" file that is prevalent
through Hadoop is that it is a single file used by both clients and servers. This makes
it very difficult to protect secrets that are only meant for the server processes while
allowing the clients to connect to the servers.

The client configuration file is a subset of the information stored in `accumulo-site.xml`,
meant only for consumption by clients of Accumulo. By default, Accumulo checks a number
of locations for a client configuration file:

* `/path/to/accumulo/conf/client.conf`
* `/etc/accumulo/client.conf`
* `/etc/accumulo/conf/client.conf`
* `~/.accumulo/config`

These files are [Java properties files](https://en.wikipedia.org/wiki/.properties). They
can currently contain information about ZooKeeper servers, RPC properties (such as SSL or SASL
connectors), and distributed tracing properties. Valid properties are defined by the [ClientProperty](https://github.com/apache/accumulo/blob/f1d0ec93d9f13ff84844b5ac81e4a7b383ced467/core/src/main/java/org/apache/accumulo/core/client/ClientConfiguration.java#L54)
enum contained in the client API.

#### Custom Table Tags

Accumulo gives users the ability to add custom tags to tables. This allows
applications to set application-level metadata about a table. These tags can be
anything from a table description to administrator notes to a creation date,
and are set by naming and setting a property with the prefix `table.custom.*`.

Currently, table properties are stored in ZooKeeper. This means that the number
and size of custom properties should be restricted to on the order of tens of properties
at most, with no property exceeding 1MB in size. ZooKeeper's performance can be
very sensitive to an excessive number of nodes and to the sizes of those nodes. Applications
which leverage custom properties should take these warnings into
consideration. There is no enforcement of these limits via the API.

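For example, a custom tag might be set through the table operations API as sketched below; the table name and tag value are illustrative only.

```java
import org.apache.accumulo.core.client.Connector;

public class CustomTagExample {
    // Attach a free-form description to a table using the table.custom.* prefix.
    static void tagTable(Connector conn, String table) throws Exception {
        conn.tableOperations().setProperty(table, "table.custom.description",
            "Stores raw sensor readings, owned by the ingest team");
    }
}
```
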
#### Configuring the ClassLoader

Accumulo builds its Java classpath in `accumulo-env.sh`. After an Accumulo application has started, it will load classes from the locations
specified in the deprecated `general.classpaths` property. Additionally, Accumulo will load classes from the locations specified in the
`general.dynamic.classpaths` property and will monitor and reload them if they change. The reloading feature is useful during the development
and testing of iterators, as new or modified iterator classes can be deployed to Accumulo without having to restart the database.

Accumulo also has an alternate configuration for the classloader which will allow it to load classes from remote locations. This mechanism
uses Apache Commons VFS, which enables locations such as http and hdfs to be used. This alternate configuration also uses the
`general.classpaths` property in the same manner described above. It differs in that you need to configure the
`general.vfs.classpaths` property instead of the `general.dynamic.classpath` property. As in the default configuration, this alternate
configuration will also monitor the VFS locations for changes and reload if necessary.

The Accumulo classpath can be viewed in human-readable format by running `accumulo classpath -d`.

##### ClassLoader Contexts

With the addition of the VFS-based classloader, we introduced the notion of classloader contexts. A context is identified
by a name, references a set of locations from which to load classes, and can be specified in the `accumulo-site.xml` file or added
using the `config` command in the shell. Below is an example specifying the app1 context in the `accumulo-site.xml` file:

```xml
<property>
    <name>general.vfs.context.classpath.app1</name>
    <value>hdfs://localhost:8020/applicationA/classpath/.*.jar,file:///opt/applicationA/lib/.*.jar</value>
    <description>Application A classpath, loads jars from HDFS and local file system</description>
</property>
```

The default behavior follows the Java ClassLoader contract in that classes, if they exist, are loaded from the parent classloader first.
You can override this behavior by delegating to the parent classloader only after looking in this classloader first. An example of this
configuration is:

```xml
<property>
    <name>general.vfs.context.classpath.app1.delegation=post</name>
    <value>hdfs://localhost:8020/applicationA/classpath/.*.jar,file:///opt/applicationA/lib/.*.jar</value>
    <description>Application A classpath, loads jars from HDFS and local file system</description>
</property>
```

To use contexts in your application, you can set the `table.classpath.context` property on your tables or use the `setClassLoaderContext()` method on Scanner
and BatchScanner, passing in the name of the context (app1 in the example above). Setting the property on the table allows your minc, majc, and scan
iterators to load classes from the locations defined by the context. Passing the context name to the scanners allows you to override the table setting
to load only scan-time iterators from a different location.

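A hedged sketch of the scanner-side usage follows; the table name is a placeholder and the context is the assumed `app1` example from above.

```java
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.security.Authorizations;

public class ContextScanExample {
    static void scanWithContext(Connector conn) throws Exception {
        Scanner scanner = conn.createScanner("mytable", Authorizations.EMPTY);
        // Scan-time iterators for this scanner resolve against the app1 context,
        // overriding any table.classpath.context setting on the table.
        scanner.setClassLoaderContext("app1");
        scanner.forEach(entry ->
            System.out.println(entry.getKey() + " -> " + entry.getValue()));
    }
}
```
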
## Initialization

Accumulo must be initialized to create the structures it uses internally to locate
data across the cluster. HDFS must be configured and running before
Accumulo can be initialized.

Once HDFS is started, initialization is performed by executing
`accumulo init`. This script will prompt for a name
for this instance of Accumulo. The instance name is used to identify a set of tables
and instance-specific settings. The script will then write some information into
HDFS so Accumulo can start properly.

The initialization script will prompt you to set a root password. Once Accumulo is
initialized, it can be started.

## Running

### Starting Accumulo

Make sure Hadoop is configured on all of the machines in the cluster, including
access to a shared HDFS instance. Make sure HDFS is running, and make sure
ZooKeeper is configured and running on at least one machine in the cluster.
Start Accumulo using `accumulo-cluster start`.

To verify that Accumulo is running, check the [Accumulo monitor][monitor].
In addition, the shell can provide some information about the status of tables by reading the metadata tables.

### Stopping Accumulo

To shut down cleanly, run `accumulo-cluster stop`; the master will orchestrate the
shutdown of all the tablet servers. Shutdown waits for all minor compactions to finish, so it may
take some time for particular configurations.

### Adding a Tablet Server

Update your `conf/tservers` file to account for the addition.

Next, ssh to each of the hosts you want to add and run:

    accumulo-service tserver start

Make sure the host in question has the new configuration, or else the tablet
server won't start; at a minimum this needs to be on the host(s) being added,
but in practice it's good to ensure consistent configuration across all nodes.

### Decommissioning a Tablet Server

If you need to take a node out of operation, you can trigger a graceful shutdown of a tablet
server. Accumulo will automatically rebalance the tablets across the available tablet servers.

    accumulo admin stop <host> {<host> ...}

Alternatively, you can ssh to each of the hosts you want to remove and run:

    accumulo-service tserver stop

Be sure to update your `conf/tservers` file to
account for the removal of these hosts. Bear in mind that the monitor will not re-read the
tservers file automatically, so it will report the decommissioned servers as down; it is
recommended that you restart the monitor so that the node list is up to date.

The steps described to decommission a node can also be used (without removing the host
from the `conf/tservers` file) to gracefully stop a node. This will
ensure that the tablet server is cleanly stopped and recovery will not need to be performed
when the tablets are re-hosted.

### Restarting a process on a node

Occasionally, it might be necessary to restart the processes on a specific node. In addition
to the `accumulo-cluster` script, Accumulo has an `accumulo-service` script that
can be used to start/stop processes on a node.

#### A note on rolling restarts

For sufficiently large Accumulo clusters, restarting multiple TabletServers within a short window can place significant
load on the Master server. If slightly lower availability is acceptable, this load can be reduced by globally setting
`table.suspend.duration` to a positive value.

With `table.suspend.duration` set to, say, `5m`, Accumulo will wait
for 5 minutes for any dead TabletServer to return before reassigning that TabletServer's responsibilities to other TabletServers.
If the TabletServer returns to the cluster before the specified timeout has elapsed, Accumulo will assign the TabletServer
its original responsibilities.

It is important not to choose too large a value for `table.suspend.duration`, as during this time all scans against the
data that TabletServer had hosted will block (or time out).

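The property can be set from the shell with `config -s table.suspend.duration=5m`, or programmatically as in the hedged sketch below (it assumes an existing Connector).

```java
import org.apache.accumulo.core.client.Connector;

public class SuspendDurationExample {
    // Globally suspend tablet reassignment for five minutes after a tserver death.
    static void enableSuspension(Connector conn) throws Exception {
        conn.instanceOperations().setProperty("table.suspend.duration", "5m");
    }
}
```
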
### Running multiple TabletServers on a single node

With very powerful nodes, it may be beneficial to run more than one TabletServer on a given
node. This decision should be made carefully and with much deliberation, as Accumulo is designed
to be able to scale to using tens of GB of RAM and tens of CPU cores.

Accumulo TabletServers bind certain ports on the host to accommodate remote procedure calls to/from
other nodes. Running more than one TabletServer on a host requires that you set the environment variable
`ACCUMULO_SERVICE_INSTANCE` to an instance number (i.e. 1, 2) for each instance that is started. Also, set
these properties in `accumulo-site.xml`:

```xml
<property>
    <name>tserver.port.search</name>
    <value>true</value>
</property>
<property>
    <name>replication.receipt.service.port</name>
    <value>0</value>
</property>
```

## Monitoring

### Accumulo Monitor

The Accumulo Monitor provides an interface for monitoring the status and health of
Accumulo components. The Accumulo Monitor provides a web UI for accessing this information at
`http://_monitorhost_:9995/`.

Things highlighted in yellow may be in need of attention.
If anything is highlighted in red on the monitor page, it definitely needs attention.

The Overview page contains some summary information about the Accumulo instance, including the version, instance name, and instance ID.
There is a table labeled Accumulo Master with the current status, a table listing the active ZooKeeper servers, and graphs displaying various metrics over time.
These include ingest and scan performance and other useful measurements.

The Master Server, Tablet Servers, and Tables pages display metrics grouped in different ways (e.g. by tablet server or by table).
Metrics typically include the number of entries (key/value pairs) and ingest and query rates.
The numbers of running scans and major and minor compactions are shown in the form _number_running_ (_number_queued_).
Another important metric is hold time, which is the amount of time a tablet has been waiting but unable to flush its memory in a minor compaction.

The Server Activity page graphically displays tablet server status, with each server represented as a circle or square.
Different metrics may be assigned to the nodes' color and speed of oscillation.
The Overall Avg metric is only used on the Server Activity page, and represents the average of all the other metrics (after normalization).
Similarly, the Overall Max metric picks the metric with the maximum normalized value.

The Garbage Collector page displays a list of garbage collection cycles, the number of files found of each type (including deletion candidates in use and files actually deleted), and the length of the deletion cycle.
The Traces page displays data for recently performed traces (see the section on [tracing][tracing]).
The Recent Logs page displays warning and error logs forwarded to the monitor from all Accumulo processes.
Also, the XML and JSON links provide metrics in XML and JSON formats, respectively.

### SSL

SSL may be enabled for the monitor page by setting the following properties in the `accumulo-site.xml` file:

    monitor.ssl.keyStore
    monitor.ssl.keyStorePassword
    monitor.ssl.trustStore
    monitor.ssl.trustStorePassword

If the Accumulo conf directory has been configured (in particular, the `accumulo-env.sh` file must be set up), the
`accumulo-util gen-monitor-cert` command can be used to create the keystore and truststore files with random passwords. The command
will print out the properties that need to be added to the `accumulo-site.xml` file. The stores can also be generated manually with the
Java `keytool` command, whose usage can be seen in the `accumulo-util` script.

If desired, the SSL ciphers allowed for connections can be controlled via the following properties in `accumulo-site.xml`:

    monitor.ssl.include.ciphers
    monitor.ssl.exclude.ciphers

If SSL is enabled, the monitor URL can only be accessed via https.
This also allows you to access the Accumulo shell through the monitor page.
The left navigation bar will have a new link to Shell.
An Accumulo user name and password must be entered for access to the shell.

## Metrics

Accumulo can expose metrics through a legacy metrics library and through the Hadoop Metrics2 library.

### Legacy Metrics

Accumulo has a legacy metrics library that can expose metrics using JMX endpoints or file-based logging. These metrics can
be enabled by setting `general.legacy.metrics` to `true` in `accumulo-site.xml` and placing the `accumulo-metrics.xml`
configuration file on the classpath (which is typically done by placing the file in the `conf/` directory). A template for
`accumulo-metrics.xml` can be found in `conf/templates` of the Accumulo tarball.

### Hadoop Metrics2

Hadoop Metrics2 is a library which allows for routing of metrics generated by registered MetricsSources to
configured MetricsSinks. Examples of sinks that are implemented by Hadoop include file-based logging, Graphite, and Ganglia.
All metric sources are exposed via JMX when using Metrics2.

Metrics2 is configured by examining the classpath for a file that matches `hadoop-metrics2*.properties`. The Accumulo tarball
contains an example `hadoop-metrics2-accumulo.properties` file in `conf/templates` which can be copied to `conf/` to place it
on the classpath. This file is used to enable file, Graphite, or Ganglia sinks (some minimal configuration is required for Graphite
and Ganglia). Because the Hadoop configuration is also on the Accumulo classpath, be sure that you do not have multiple
Metrics2 configuration files. It is recommended to consolidate metrics in a single properties file in a central location to
remove ambiguity. The contents of `hadoop-metrics2-accumulo.properties` can be added to a central `hadoop-metrics2.properties`
in `$HADOOP_CONF_DIR`.

When configuring the file sink, note that the provided path should be absolute. A relative path or bare file name will be created relative
to the directory in which the Accumulo process was started. External tools, such as logrotate, can be used to prevent these files
from growing without bound.

Each server process should have log messages from the Metrics2 library about the sinks that were created. Be sure to check
the Accumulo process log files when debugging missing metrics output.

For additional information on configuring Metrics2, visit the [Javadoc page for Metrics2](https://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html).

## Tracing

It can be difficult to determine why some operations are taking longer
than expected. For example, you may be looking up items with very low
latency, but sometimes the lookups take much longer. Determining the
cause of the delay is difficult because the system is distributed, and
the typical lookup is fast.

Accumulo has been instrumented to record the time that various
operations take when tracing is turned on. The fact that tracing is
enabled follows all the requests made on behalf of the user throughout
the distributed infrastructure of Accumulo, and across all threads of
execution.

These time spans will be inserted into the `trace` table in
Accumulo. You can browse recent traces from the Accumulo monitor
page. You can also read the `trace` table directly like any
other table.

The design of Accumulo's distributed tracing follows that of [Google's Dapper](http://research.google.com/pubs/pub36356.html).

### Tracers

To collect traces, Accumulo needs at least one tracer server running. If you are using `accumulo-cluster` to start your cluster,
configure your server in `conf/tracers`. The server collects traces from clients and writes them to the `trace` table. The Accumulo
user that the tracer connects to Accumulo with can be configured with the following properties (see the [configuration management][config-mgmt]
page for setting Accumulo server properties):

    trace.user
    trace.token.property.password

Other tracer configuration properties include:

    trace.port.client - port the tracer listens on
    trace.table - table the tracer writes to
    trace.zookeeper.path - zookeeper path where tracers register

The zookeeper path is configured to `/tracers` by default. If
multiple Accumulo instances are sharing the same ZooKeeper
quorum, take care to configure Accumulo with unique values for
this property.

### Configuring Tracing

Traces are collected via SpanReceivers. The default SpanReceiver
configured is `org.apache.accumulo.core.trace.ZooTraceClient`, which
sends spans to an Accumulo Tracer process, as discussed in the
previous section. This default can be changed to a different span
receiver, or additional span receivers can be added in a
comma-separated list, by modifying the property

    trace.span.receivers

Individual span receivers may require their own configuration
parameters, which are grouped under the `trace.span.receiver.*`
prefix. ZooTraceClient uses the following properties. The first
three properties are populated from other Accumulo properties,
while the remaining ones should be prefixed with
`trace.span.receiver.` when set in the Accumulo configuration.

    tracer.zookeeper.host - populated from instance.zookeepers
    tracer.zookeeper.timeout - populated from instance.zookeeper.timeout
    tracer.zookeeper.path - populated from trace.zookeeper.path
    tracer.send.timer.millis - timer for flushing the send queue (in ms, default 1000)
    tracer.queue.size - max queue size (default 5000)
    tracer.span.min.ms - minimum span length to store (in ms, default 1)

Note that to configure an Accumulo client for tracing, including
the Accumulo shell, the client configuration must be given the same
`trace.span.receivers`, `trace.span.receiver.*`, and `trace.zookeeper.path`
properties as the servers have.

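These client-side properties normally live in the client configuration file, but they can also be supplied programmatically, as in the hedged sketch below; the instance name and hosts are placeholders, and the receiver shown is the default.

```java
import org.apache.accumulo.core.client.ClientConfiguration;
import org.apache.accumulo.core.client.Instance;
import org.apache.accumulo.core.client.ZooKeeperInstance;

public class TracingClientConfig {
    static Instance buildInstance() {
        ClientConfiguration clientConf = ClientConfiguration.loadDefault()
            .withInstance("myInstance")
            .withZkHosts("zooserver-one:2181,zooserver-two:2181");
        // Mirror the server-side tracing properties on the client.
        clientConf.setProperty("trace.span.receivers",
            "org.apache.accumulo.tracer.ZooTraceClient");
        clientConf.setProperty("trace.zookeeper.path", "/tracers");
        return new ZooKeeperInstance(clientConf);
    }
}
```
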
Hadoop can also be configured to send traces to Accumulo, as of
Hadoop 2.6.0, by setting properties in Hadoop's core-site.xml
file. Instead of using the `trace.span.receiver.*` prefix, Hadoop
uses `hadoop.htrace.*`. The Hadoop configuration does not have
access to Accumulo's properties, so the
`hadoop.htrace.tracer.zookeeper.host` property must be specified.
The zookeeper timeout defaults to 30000 (30 seconds), and the
zookeeper path defaults to `/tracers`. An example of configuring
Hadoop to send traces to ZooTraceClient is:

```xml
<property>
    <name>hadoop.htrace.spanreceiver.classes</name>
    <value>org.apache.accumulo.core.trace.ZooTraceClient</value>
</property>
<property>
    <name>hadoop.htrace.tracer.zookeeper.host</name>
    <value>zookeeperHost:2181</value>
</property>
<property>
    <name>hadoop.htrace.tracer.zookeeper.path</name>
    <value>/tracers</value>
</property>
<property>
    <name>hadoop.htrace.tracer.span.min.ms</name>
    <value>1</value>
</property>
```

The accumulo-core, accumulo-tracer, accumulo-fate, and libthrift
jars must also be placed on Hadoop's classpath.

##### Adding additional SpanReceivers

[Zipkin](https://github.com/openzipkin/zipkin) has a SpanReceiver supported by HTrace and popularized by Twitter
that users looking for a more graphical trace display may opt to use.
The following steps configure Accumulo to use `org.apache.htrace.impl.ZipkinSpanReceiver`
in addition to Accumulo's default ZooTraceClient, and they serve as a template
for adding any SpanReceiver to Accumulo:

1. Add the jar containing the ZipkinSpanReceiver class file to the
`lib/` directory. It is critical that the jar is placed in
`lib/` and NOT in `lib/ext/`, so that the new SpanReceiver class
is visible to the same class loader as htrace-core.

2. Add the following to `accumulo-site.xml`:

        <property>
            <name>trace.span.receivers</name>
            <value>org.apache.accumulo.tracer.ZooTraceClient,org.apache.htrace.impl.ZipkinSpanReceiver</value>
        </property>

3. Restart your Accumulo tablet servers.

In order to use ZipkinSpanReceiver from a client as well as the Accumulo server:

1. Ensure your client can see the ZipkinSpanReceiver class at runtime. For Maven projects,
this is easily done by adding the following to your client's pom.xml (taking care to specify a good version):

        <dependency>
            <groupId>org.apache.htrace</groupId>
            <artifactId>htrace-zipkin</artifactId>
            <version>3.1.0-incubating</version>
            <scope>runtime</scope>
        </dependency>

2. Add the following to your client configuration:

        trace.span.receivers=org.apache.accumulo.tracer.ZooTraceClient,org.apache.htrace.impl.ZipkinSpanReceiver

3. Instrument your client as in the next section.

Your SpanReceiver may require additional properties, and if so these should likewise
be placed in the ClientConfiguration (if applicable) and Accumulo's `accumulo-site.xml`.
Two such properties for ZipkinSpanReceiver, listed with their default values, are:

```xml
<property>
    <name>trace.span.receiver.zipkin.collector-hostname</name>
    <value>localhost</value>
</property>
<property>
    <name>trace.span.receiver.zipkin.collector-port</name>
    <value>9410</value>
</property>
```

#### Instrumenting a Client

Tracing can be used to measure a client operation, such as a scan, as
the operation traverses the distributed system. To enable tracing for
your application, call:

```java
import org.apache.accumulo.core.trace.DistributedTrace;
...
DistributedTrace.enable(hostname, "myApplication");
// do some tracing
...
DistributedTrace.disable();
```

Once tracing has been enabled, a client can wrap an operation in a trace.

```java
import org.apache.htrace.Sampler;
import org.apache.htrace.Trace;
import org.apache.htrace.TraceScope;
...
TraceScope scope = Trace.startSpan("Client Scan", Sampler.ALWAYS);
BatchScanner scanner = conn.createBatchScanner(...);
// Configure your scanner
for (Entry<Key,Value> entry : scanner) {
}
scope.close();
```

The user can create additional Spans within a Trace.

The sampler (such as `Sampler.ALWAYS`) for the trace should only be specified with a top-level span;
subsequent spans will be collected depending on whether that first span was sampled.
Don't forget to specify a Sampler at the top-level span,
because the default Sampler only samples when part of a pre-existing trace,
which will never occur in a client that never specifies a Sampler.

```java
TraceScope scope = Trace.startSpan("Client Update", Sampler.ALWAYS);
...
TraceScope readScope = Trace.startSpan("Read");
...
readScope.close();
...
TraceScope writeScope = Trace.startSpan("Write");
...
writeScope.close();
scope.close();
```

Like Dapper, Accumulo tracing supports user-defined annotations to associate additional data with a trace.
When using a sampler other than `Sampler.ALWAYS`, it is necessary to check whether the code is currently tracing.

```java
...
int numberOfEntriesRead = 0;
TraceScope readScope = Trace.startSpan("Read");
// Do the read, update the counter
...
if (Trace.isTracing())
  readScope.getSpan().addKVAnnotation("Number of Entries Read".getBytes(StandardCharsets.UTF_8),
      String.valueOf(numberOfEntriesRead).getBytes(StandardCharsets.UTF_8));
```

It is also possible to add timeline annotations to your spans.
This associates a string with a given timestamp between the start and stop times for a span.

```java
...
writeScope.getSpan().addTimelineAnnotation("Initiating Flush");
```

Some client operations may have a high volume within your
application, so you may wish to sample only a percentage of
operations for tracing. As seen below, the CountSampler can be used
to enable tracing for 1-in-1000 operations:

```java
import org.apache.htrace.impl.CountSampler;
...
Sampler sampler = new CountSampler(HTraceConfiguration.fromMap(
    Collections.singletonMap(CountSampler.SAMPLER_FREQUENCY_CONF_KEY, "1000")));
...
TraceScope readScope = Trace.startSpan("Read", sampler);
...
readScope.close();
```

Remember to close all spans and disable tracing when finished.

```java
DistributedTrace.disable();
```

### Viewing Collected Traces

To view collected traces, use the "Recent Traces" link on the Monitor
UI. You can also programmatically access and print traces using the
`TraceDump` class.

#### Trace Table Format

This section is for developers looking to use data recorded in the trace table
directly, above and beyond the default services of the Accumulo monitor.
Please note that the trace table format and its supporting classes
are not in the public API and may be subject to change in future versions.

Each span received by a tracer's ZooTraceClient is recorded in the trace table
in the form of three entries: span entries, index entries, and start-time entries.
Span and start-time entries record full span information,
whereas index entries provide indexing into span information
useful for quickly finding spans by type or start time.

Each entry is illustrated by a description and a sample of data.
In the descriptions, a token in quotes is a string literal,
whereas other tokens are span variables.
Parentheses group parts together, to distinguish colon characters inside the
column family or qualifier from the colon that separates column family and qualifier.
We use the format `row columnFamily:columnQualifier columnVisibility value`
(omitting the timestamp, which records the time an entry is written to the trace table).

Span entries take the following form:

    traceId "span":(parentSpanId:spanId) [] spanBinaryEncoding
    63b318de80de96d1 span:4b8f66077df89de1:3778c6739afe4e1 [] %18;%09;...

The parentSpanId is "" for the root span of a trace.
The spanBinaryEncoding is a compact Apache Thrift encoding of the original Span object.
This allows clients (and the Accumulo monitor) to recover all the details of the original Span
at a later time, by scanning the trace table and decoding the value of span entries
via `TraceFormatter.getRemoteSpan(entry)`.

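A hedged sketch of that decoding loop follows; it assumes an existing Connector and the default `trace` table name, and, as noted above, TraceFormatter and RemoteSpan are not in the public API.

```java
import java.util.Map;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.accumulo.tracer.TraceFormatter;
import org.apache.accumulo.tracer.thrift.RemoteSpan;

public class TraceScanExample {
    static void printSpans(Connector conn) throws Exception {
        Scanner scanner = conn.createScanner("trace", Authorizations.EMPTY);
        for (Map.Entry<Key,Value> entry : scanner) {
            // Only "span" column-family entries carry the Thrift-encoded span.
            if (entry.getKey().getColumnFamily().toString().equals("span")) {
                RemoteSpan span = TraceFormatter.getRemoteSpan(entry);
                System.out.println(span.description + " took "
                    + (span.stop - span.start) + " ms");
            }
        }
    }
}
```
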
The trace table has a formatter class configured by default (`org.apache.accumulo.tracer.TraceFormatter`)
that changes how span entries appear in the Accumulo shell.
Normal scans of the trace table do not use this formatter representation;
it exists only to make span entries easier to view inside the Accumulo shell.

Index entries take the following form:

    "idx":service:startTime description:sender [] traceId:elapsedTime
    idx:tserver:14f3828f58b startScan:localhost [] 63b318de80de96d1:1

The service and sender are set by the first call of each Accumulo process
(and instrumented client processes) to `DistributedTrace.enable(...)`
(the sender is autodetected if not specified).
The description is specified in each span.
The start time and the elapsed time (stop - start, 1 millisecond in the example above)
are recorded in milliseconds as long values serialized to a string in hex.

Start-time entries take the following form:

    "start":startTime "id":traceId [] spanBinaryEncoding
    start:14f3828a351 id:63b318de80de96d1 [] %18;%09;...

The following classes may be run while Accumulo is running to provide insight into trace statistics. They require
accumulo-trace-VERSION.jar to be provided on the Accumulo classpath (`lib/ext` is fine).

    $ accumulo org.apache.accumulo.tracer.TraceTableStats -u username -p password -i instancename
    $ accumulo org.apache.accumulo.tracer.TraceDump -u username -p password -i instancename -r

### Tracing from the Shell

You can enable tracing for operations run from the shell by using the
`trace on` and `trace off` commands.

```
root@test test> trace on

root@test test> scan
a b:c [] d

root@test test> trace off
Waiting for trace information
Waiting for trace information
Trace started at 2013/08/26 13:24:08.332
Time  Start  Service@Location       Name
 3628+0      shell@localhost shell:root
    8+1690     shell@localhost scan
    7+1691       shell@localhost scan:location
    6+1692         tserver@localhost startScan
    5+1692           tserver@localhost tablet read ahead 6
```

## Logging

Each Accumulo process writes to a set of log files. By default, these logs are found in the directory
set by `ACCUMULO_LOG_DIR` in `accumulo-env.sh`.

## Recovery

In the event of TabletServer failure, or an error while shutting Accumulo down, some
mutations may not have been minor compacted to HDFS properly. In this case,
Accumulo will automatically reapply such mutations from the write-ahead log,
either when the tablets from the failed server are reassigned by the Master (in the
case of a single TabletServer failure) or the next time Accumulo starts (in the event of
failure during shutdown).

Recovery is performed by asking a tablet server to sort the logs so that tablets can easily find their missing
updates. The sort status of each file is displayed on the
Accumulo monitor status page. Once the recovery is complete, any
tablets involved should return to an "online" state. Until then, those tablets will be
unavailable to clients.

The Accumulo client library is configured to retry failed mutations, and in many
cases clients will be able to continue processing after the recovery process without
throwing an exception.

## Migrating Accumulo from a non-HA Namenode to an HA Namenode

The following steps will allow a non-HA instance to be migrated to an HA instance. Consider an HDFS URL
`hdfs://namenode.example.com:8020` which is going to be moved to `hdfs://nameservice1`.

Before moving HDFS over to the HA namenode, use `accumulo admin volumes` to confirm
that the only volume displayed is the volume from the current namenode's HDFS URL.

    Listing volumes referenced in zookeeper
            Volume : hdfs://namenode.example.com:8020/accumulo

    Listing volumes referenced in accumulo.root tablets section
            Volume : hdfs://namenode.example.com:8020/accumulo
    Listing volumes referenced in accumulo.root deletes section (volume replacement occurrs at deletion time)

    Listing volumes referenced in accumulo.metadata tablets section
            Volume : hdfs://namenode.example.com:8020/accumulo

    Listing volumes referenced in accumulo.metadata deletes section (volume replacement occurrs at deletion time)

After verifying that the current volume is correct, shut down the cluster and transition HDFS to the HA nameservice.

Edit `accumulo-site.xml` to notify Accumulo that a volume is being replaced. First,
add the new nameservice volume to the `instance.volumes` property. Next, add the
`instance.volumes.replacements` property in the form of `old new`. It's important not to include
the volume that's being replaced in `instance.volumes`; otherwise it's possible Accumulo could continue
to write to the volume.

```xml
<property>
    <name>instance.volumes</name>
    <value>hdfs://nameservice1/accumulo</value>
</property>
<property>
    <name>instance.volumes.replacements</name>
    <value>hdfs://namenode.example.com:8020/accumulo hdfs://nameservice1/accumulo</value>
</property>
```

Run `accumulo init --add-volumes` and start up the Accumulo cluster. Verify that the
new nameservice volume shows up with `accumulo admin volumes`.

    Listing volumes referenced in zookeeper
            Volume : hdfs://namenode.example.com:8020/accumulo
            Volume : hdfs://nameservice1/accumulo

    Listing volumes referenced in accumulo.root tablets section
            Volume : hdfs://namenode.example.com:8020/accumulo
            Volume : hdfs://nameservice1/accumulo
    Listing volumes referenced in accumulo.root deletes section (volume replacement occurrs at deletion time)

    Listing volumes referenced in accumulo.metadata tablets section
            Volume : hdfs://namenode.example.com:8020/accumulo
            Volume : hdfs://nameservice1/accumulo
    Listing volumes referenced in accumulo.metadata deletes section (volume replacement occurrs at deletion time)

Some erroneous GarbageCollector messages may still be seen for a short period while data is transitioning to
the new volumes. This is expected and can usually be ignored.

## Achieving Stability in a VM Environment

For testing, demonstration, and even operational uses, Accumulo is often
installed and run in a virtual machine (VM) environment. The majority of
long-term operational uses of Accumulo are on bare-metal clusters. However, the
core design of Accumulo and its dependencies does not preclude running stably for
long periods within a VM. Many of Accumulo's operational robustness features for
handling failures, such as periodic network partitioning in a large cluster, carry
over well to VM environments. This guide covers general recommendations for
maximizing stability in a VM environment, including some of the common failure
modes that are more common when running in VMs.

### Known failure modes: Setup and Troubleshooting

In addition to the general failure modes of running Accumulo, VMs can introduce a
couple of environmental challenges that can affect process stability. Clock
drift is more common in VMs, especially when VMs are
suspended and resumed. Clock drift can cause Accumulo servers to assume that
they have lost connectivity to the other Accumulo processes and/or to lose their
locks in ZooKeeper. VM environments also frequently have constrained resources,
such as CPU, RAM, network, and disk throughput and capacity. Accumulo generally
deals well with constrained resources from a stability perspective (optimizing
performance will require additional tuning, which is not covered in this
section); however, there are some limits.

#### Physical Memory

One of those limits has to do with the Linux out-of-memory killer. A common
failure mode in VM environments (and in some bare-metal installations) is when
the Linux out-of-memory killer decides to kill processes in order to avoid a
kernel panic when provisioning a memory page. This often happens in VMs due to
the large number of processes that must run in a small memory footprint. In
addition to the Linux core processes, a single-node Accumulo setup requires a
Hadoop NameNode, a Hadoop secondary NameNode, a Hadoop DataNode, a ZooKeeper
server, an Accumulo Master, an Accumulo GC, and an Accumulo TabletServer.
Typical setups also include an Accumulo Monitor, an Accumulo Tracer, a Hadoop
ResourceManager, a Hadoop NodeManager, provisioning software, and client
applications. Between all of these processes, it is not uncommon to
over-subscribe the available RAM in a VM. We recommend setting up VMs without
swap enabled, so that rather than performance grinding to a halt when physical
memory is exhausted, the kernel will select processes (effectively at random) to
kill in order to free up memory.

Calculating the maximum possible memory usage is essential in creating a stable
Accumulo VM setup. Safely engineering memory allocation for stability is then a
matter of bringing the calculated maximum memory usage under the physical
memory by a healthy margin. The margin is to account for operating-system-level
operations, such as managing processes, maintaining virtual memory pages, and
file system caching. When the Linux out-of-memory killer finds your process, you
will probably only see evidence of that in /var/log/messages. Out-of-memory
process kills do not show up in Accumulo or Hadoop logs.

To calculate the maximum memory usage of all Java virtual machine (JVM) processes,
add the maximum heap size (often limited by a -Xmx... argument, such as in
accumulo-site.xml) and the off-heap memory usage. Off-heap memory usage
includes the following:

* "Permanent space", where the JVM stores classes, methods, and other code elements. This can be limited by a JVM flag such as `-XX:MaxPermSize:100m`, and is typically tens of megabytes.
* Code generation space, where the JVM stores just-in-time compiled code. This is typically small enough to ignore.
* Socket buffers, where the JVM stores send and receive buffers for each socket.
* Thread stacks, where the JVM allocates memory to manage each thread.
* Direct memory space and JNI code, where applications can allocate memory outside of the JVM-managed space. For Accumulo, this includes the native in-memory maps allocated with the `tserver.memory.maps.max` parameter in accumulo-site.xml.
* Garbage collection space, where the JVM stores information used for garbage collection.

You can assume that each Hadoop and Accumulo process will use ~100-150MB of
off-heap memory, plus the in-memory map of the Accumulo TServer process. A
simple calculation of physical memory requirements follows:

```
Physical memory needed
  = (per-process off-heap memory) + (heap memory) + (other processes) + (margin)
  = (number of java processes * 150M + native map) + (sum of -Xmx settings for java processes) + (total application memory, provisioning memory, etc.) + (1G)
  = (11*150M + 500M) + (1G + 1G + 1G + 256M + 1G + 256M + 512M + 512M + 512M + 512M + 512M) + (2G) + (1G)
  = (2150M) + (7G) + (2G) + (1G)
  = ~12GB
```

These numbers can add up quickly with the large number of processes,
especially in constrained VM environments. To reduce the physical memory
requirements, it is a good idea to reduce maximum heap limits and turn off
unnecessary processes. If you're not using YARN in your application, you can
turn off the ResourceManager and NodeManager. If you're not expecting to
re-provision the cluster frequently, you can turn off or reduce provisioning
processes such as Salt Stack minions and masters.

#### Disk Space

Disk space is primarily used for two purposes: storing data and storing logs.
While Accumulo generally stores all of its key/value data in HDFS, Accumulo,
Hadoop, and ZooKeeper all store a significant amount of logs in a directory on
a local file system. Care should be taken to make sure that (a) limits on the
amount of logs generated are in place, and (b) enough space is available to
host the generated logs on the partitions to which they are assigned. When space is
not available for logging, processes will hang. This can cause interruptions in
the availability of Accumulo, as well as cascade into failures of various
processes.

Hadoop, Accumulo, and ZooKeeper use log4j as a logging mechanism, and each of
them has a way of limiting the logs and directing them to a particular
directory. Logs are generated independently for each process, so when
considering the total space you need to add up the maximum logs generated by
each process. Typically, a rolling log setup is instituted in which each process can generate
something like 10 100MB files, resulting in a maximum file system
usage of 1GB per process. Default setups for Hadoop and ZooKeeper are often
unbounded, so it is important to set these limits in the logging configuration
files for each subsystem. Consult the user manual of each system for
instructions on how to limit generated logs.

#### ZooKeeper Interaction

Accumulo is designed to scale up to thousands of nodes. At that scale,
intermittent interruptions in network service and other rare failures of
compute nodes become more common. To limit the impact of node failures on
overall service availability, Accumulo uses a heartbeat monitoring system that
leverages ZooKeeper's ephemeral locks. There are several conditions that can
cause Accumulo processes to lose their ZooKeeper locks, some of which are true
interruptions to availability and some of which are false positives.
Several of these conditions become more common in VM environments, where they
can be exacerbated by resource constraints and clock drift.

#### Tested Versions

Each release of Accumulo is built with a specific version of Apache
Hadoop, Apache ZooKeeper, and Apache Thrift. We expect Accumulo to
work with versions that are API compatible with those versions.
However, this compatibility is not guaranteed because Hadoop, ZooKeeper,
and Thrift may not provide guarantees between their own versions. We
have also found that certain versions of Accumulo and Hadoop included
bugs that greatly affected overall stability. Thrift is particularly
prone to compatibility changes between versions, and you must use the
same version your Accumulo was built with.

Please check the release notes for your Accumulo version, or use the
mailing lists at https://accumulo.apache.org for more info.

[tracing]: {{page.docs_baseurl}}/administration/overview#tracing
[monitor]: {{page.docs_baseurl}}/administration/overview#monitoring
[config-mgmt]: {{page.docs_baseurl}}/administration/configuration-management
[config-props]: {{page.docs_baseurl}}/administration/configuration-properties

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/036341f2/_docs-unreleased/administration/replication.md
----------------------------------------------------------------------
diff --git a/_docs-unreleased/administration/replication.md b/_docs-unreleased/administration/replication.md
index edfa1f8..bff89aa 100644
--- a/_docs-unreleased/administration/replication.md
+++ b/_docs-unreleased/administration/replication.md
@@ -1,7 +1,7 @@
 ---
 title: Replication
 category: administration
-order: 5
+order: 10
 ---
 
 ## Overview

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/036341f2/_docs-unreleased/administration/tracing.md
----------------------------------------------------------------------
diff --git a/_docs-unreleased/administration/tracing.md b/_docs-unreleased/administration/tracing.md
new file mode 100644
index 0000000..bce8d0b
--- /dev/null
+++ b/_docs-unreleased/administration/tracing.md
@@ -0,0 +1,347 @@
---
title: Tracing
category: administration
order: 5
---

It can be difficult to determine why some operations are taking longer
than expected. For example, you may be looking up items with very low
latency, but sometimes the lookups take much longer. Determining the
cause of the delay is difficult because the system is distributed, and
the typical lookup is fast.

Accumulo has been instrumented to record the time that various
operations take when tracing is turned on. Once tracing is enabled,
that fact follows all requests made on behalf of the user throughout
the distributed infrastructure of Accumulo, and across all threads of
execution.

These time spans will be inserted into the `trace` table in
Accumulo. You can browse recent traces from the Accumulo monitor
page. You can also read the `trace` table directly like any
other table.

The design of Accumulo's distributed tracing follows that of [Google's Dapper](http://research.google.com/pubs/pub36356.html).

## Tracers

To collect traces, Accumulo needs at least one tracer server running. If you are using `accumulo-cluster` to start your cluster,
configure your server in `conf/tracers`. The server collects traces from clients and writes them to the `trace` table. The Accumulo
user that the tracer connects to Accumulo with can be configured with the following properties (see the [configuration management][config-mgmt]
page for setting Accumulo server properties):

    trace.user
    trace.token.property.password

Other tracer configuration properties include:

    trace.port.client - port tracer listens on
    trace.table - table tracer writes to
    trace.zookeeper.path - zookeeper path where tracers register

The ZooKeeper path is set to /tracers by default. If
multiple Accumulo instances are sharing the same ZooKeeper
quorum, take care to configure Accumulo with unique values for
this property.

## Configuring Tracing

Traces are collected via SpanReceivers. The default SpanReceiver
configured is org.apache.accumulo.core.trace.ZooTraceClient, which
sends spans to an Accumulo Tracer process, as discussed in the
previous section.
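For reference, a minimal sketch of what this default would look like if
written out explicitly in `accumulo-site.xml` (it is the built-in value,
so you do not need to set it yourself):

```xml
<property>
  <name>trace.span.receivers</name>
  <value>org.apache.accumulo.core.trace.ZooTraceClient</value>
</property>
```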
This default can be changed to a different span
receiver, or additional span receivers can be added in a
comma-separated list, by modifying the property

    trace.span.receivers

Individual span receivers may require their own configuration
parameters, which are grouped under the trace.span.receiver.*
prefix. ZooTraceClient uses the following properties. The first
three properties are populated from other Accumulo properties,
while the remaining ones should be prefixed with
trace.span.receiver. when set in the Accumulo configuration.

    tracer.zookeeper.host - populated from instance.zookeepers
    tracer.zookeeper.timeout - populated from instance.zookeeper.timeout
    tracer.zookeeper.path - populated from trace.zookeeper.path
    tracer.send.timer.millis - timer for flushing send queue (in ms, default 1000)
    tracer.queue.size - max queue size (default 5000)
    tracer.span.min.ms - minimum span length to store (in ms, default 1)

Note that to configure an Accumulo client for tracing, including
the Accumulo shell, the client configuration must be given the same
trace.span.receivers, trace.span.receiver.*, and trace.zookeeper.path
properties as the servers have.

Hadoop can also be configured to send traces to Accumulo, as of
Hadoop 2.6.0, by setting properties in Hadoop's core-site.xml
file. Instead of using the trace.span.receiver.* prefix, Hadoop
uses hadoop.htrace.*. The Hadoop configuration does not have
access to Accumulo's properties, so the
hadoop.htrace.tracer.zookeeper.host property must be specified.
The ZooKeeper timeout defaults to 30000 (30 seconds), and the
ZooKeeper path defaults to /tracers. An example of configuring
Hadoop to send traces to ZooTraceClient is

```xml
<property>
  <name>hadoop.htrace.spanreceiver.classes</name>
  <value>org.apache.accumulo.core.trace.ZooTraceClient</value>
</property>
<property>
  <name>hadoop.htrace.tracer.zookeeper.host</name>
  <value>zookeeperHost:2181</value>
</property>
<property>
  <name>hadoop.htrace.tracer.zookeeper.path</name>
  <value>/tracers</value>
</property>
<property>
  <name>hadoop.htrace.tracer.span.min.ms</name>
  <value>1</value>
</property>
```

The accumulo-core, accumulo-tracer, accumulo-fate and libthrift
jars must also be placed on Hadoop's classpath.

### Adding additional SpanReceivers

[Zipkin](https://github.com/openzipkin/zipkin) has a SpanReceiver, supported by HTrace and popularized by Twitter,
that users looking for a more graphical trace display may opt to use.
The following steps configure Accumulo to use `org.apache.htrace.impl.ZipkinSpanReceiver`
in addition to Accumulo's default ZooTraceClient, and they serve as a template
for adding any SpanReceiver to Accumulo:

1. Add the Jar containing the ZipkinSpanReceiver class file to the
`lib/` directory. It is critical that the Jar is placed in
`lib/` and NOT in `lib/ext/` so that the new SpanReceiver class
is visible to the same class loader as htrace-core.

2. Add the following to `accumulo-site.xml`:

        <property>
          <name>trace.span.receivers</name>
          <value>org.apache.accumulo.tracer.ZooTraceClient,org.apache.htrace.impl.ZipkinSpanReceiver</value>
        </property>

3. Restart your Accumulo tablet servers.

In order to use ZipkinSpanReceiver from a client as well as the Accumulo server:

1. Ensure your client can see the ZipkinSpanReceiver class at runtime. For Maven projects,
this is easily done by adding to your client's pom.xml (taking care to specify a good version):

        <dependency>
          <groupId>org.apache.htrace</groupId>
          <artifactId>htrace-zipkin</artifactId>
          <version>3.1.0-incubating</version>
          <scope>runtime</scope>
        </dependency>

2. Add the following to your client configuration:

        trace.span.receivers=org.apache.accumulo.tracer.ZooTraceClient,org.apache.htrace.impl.ZipkinSpanReceiver

3. Instrument your client as in the next section.
Your SpanReceiver may require additional properties, and if so these should likewise
be placed in the ClientConfiguration (if applicable) and Accumulo's `accumulo-site.xml`.
Two such properties for ZipkinSpanReceiver, listed with their default values, are

```xml
<property>
  <name>trace.span.receiver.zipkin.collector-hostname</name>
  <value>localhost</value>
</property>
<property>
  <name>trace.span.receiver.zipkin.collector-port</name>
  <value>9410</value>
</property>
```

### Instrumenting a Client

Tracing can be used to measure a client operation, such as a scan, as
the operation traverses the distributed system. To enable tracing for
your application, call

```java
import org.apache.accumulo.core.trace.DistributedTrace;
...
DistributedTrace.enable(hostname, "myApplication");
// do some tracing
...
DistributedTrace.disable();
```

Once tracing has been enabled, a client can wrap an operation in a trace.

```java
import java.util.Map.Entry;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.htrace.Sampler;
import org.apache.htrace.Trace;
import org.apache.htrace.TraceScope;
...
TraceScope scope = Trace.startSpan("Client Scan", Sampler.ALWAYS);
BatchScanner scanner = conn.createBatchScanner(...);
// Configure your scanner
for (Entry<Key,Value> entry : scanner) {
  // process each entry
}
scope.close();
```

The user can create additional Spans within a Trace.

The sampler (such as `Sampler.ALWAYS`) for the trace should only be specified with a top-level span;
subsequent spans will be collected depending on whether that first span was sampled.
Don't forget to specify a Sampler at the top-level span,
because the default Sampler only samples when part of a pre-existing trace,
which will never occur in a client that never specifies a Sampler.

```java
TraceScope scope = Trace.startSpan("Client Update", Sampler.ALWAYS);
...
TraceScope readScope = Trace.startSpan("Read");
...
readScope.close();
...
TraceScope writeScope = Trace.startSpan("Write");
...
writeScope.close();
scope.close();
```

Like Dapper, Accumulo tracing supports user-defined annotations to associate additional data with a Trace.
Checking whether the client is currently tracing is necessary when using a sampler other than Sampler.ALWAYS.

```java
...
int numberOfEntriesRead = 0;
TraceScope readScope = Trace.startSpan("Read");
// Do the read, update the counter
...
if (Trace.isTracing())
  readScope.getSpan().addKVAnnotation("Number of Entries Read".getBytes(StandardCharsets.UTF_8),
      String.valueOf(numberOfEntriesRead).getBytes(StandardCharsets.UTF_8));
```

It is also possible to add timeline annotations to your spans.
This associates a string with a given timestamp between the start and stop times for a span.

```java
...
writeScope.getSpan().addTimelineAnnotation("Initiating Flush");
```

Some client operations may have a high volume within your
application. As such, you may wish to sample only a percentage of
operations for tracing. As seen below, a CountSampler can be used to
enable tracing for 1 in 1000 operations.

```java
import java.util.Collections;
import org.apache.htrace.HTraceConfiguration;
import org.apache.htrace.impl.CountSampler;
...
Sampler sampler = new CountSampler(HTraceConfiguration.fromMap(
    Collections.singletonMap(CountSampler.SAMPLER_FREQUENCY_CONF_KEY, "1000")));
...
TraceScope readScope = Trace.startSpan("Read", sampler);
...
readScope.close();
```

Remember to close all spans and disable tracing when finished.

```java
DistributedTrace.disable();
```

## Viewing Collected Traces

To view collected traces, use the "Recent Traces" link on the Monitor
UI. You can also programmatically access and print traces using the
`TraceDump` class.
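As a rough sketch of such programmatic access — assuming an existing
`Connector` named `conn`, and using the `TraceFormatter.getRemoteSpan`
decoding described under Trace Table Format below (the column family and the
field accesses here are illustrative assumptions, not a supported public API):

```java
import java.util.Map.Entry;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.accumulo.tracer.TraceFormatter;
import org.apache.accumulo.tracer.thrift.RemoteSpan;
import org.apache.hadoop.io.Text;
...
Scanner scanner = conn.createScanner("trace", Authorizations.EMPTY);
scanner.fetchColumnFamily(new Text("span"));  // span entries only
for (Entry<Key,Value> entry : scanner) {
  // Decode the Thrift-encoded span stored in the entry's value
  RemoteSpan span = TraceFormatter.getRemoteSpan(entry);
  System.out.println(span.description + " took " + (span.stop - span.start) + " ms");
}
```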
### Trace Table Format

This section is for developers looking to use data recorded in the trace table
directly, above and beyond the default services of the Accumulo monitor.
Please note that the trace table format and its supporting classes
are not in the public API and may be subject to change in future versions.

Each span received by a tracer's ZooTraceClient is recorded in the trace table
in the form of three entries: span entries, index entries, and start time entries.
Span and start time entries record full span information,
whereas index entries provide indexing into span information,
useful for quickly finding spans by type or start time.

Each entry is illustrated by a description and a sample of data.
In the description, a token in quotes is a String literal,
whereas other tokens are span variables.
Parentheses group parts together, to distinguish colon characters inside the
column family or qualifier from the colon that separates column family and qualifier.
We use the format `row columnFamily:columnQualifier columnVisibility value`
(omitting the timestamp, which records the time an entry is written to the trace table).

Span entries take the following form:

    traceId "span":(parentSpanId:spanId) [] spanBinaryEncoding
    63b318de80de96d1 span:4b8f66077df89de1:3778c6739afe4e1 [] %18;%09;...

The parentSpanId is "" for the root span of a trace.
The spanBinaryEncoding is a compact Apache Thrift encoding of the original Span object.
This allows clients (and the Accumulo monitor) to recover all the details of the original Span
at a later time, by scanning the trace table and decoding the value of span entries
via `TraceFormatter.getRemoteSpan(entry)`.

The trace table has a formatter class by default (org.apache.accumulo.tracer.TraceFormatter)
that changes how span entries appear in the Accumulo shell.
Normal scans of the trace table do not use this formatter representation;
it exists only to make span entries easier to view inside the Accumulo shell.

Index entries take the following form:

    "idx":service:startTime description:sender [] traceId:elapsedTime
    idx:tserver:14f3828f58b startScan:localhost [] 63b318de80de96d1:1

The service and sender are set by the first call of each Accumulo process
(and instrumented client processes) to `DistributedTrace.enable(...)`
(the sender is autodetected if not specified).
The description is specified in each span.
The start time and the elapsed time (stop - start, 1 millisecond in the example above)
are recorded in milliseconds as long values serialized to a string in hex.

Start time entries take the following form:

    "start":startTime "id":traceId [] spanBinaryEncoding
    start:14f3828a351 id:63b318de80de96d1 [] %18;%09;...

The following classes may be run while Accumulo is running to provide insight into trace statistics. These require
accumulo-tracer-VERSION.jar to be provided on the Accumulo classpath (`lib/ext` is fine).

    $ accumulo org.apache.accumulo.tracer.TraceTableStats -u username -p password -i instancename
    $ accumulo org.apache.accumulo.tracer.TraceDump -u username -p password -i instancename -r

### Tracing from the Shell

You can enable tracing for operations run from the shell by using the
`trace on` and `trace off` commands.
```
root@test test> trace on

root@test test> scan
a b:c [] d

root@test test> trace off
Waiting for trace information
Waiting for trace information
Trace started at 2013/08/26 13:24:08.332
Time  Start  Service@Location       Name
 3628+0      shell@localhost shell:root
    8+1690     shell@localhost scan
    7+1691       shell@localhost scan:location
    6+1692         tserver@localhost startScan
    5+1692           tserver@localhost tablet read ahead 6
```

[config-mgmt]: {{page.docs_baseurl}}/administration/configuration-management