From: afuchs@apache.org
To: commits@accumulo.apache.org
Reply-To: dev@accumulo.apache.org
Message-Id: <415b6e4cb68a4aa0af390fc62c651289@git.apache.org>
X-Mailer: ASF-Git Admin Mailer
Subject: accumulo git commit: ACCUMULO-3712 Added a user manual section discussing how to achieve stability in a VM environment
Date: Tue, 7 Apr 2015 15:49:50 +0000 (UTC)

Repository: accumulo
Updated Branches:
  refs/heads/master b2aa0f86e -> 0a5a5682d


ACCUMULO-3712 Added a user manual section discussing how to achieve stability in a VM environment

Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/0a5a5682
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/0a5a5682
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/0a5a5682

Branch: refs/heads/master
Commit: 0a5a5682d524887c2dc1be0cceffe8371db29c8a
Parents: b2aa0f8
Author: Adam Fuchs
Authored: Tue Apr 7 11:49:07 2015 -0400
Committer: Adam Fuchs
Committed: Tue Apr 7 11:49:07 2015 -0400

----------------------------------------------------------------------
 .../main/asciidoc/chapters/administration.txt | 179 +++++++++++++++++++
 1 file changed, 179 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/0a5a5682/docs/src/main/asciidoc/chapters/administration.txt
----------------------------------------------------------------------
diff --git a/docs/src/main/asciidoc/chapters/administration.txt b/docs/src/main/asciidoc/chapters/administration.txt
index 275ed0e..d7a565b 100644
--- a/docs/src/main/asciidoc/chapters/administration.txt
+++ b/docs/src/main/asciidoc/chapters/administration.txt
@@ -646,6 +646,32 @@ Time Start Service@Location Name
 
 Accumulo processes each write to a set of log files. By default these are found under
 +$ACCUMULO/logs/+.
 
+[[watcher]]
+=== Watcher
+Accumulo includes scripts to automatically restart server processes in the case
+of intermittent failures. To enable this watcher, edit +conf/accumulo-env.sh+
+to include the following:
+
+....
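+# (The *_TIMESPAN values below appear to be in seconds, and each *_RETRIES
+# value the number of automatic restarts allowed within that window before
+# the watcher stops restarting the process; the specific numbers shown are
+# illustrative rather than required values.)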
+# Should process be automatically restarted
+export ACCUMULO_WATCHER="true"
+
+# What settings should we use for the watcher, if enabled
+export UNEXPECTED_TIMESPAN="3600"
+export UNEXPECTED_RETRIES="2"
+
+export OOM_TIMESPAN="3600"
+export OOM_RETRIES="5"
+
+export ZKLOCK_TIMESPAN="600"
+export ZKLOCK_RETRIES="5"
+....
+
+When an Accumulo process dies, the watcher will look at the logs and exit codes
+to determine how the process failed, and will either restart it or give up
+depending on the recent history of failures. The restart policy for the various
+failure conditions is configurable through the +*_TIMESPAN+ and +*_RETRIES+
+variables shown above.
+
 === Recovery
 
 In the event of TabletServer failure or errors when shutting Accumulo down, some

@@ -726,3 +752,156 @@ Some erroneous GarbageCollector messages may still be seen for a small period wh
 the new volumes. This is expected and can usually be ignored.
 
+
+=== Achieving Stability in a VM Environment
+
+For testing, demonstration, and even operational uses, Accumulo is often
+installed and run in a virtual machine (VM) environment. The majority of
+long-term operational uses of Accumulo are on bare-metal clusters. However, the
+core design of Accumulo and its dependencies does not preclude running stably
+for long periods within a VM. Many of the operational robustness features that
+let Accumulo handle failures such as periodic network partitioning in a large
+cluster carry over well to VM environments. This section covers general
+recommendations for maximizing stability in a VM environment, including
+failure modes that are more common in VMs than on bare metal.
+
+==== Known Failure Modes: Setup and Troubleshooting
+In addition to the general failure modes of running Accumulo, VMs can introduce
+a couple of environmental challenges that can affect process stability. Clock
+drift is more common in VMs, especially when VMs are suspended and resumed, and
+can cause Accumulo servers to assume that they have lost connectivity to the
+other Accumulo processes and/or lose their locks in Zookeeper. VM environments
+also frequently have constrained resources, such as CPU, RAM, network, and disk
+throughput and capacity. Accumulo generally deals well with constrained
+resources from a stability perspective (optimizing performance will require
+additional tuning, which is not covered in this section); however, there are
+some limits.
+
+===== Physical Memory
+One of those limits has to do with the Linux out-of-memory killer. A common
+failure mode in VM environments (and in some bare-metal installations) is when
+the Linux out-of-memory killer decides to kill processes in order to avoid a
+kernel panic when provisioning a memory page. This often happens in VMs due to
+the large number of processes that must run in a small memory footprint. In
+addition to the Linux core processes, a single-node Accumulo setup requires a
+Hadoop Namenode, a Hadoop Secondary Namenode, a Hadoop Datanode, a Zookeeper
+server, an Accumulo Master, an Accumulo GC, and an Accumulo TabletServer.
+Typical setups also include an Accumulo Monitor, an Accumulo Tracer, a Hadoop
+ResourceManager, a Hadoop NodeManager, provisioning software, and client
+applications. Between all of these processes, it is not uncommon to
+over-subscribe the available RAM in a VM.
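+
+A rough way to gauge whether a VM is over-subscribed is to compare physical
+memory against the resident set sizes of the largest processes. On a typical
+Linux system (exact flags may vary by distribution), something like the
+following will show both:
+
+....
+# total, used, and free physical memory, in megabytes
+free -m
+
+# the fifteen largest processes by resident memory
+ps -eo rss,comm --sort=-rss | head -n 15
+....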
+We recommend setting up VMs without swap enabled, so that rather than
+performance grinding to a halt when physical memory is exhausted, the kernel
+will select seemingly random processes to kill in order to free up memory.
+
+Calculating the maximum possible memory usage is essential in creating a stable
+Accumulo VM setup. Safely engineering memory allocation for stability is then a
+matter of bringing the calculated maximum memory usage under the physical
+memory by a healthy margin. The margin is there to account for operating
+system-level operations, such as managing processes, maintaining virtual memory
+pages, and file system caching. When the Linux out-of-memory killer finds one
+of your processes, you will probably only see evidence of that in
+/var/log/messages. Out-of-memory process kills do not show up in Accumulo or
+Hadoop logs.
+
+To calculate the maximum memory usage of all java virtual machine (JVM)
+processes, add the maximum heap size (often limited by a -Xmx... argument, such
+as in accumulo-site.xml) and the off-heap memory usage. Off-heap memory usage
+includes the following:
+
+* "Permanent Space", where the JVM stores Classes, Methods, and other code
+  elements. This can be limited by a JVM flag such as +-XX:MaxPermSize=100m+,
+  and is typically tens of megabytes.
+* Code generation space, where the JVM stores just-in-time compiled code. This
+  is typically small enough to ignore.
+* Socket buffers, where the JVM stores send and receive buffers for each
+  socket.
+* Thread stacks, where the JVM allocates memory to manage each thread.
+* Direct memory space and JNI code, where applications can allocate memory
+  outside of the JVM-managed space. For Accumulo, this includes the native
+  in-memory maps sized by the tserver.memory.maps.max property in
+  accumulo-site.xml.
+* Garbage collection space, where the JVM stores information used for garbage
+  collection.
+
+You can assume that each Hadoop and Accumulo process will use ~100-150MB of
+off-heap memory, plus the in-memory map of the Accumulo TServer process. A
+simple calculation of the physical memory requirements follows:
+
+....
+Physical memory needed
+  = (per-process off-heap memory) + (heap memory) + (other processes) + (margin)
+  = (number of java processes * 150M + native map)
+      + (sum of -Xmx settings for each java process)
+      + (total application memory, provisioning memory, etc.)
+      + (1G)
+  = (11 * 150M + 500M)
+      + (1G + 1G + 1G + 256M + 1G + 256M + 512M + 512M + 512M + 512M + 512M)
+      + (2G) + (1G)
+  = (2150M) + (7G) + (2G) + (1G)
+  = ~12GB
+....
+
+These calculations can add up quickly given the large number of processes,
+especially in constrained VM environments. To reduce the physical memory
+requirements, it is a good idea to reduce maximum heap limits and turn off
+unnecessary processes. If you're not using YARN in your application, you can
+turn off the ResourceManager and NodeManager. If you're not expecting to
+re-provision the cluster frequently, you can turn off or reduce provisioning
+processes such as SaltStack minions and masters.
+
+===== Disk Space
+Disk space is primarily used for two purposes: storing data and storing logs.
+While Accumulo generally stores all of its key/value data in HDFS, Accumulo,
+Hadoop, and Zookeeper all store a significant amount of logs in a directory on
+a local file system. Care should be taken to make sure that (a) limits on the
+amount of logs generated are in place, and (b) enough space is available to
+host the generated logs on the partitions to which they are assigned.
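+On a typical installation, the current log volume and the headroom remaining
+on the partition holding the logs can be checked with standard tools (the
+Hadoop and Zookeeper log directories below are placeholders; substitute the
+paths your installation actually uses):
+
+....
+# total size of the logs each subsystem has written so far
+du -sh $ACCUMULO/logs /path/to/hadoop/logs /path/to/zookeeper/logs
+
+# free space remaining on the partition holding the Accumulo logs
+df -h $ACCUMULO/logs
+....
+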
+When space is not available for logging, processes will hang. This can cause
+interruptions in the availability of Accumulo, as well as cascade into
+failures of various processes.
+
+Hadoop, Accumulo, and Zookeeper use log4j as a logging mechanism, and each of
+them has a way of limiting the logs and directing them to a particular
+directory. Logs are generated independently for each process, so when
+considering the total space you need to add up the maximum logs generated by
+each process. Typically, a rolling log setup is instituted in which each
+process can generate something like ten 100MB files, resulting in a maximum
+file system usage of 1GB per process. Default setups for Hadoop and Zookeeper
+are often unbounded, so it is important to set these limits in the logging
+configuration files for each subsystem. Consult the user manual for each
+system for instructions on how to limit generated logs.
+
+===== Zookeeper Interaction
+Accumulo is designed to scale up to thousands of nodes. At that scale,
+intermittent interruptions in network service and other rare failures of
+compute nodes become more common. To limit the impact of node failures on
+overall service availability, Accumulo uses a heartbeat monitoring system that
+leverages Zookeeper's ephemeral locks. There are several conditions that can
+cause Accumulo processes to lose their Zookeeper locks, some of which are true
+interruptions to availability and some of which are false positives. Several
+of these conditions become more common in VM environments, where they can be
+exacerbated by resource constraints and clock drift.
+
+Accumulo includes a mechanism, known as the <<watcher>>, to limit the impact
+of these false positives. The watcher monitors Accumulo processes and will
+restart them when they fail for certain reasons. The watcher can be configured
+within the accumulo-env.sh file inside of Accumulo's configuration directory.
+We recommend using the watcher to monitor Accumulo processes, as it will
+restore the system to full capacity without administrator interaction after
+many of the common failure modes.
+
+==== Tested Versions
+Another large consideration for Accumulo stability is to use versions of
+software that have been tested together in a VM environment. A combination of
+component versions that has not been tested together is likely to expose
+operating conditions that differ from the environments in which the individual
+components were tested. For example, Accumulo's use of HDFS includes many
+short block reads, which differs from the more common full-file read used in
+most map/reduce applications. We have found that certain combinations of
+Accumulo and Hadoop versions include bugs that greatly affect overall
+stability. In our testing, Accumulo 1.6.2, Hadoop 2.6.0, and Zookeeper 3.4.6
+resulted in a stable VM cluster that did not fail during a month of testing,
+while Accumulo 1.6.1, Hadoop 2.5.1, and Zookeeper 3.4.5 had a mean time
+between failures of less than a week under heavy ingest and query load. We
+expect that results will vary with other configurations, and you should choose
+your software versions with that in mind.