accumulo-commits mailing list archives

From els...@apache.org
Subject [1/7] git commit: ACCUMULO-1217 Add documentation about start-all.sh and start-here.sh to recover from process failure.
Date Tue, 25 Mar 2014 00:37:27 GMT
Repository: accumulo
Updated Branches:
  refs/heads/1.6.0-SNAPSHOT 62ce7524f -> 05254f388
  refs/heads/master a9f1767b9 -> 139a192ee


ACCUMULO-1217 Add documentation about start-all.sh and start-here.sh to recover from process failure.


Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/3e749fb2
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/3e749fb2
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/3e749fb2

Branch: refs/heads/1.6.0-SNAPSHOT
Commit: 3e749fb2cc05a4fdae9753d97ffa99bff5aeb065
Parents: 62ce752
Author: Josh Elser <elserj@apache.org>
Authored: Mon Mar 24 17:26:08 2014 -0700
Committer: Josh Elser <elserj@apache.org>
Committed: Mon Mar 24 17:26:08 2014 -0700

----------------------------------------------------------------------
 .../chapters/troubleshooting.tex                | 41 ++++++++++++++++++++
 1 file changed, 41 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/3e749fb2/docs/src/main/latex/accumulo_user_manual/chapters/troubleshooting.tex
----------------------------------------------------------------------
diff --git a/docs/src/main/latex/accumulo_user_manual/chapters/troubleshooting.tex b/docs/src/main/latex/accumulo_user_manual/chapters/troubleshooting.tex
index 18d472f..3e7572d 100644
--- a/docs/src/main/latex/accumulo_user_manual/chapters/troubleshooting.tex
+++ b/docs/src/main/latex/accumulo_user_manual/chapters/troubleshooting.tex
@@ -518,6 +518,47 @@ Besides these columns, you may see:
 
 \end{enumerate}
 
+\section{Simple System Recovery}
+
+Q. One of my Accumulo processes died. How do I bring it back?
+
+The easiest way to bring all services online for an Accumulo instance is to run the ``start-all.sh'' script.
+
+\small
+\begin{verbatim}
+  $ bin/start-all.sh
+\end{verbatim}
+\normalsize
+
+This script checks the process listing on each host, using ``jps'', before attempting to restart a service on that host.
+Typically, this check is sufficient, except in the case of a hung or zombie process. For large clusters, it may be
+undesirable to ssh to every node in the cluster just to ensure that all hosts are running the appropriate processes; in that case, ``start-here.sh'' may be of use.
+
+\small
+\begin{verbatim}
+  $ ssh host_with_dead_process
+  $ bin/start-here.sh
+\end{verbatim}
+\normalsize
+
+``start-here.sh'' should be invoked on the host that is missing a given process. Like ``start-all.sh'', it will start all
+necessary processes that are not currently running, but only on the current host rather than cluster-wide. Tools such as ``pssh'' or
+``pdsh'' can be used to run ``start-here.sh'' on many hosts at once.
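+
+For example, the following invocations are a sketch of running ``start-here.sh'' on many hosts at once, not a tested recipe. They assume that Accumulo is installed at the same path on every host (``/opt/accumulo'' is a placeholder), that passwordless ssh is configured, and that the host names ``tserver01'' through ``tserver10'' and the host list file ``hosts.txt'' are hypothetical stand-ins for your own hosts. On some distributions, ``pssh'' is installed as ``parallel-ssh''.
+
+\small
+\begin{verbatim}
+  $ pdsh -R ssh -w 'tserver[01-10]' /opt/accumulo/bin/start-here.sh
+  $ pssh -h hosts.txt -i /opt/accumulo/bin/start-here.sh
+\end{verbatim}
+\normalsize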
+
+``start-server.sh'' can also be used to start a process on a given host. However, users are generally advised not to
+invoke it directly, because the ``start-all.sh'' and ``start-here.sh'' scripts provide the same functionality with
+more automation and are less prone to user error.
+
+A. Use ``start-all.sh'' for the whole instance, or ``start-here.sh'' on a single host.
+
+Q. My process died again. Should I restart it via ``cron'' or tools like ``supervisord''?
+
+A. No. A repeatedly dying Accumulo process is a sign of a larger problem. Typically, these problems are due to a
+misconfiguration of Accumulo or to over-saturation of resources. Blindly automating the restart of any Accumulo service
+is generally undesirable because it masks the underlying problem instead of resolving it. Accumulo
+processes should be stable on the order of months and should not require frequent restarts.
+
+
 \section{Advanced System Recovery}
 
 Q. I had a disastrous HDFS failure. After bringing everything back up, several tablets refuse to go online.

