accumulo-commits mailing list archives

From e..@apache.org
Subject accumulo git commit: ACCUMULO-4091 added MutationsRejectedException discussions
Date Mon, 28 Dec 2015 16:41:35 GMT
Repository: accumulo
Updated Branches:
  refs/heads/master af040bfb4 -> 7b1e26ae2


ACCUMULO-4091 added MutationsRejectedException discussions


Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/7b1e26ae
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/7b1e26ae
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/7b1e26ae

Branch: refs/heads/master
Commit: 7b1e26ae29bcb89c02aeb508864c48cb46f427fa
Parents: af040bf
Author: Eric C. Newton <eric.newton@gmail.com>
Authored: Mon Dec 28 11:41:19 2015 -0500
Committer: Eric C. Newton <eric.newton@gmail.com>
Committed: Mon Dec 28 11:41:19 2015 -0500

----------------------------------------------------------------------
 .../main/asciidoc/chapters/troubleshooting.txt  | 55 ++++++++++++++++++++
 1 file changed, 55 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/7b1e26ae/docs/src/main/asciidoc/chapters/troubleshooting.txt
----------------------------------------------------------------------
diff --git a/docs/src/main/asciidoc/chapters/troubleshooting.txt b/docs/src/main/asciidoc/chapters/troubleshooting.txt
index ada0fbf..9546638 100644
--- a/docs/src/main/asciidoc/chapters/troubleshooting.txt
+++ b/docs/src/main/asciidoc/chapters/troubleshooting.txt
@@ -229,6 +229,61 @@ messages to zookeeper.
 
 *A*: Ensure the tablet server JVM is not running low on memory.
 
+*Q*: I'm seeing errors in tablet server logs that include the words "MutationsRejectedException" and "# constraint violations: 1". Moments after that the server died.
+
+The error you are seeing is part of a failing tablet server scenario.
+This is a bit complicated, so name two of your tablet servers A and B.
+
+Tablet server A is hosting a tablet, let's call it a-tablet.
+
+Tablet server B is hosting a metadata tablet, let's call it m-tablet.
+
+m-tablet records the information about a-tablet, for example, the names of the files it is using to store data.
+
+When A ingests some data, it eventually flushes the updates from memory to a file.
+
+Tablet server A then writes this new information to m-tablet, on Tablet server B.
+
+Here's a likely failure scenario:
+
+Tablet server A does not have enough memory for all the processes running on it.
+The operating system sees a large chunk of the tablet server being unused, and swaps it out to disk to make room for other processes.
+Tablet server A does a Java garbage collection, which causes it to start using all the memory allocated to it.
+As the server starts pulling data from swap, it runs very slowly.
+It fails to send the keep-alive messages to zookeeper in a timely fashion, and it loses its zookeeper session.
+
+But it is running so slowly that it takes a moment to realize it should no longer be hosting tablets.
+
+The thread that is flushing a-tablet memory attempts to update m-tablet with the new file information.
+
+Fortunately there's a constraint on m-tablet.
+Mutations to the metadata table must contain a valid zookeeper session.
+This prevents tablet server A from making updates to m-tablet when it no longer has the right to host the tablet.
+
+The "MutationsRejectedException" error is from tablet server A making an update to tablet server B's m-tablet.
+It's getting a constraint violation: tablet server A has lost its zookeeper session, and will fail momentarily.
+
+*A*: Ensure that memory is not over-allocated.  Monitor swap usage, or turn swap off.
+
+*Q*: My accumulo client is getting a MutationsRejectedException. The monitor is displaying "No Such SessionID" errors.
+
+When your client starts sending mutations to accumulo, it creates a session. Once the session is created,
+mutations are streamed to accumulo, without acknowledgement, against this session.  Once the client is done,
+it will close the session, and get an acknowledgement.
+
+If the client fails to communicate with accumulo, accumulo will release the session, assuming that the client has died.
+If the client then attempts to send more mutations against the session, you will see "No Such SessionID" errors on
+the server, and MutationsRejectedExceptions in the client.
+
+The client library should be either actively using the connection to the tablet servers,
+or closing the connection and sessions. If the session times out, something is causing your client
+to pause.
+
+The most frequent source of these pauses is Java garbage collection
+due to the JVM running out of memory, or being swapped out to disk.
+
+*A*: Ensure your client has adequate memory and is not being swapped out to disk.
+
 ### Tools
 
 The accumulo script can be used to run classes from the command line.
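
The first question added by this patch describes a constraint on the metadata table that rejects updates from a tablet server whose zookeeper session is gone. The following is only a toy model of that idea, not Accumulo's implementation: the class `MetadataConstraintModel`, its methods, and the session id are hypothetical names invented for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: every name here is hypothetical, not an Accumulo API.
class MetadataConstraintModel {
    // Session ids that the coordination service currently considers live.
    private final Set<Long> liveSessions = new HashSet<>();

    void sessionStarted(long id) { liveSessions.add(id); }

    void sessionExpired(long id) { liveSessions.remove(id); }

    // Returns the number of constraint violations for an update
    // written under the given session id (0 means the update is accepted).
    int checkUpdate(long sessionId) {
        return liveSessions.contains(sessionId) ? 0 : 1;
    }

    public static void main(String[] args) {
        MetadataConstraintModel mTablet = new MetadataConstraintModel();
        long serverA = 42L; // hypothetical session id for tablet server A

        mTablet.sessionStarted(serverA);
        System.out.println("violations while live: " + mTablet.checkUpdate(serverA));

        // Server A stalls in swap and its session expires before the flush finishes.
        mTablet.sessionExpired(serverA);
        System.out.println("violations after expiry: " + mTablet.checkUpdate(serverA));
    }
}
```

In this model the late update from server A is the one that would surface as "# constraint violations: 1".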

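The second question added by this patch concerns a client whose update session is released while the client is paused. A minimal sketch of that behavior, assuming a server that drops sessions idle longer than a fixed timeout, is shown here. The class `SessionTableModel`, its methods, and the timeout value are hypothetical, and time is passed in explicitly so the example is deterministic.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: every name here is hypothetical, not an Accumulo API.
class SessionTableModel {
    private final long idleTimeoutMillis;
    // Maps session id to the time (in ms) the session was last used.
    private final Map<Long, Long> lastUsed = new HashMap<>();
    private long nextId = 1;

    SessionTableModel(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    // Client opens a session; returns its id.
    long startUpdate(long now) {
        long id = nextId++;
        lastUsed.put(id, now);
        return id;
    }

    // Release every session that has been idle longer than the timeout.
    void sweep(long now) {
        lastUsed.values().removeIf(t -> now - t > idleTimeoutMillis);
    }

    // Client streams a mutation against an existing session.
    void applyUpdate(long id, long now) {
        sweep(now);
        if (!lastUsed.containsKey(id)) {
            throw new IllegalStateException("No Such SessionID " + id);
        }
        lastUsed.put(id, now);
    }

    public static void main(String[] args) {
        SessionTableModel server = new SessionTableModel(1000L);
        long id = server.startUpdate(0L);
        server.applyUpdate(id, 500L); // client is active: accepted
        try {
            server.applyUpdate(id, 5000L); // client paused too long: session released
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```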

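Both answers in this patch point at garbage collection and memory pressure. One way a client might check how much time its own JVM spends in garbage collection is the standard `java.lang.management` API, sketched below. The class name `GcTimeReport` is hypothetical, and the reported figure is accumulated collection time, which for concurrent collectors is not the same as pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper; it only uses the standard JMX garbage collector beans.
class GcTimeReport {
    // Total milliseconds spent in garbage collection so far, summed across collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalGcMillis();
        // Client work, for example a batch of writes, would go here.
        long after = totalGcMillis();
        System.out.println("GC time during batch (ms): " + (after - before));
    }
}
```

A large difference around a batch that later fails suggests looking at heap sizing and swap usage on the client host.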