hbase-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From st...@apache.org
Subject svn commit: r1235023 - /hbase/trunk/src/docbkx/ops_mgt.xml
Date Mon, 23 Jan 2012 22:06:03 GMT
Author: stack
Date: Mon Jan 23 22:06:03 2012
New Revision: 1235023

URL: http://svn.apache.org/viewvc?rev=1235023&view=rev
Log:
Add Riley's slow query doc from hbase-4117

Modified:
    hbase/trunk/src/docbkx/ops_mgt.xml

Modified: hbase/trunk/src/docbkx/ops_mgt.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/ops_mgt.xml?rev=1235023&r1=1235022&r2=1235023&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/ops_mgt.xml (original)
+++ hbase/trunk/src/docbkx/ops_mgt.xml Mon Jan 23 22:06:03 2012
@@ -367,6 +367,51 @@ false
     <title >HBase Monitoring</title>
     <para>TODO
     </para>
+    <section xml:id="ops.slow.query">
+    <title>Slow Query Log</title>
+<para>The HBase slow query log consists of parseable JSON structures describing the
properties of those client operations (Gets, Puts, Deletes, etc.) that either took too long
to run, or produced too much output. The thresholds for "too long to run" and "too much output"
are configurable, as described below. The output is produced inline in the main region server
logs so that it is easy to discover further details from context with other logged events.
It is also prepended with identifying tags <constant>(responseTooSlow)</constant>,
<constant>(responseTooLarge)</constant>, <constant>(operationTooSlow)</constant>,
and <constant>(operationTooLarge)</constant> in order to enable easy filtering
with grep, in case the user desires to see only slow queries.
+</para>
+
+<section><title>Configuration</title>
+<para>There are two configuration knobs that can be used to adjust the thresholds for
when queries are logged.
+</para>
+
+<itemizedlist>
+<listitem>
+<varname>hbase.ipc.warn.response.time</varname> Maximum number of milliseconds
that a query can be run without being logged. Defaults to 10000, or 10 seconds. Can be set
to -1 to disable logging by time.
+</listitem>
+<listitem><varname>hbase.ipc.warn.response.size</varname> Maximum byte
size of response that a query can return without being logged. Defaults to 100 megabytes.
Can be set to -1 to disable logging by size.
+</listitem>
+</itemizedlist>
+</section>
+
+<section><title>Metrics</title>
+<para>The slow query log exposes to metrics to JMX.
+<itemizedlist><listitem><varname>hadoop.regionserver_rpc_slowResponse</varname>
a global metric reflecting the durations of all responses that triggered logging.</listitem>
+<listitem><varname>hadoop.regionserver_rpc_methodName.aboveOneSec</varname>
A metric reflecting the durations of all responses that lasted for more than one second.</listitem>
+</itemizedlist>
+</para>
+</section>
+
+<section><title>Output</title>
+<para>The output is tagged with operation e.g. <constant>(operationTooSlow)</constant>
if the call was a client operation, such as a Put, Get, or Delete, which we expose detailed
fingerprint information for. If not, it is tagged <constant>(responseTooSlow)</constant>
and still produces parseable JSON output, but with less verbose information solely regarding
its duration and size in the RPC itself. <constant>TooLarge</constant> is substituted
for <constant>TooSlow</constant> if the response size triggered the logging, with
<constant>TooLarge</constant> appearing even in the case that both size and duration
triggered logging.
+</para>
+</section>
+<section><title>Example</title>
+<para>
+<programlisting>2011-09-08 10:01:25,824 WARN org.apache.hadoop.ipc.HBaseServer: (operationTooSlow):
{"tables":{"riley2":{"puts":[{"totalColumns":11,"families":{"actions":[{"timestamp":1315501284459,"qualifier":"0","vlen":9667580},{"timestamp":1315501284459,"qualifier":"1","vlen":10122412},{"timestamp":1315501284459,"qualifier":"2","vlen":11104617},{"timestamp":1315501284459,"qualifier":"3","vlen":13430635}]},"row":"cfcd208495d565ef66e7dff9f98764da:0"}],"families":["actions"]}},"processingtimems":956,"client":"10.47.34.63:33623","starttimems":1315501284456,"queuetimems":0,"totalPuts":1,"class":"HRegionServer","responsesize":0,"method":"multiPut"}</programlisting>
+</para>
+
+<para>Note that everything inside the "tables" structure is output produced by MultiPut's
fingerprint, while the rest of the information is RPC-specific, such as processing time and
client IP/port. Other client operations follow the same pattern and the same general structure,
with necessary differences due to the nature of the individual operations. In the case that
the call is not a client operation, that detailed fingerprint information will be completely
absent.
+</para>
+
+<para>This particular example, for example, would indicate that the likely cause of
slowness is simply a very large (on the order of 100MB) multiput, as we can tell by the "vlen,"
or value length, fields of each put in the multiPut.
+</para>
+</section>
+</section>
+
+
+
   </section>
   
   <section xml:id="cluster_replication">



Mime
View raw message