From st...@apache.org
Subject git commit: HBASE-11607 Document HBase metrics (Misty Stanley-Jones)
Date Tue, 19 Aug 2014 20:51:29 GMT
Repository: hbase
Updated Branches:
  refs/heads/master 3b864842c -> 8a52d58a7

HBASE-11607 Document HBase metrics (Misty Stanley-Jones)

Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/8a52d58a
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/8a52d58a
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/8a52d58a

Branch: refs/heads/master
Commit: 8a52d58a7ba5f87a771b1b6f803381a1c0b9909b
Parents: 3b86484
Author: stack <stack@apache.org>
Authored: Tue Aug 19 13:51:17 2014 -0700
Committer: stack <stack@apache.org>
Committed: Tue Aug 19 13:51:17 2014 -0700

 src/main/docbkx/ops_mgt.xml | 300 +++++++++++++++------------------------
 1 file changed, 118 insertions(+), 182 deletions(-)

diff --git a/src/main/docbkx/ops_mgt.xml b/src/main/docbkx/ops_mgt.xml
index 29f244f..ac45ecf 100644
--- a/src/main/docbkx/ops_mgt.xml
+++ b/src/main/docbkx/ops_mgt.xml
@@ -951,196 +951,132 @@ $ for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh --restart --
   <!--  node mgt -->
-  <section
-    xml:id="hbase_metrics">
+  <section xml:id="hbase_metrics">
     <title>HBase Metrics</title>
-    <section
-      xml:id="metric_setup">
+    <para>HBase emits metrics which adhere to the <link
+        xlink:href="http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/metrics/package-summary.html"
+        >Hadoop metrics</link> API. Starting with HBase 0.95, HBase is configured to emit a default
+      set of metrics with a default sampling period of every 10 seconds. You can use HBase
+      metrics in conjunction with Ganglia. You can also filter which metrics are emitted and extend
+      the metrics framework to capture custom metrics appropriate for your environment.</para>
+    <section xml:id="metric_setup">
       <title>Metric Setup</title>
-      <para>See <link
-          xlink:href="http://hbase.apache.org/metrics.html">Metrics</link> for an introduction and
-        how to enable Metrics emission. Still valid for HBase 0.94.x. </para>
-      <para>For HBase 0.95.x and up, see <link
-          xlink:href="http://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html"
+      <para>For HBase 0.95 and newer, HBase ships with a default metrics configuration, or
+          <firstterm>sink</firstterm>. This includes a wide variety of individual metrics, and emits
+        them every 10 seconds by default. To configure metrics for a given region server, edit the
+          <filename>conf/hadoop-metrics2-hbase.properties</filename> file. Restart the region server
+        for the changes to take effect.</para>
+      <para>To change the sampling rate for the default sink, edit the line beginning with
+          <literal>*.period</literal>. To filter which metrics are emitted or to extend the metrics
+        framework, see <link
+          xlink:href="http://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html"
+        />
       </para>
+      <note xml:id="rs_metrics_ganglia">
+        <title>HBase Metrics and Ganglia</title>
+        <para>By default, HBase emits a large number of metrics per region server. Ganglia may have
+          difficulty processing all these metrics. Consider increasing the capacity of the Ganglia
+          server or reducing the number of metrics emitted by HBase. See <link
+            xlink:href="http://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html#filtering"
+            >Metrics Filtering</link>.</para>
+      </note>
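The sampling-rate change described above can be scripted. The following is a minimal sketch, assuming the file uses ordinary `key=value` properties syntax and that the `*.period` value is a number of seconds; the function name `set_metrics_period` is hypothetical and not part of HBase or Hadoop.

```python
import re


def set_metrics_period(properties_text: str, seconds: int) -> str:
    """Return the properties text with every '*.period' line set to `seconds`.

    Hypothetical helper: it only rewrites text. Writing the result back to
    conf/hadoop-metrics2-hbase.properties and restarting the region server
    remain manual steps.
    """
    if seconds <= 0:
        raise ValueError("period must be a positive number of seconds")
    # Match an uncommented line that begins with '*.period'.
    pattern = re.compile(r"^\s*\*\.period\s*=")
    lines = properties_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = f"*.period={seconds}"
            replaced = True
    if not replaced:
        # Assumption: appending the key is acceptable when it is absent.
        lines.append(f"*.period={seconds}")
    return "\n".join(lines) + "\n"
```

Commented-out `*.period` lines are left untouched, so a file with all metrics disabled gains one active line rather than being silently re-enabled elsewhere.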
-    <section
-      xml:id="rs_metrics_ganglia">
-      <title>Warning To Ganglia Users</title>
-      <para>Warning to Ganglia Users: by default, HBase will emit a LOT of metrics per RegionServer
-        which may swamp your installation. Options include either increasing Ganglia server
-        capacity, or configuring HBase to emit fewer metrics. </para>
+    <section>
+      <title>Disabling Metrics</title>
+      <para>To disable metrics for a region server, edit the
+          <filename>conf/hadoop-metrics2-hbase.properties</filename> file and comment out any
+        uncommented lines. Restart the region server for the changes to take effect.</para>
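Commenting out every active line can also be done programmatically. This is a sketch under the assumption that the file follows Java properties conventions, where lines starting with `#` or `!` are already comments; `disable_metrics` is a hypothetical name, not an HBase utility.

```python
def disable_metrics(properties_text: str) -> str:
    """Return the properties text with every active line commented out.

    Blank lines and existing comments ('#' or '!') are kept unchanged.
    Hypothetical helper: the caller still has to write the file back and
    restart the region server.
    """
    out = []
    for line in properties_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(("#", "!")):
            out.append("# " + line)
        else:
            out.append(line)
    return "\n".join(out) + ("\n" if out else "")
```

Prefixing rather than deleting lines keeps the original settings in place, so metrics can be restored by removing the added `# ` markers.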
-    <section
-      xml:id="rs_metrics">
-      <title>Most Important RegionServer Metrics</title>
-      <section
-        xml:id="hbase.regionserver.blockCacheHitCachingRatio">
-        <title><varname>blockCacheExpressCachingRatio (formerly
-          blockCacheHitCachingRatio)</varname></title>
-        <para>Block cache hit caching ratio (0 to 100). The cache-hit ratio for reads configured to
-          look in the cache (i.e., cacheBlocks=true). </para>
-      </section>
-      <section
-        xml:id="hbase.regionserver.callQueueLength">
-        <title><varname>callQueueLength</varname></title>
-        <para>Point in time length of the RegionServer call queue. If requests arrive faster than
-          the RegionServer handlers can process them they will back up in the callQueue.</para>
-      </section>
-      <section
-        xml:id="hbase.regionserver.compactionQueueSize">
-        <title><varname>compactionQueueLength (formerly compactionQueueSize)</varname></title>
-        <para>Point in time length of the compaction queue. This is the number of Stores in the
-          RegionServer that have been targeted for compaction.</para>
-      </section>
-      <section
-        xml:id="hbase.regionserver.flushQueueSize">
-        <title><varname>flushQueueSize</varname></title>
-        <para>Point in time number of enqueued regions in the MemStore awaiting flush.</para>
-      </section>
-      <section
-        xml:id="hbase.regionserver.hdfsBlocksLocalityIndex">
-        <title><varname>hdfsBlocksLocalityIndex</varname></title>
-        <para>Point in time percentage of HDFS blocks that are local to this RegionServer. The
-          higher the better. </para>
-      </section>
-      <section
-        xml:id="hbase.regionserver.memstoreSizeMB">
-        <title><varname>memstoreSizeMB</varname></title>
-        <para>Point in time sum of all the memstore sizes in this RegionServer (MB). Watch for this
-          nearing or exceeding the configured high-watermark for MemStore memory in the
-          RegionServer. </para>
-      </section>
-      <section
-        xml:id="hbase.regionserver.regions">
-        <title><varname>numberOfOnlineRegions</varname></title>
-        <para>Point in time number of regions served by the RegionServer. This is an important
-          metric to track for RegionServer-Region density. </para>
-      </section>
-      <section
-        xml:id="hbase.regionserver.readRequestsCount">
-        <title><varname>readRequestsCount</varname></title>
-        <para>Number of read requests for this RegionServer since startup. Note: this is a 32-bit
-          integer and can roll. </para>
-      </section>
-      <section
-        xml:id="hbase.regionserver.slowHLogAppendCount">
-        <title><varname>slowHLogAppendCount</varname></title>
-        <para>Number of slow HLog append writes for this RegionServer since startup, where "slow" is
-          > 1 second. This is a good "canary" metric for HDFS. </para>
-      </section>
-      <section
-        xml:id="hbase.regionserver.usedHeapMB">
-        <title><varname>usedHeapMB</varname></title>
-        <para>Point in time amount of memory used by the RegionServer (MB).</para>
-      </section>
-      <section
-        xml:id="hbase.regionserver.writeRequestsCount">
-        <title><varname>writeRequestsCount</varname></title>
-        <para>Number of write requests for this RegionServer since startup. Note: this is a 32-bit
-          integer and can roll. </para>
-      </section>
+    <section>
+      <title>Discovering Available Metrics</title>
+      <para>Rather than listing each metric which HBase emits by default, you can browse through the
+        available metrics, either as a JSON output or via JMX. At this time, the JSON output does
+        not include the description field which is included in the JMX view. Different metrics are
+        exposed for the Master process and each region server process.</para>
+      <procedure>
+        <title>Access a JSON Output of Available Metrics</title>
+        <step>
+          <para>After starting HBase, access the region server's web UI, at
+              <literal>http://localhost:60030</literal> by default.</para>
+        </step>
+        <step>
+          <para>Click the <guilabel>Metrics Dump</guilabel> link near the top. The metrics for the region server are
+            presented as a dump of the JMX bean in JSON format.</para>
+        </step>
+        <step>
+          <para>To view metrics for the Master, connect to the Master's web UI instead (defaults to
+              <literal>http://localhost:60010</literal>) and click its <guilabel>Metrics
+              Dump</guilabel> link.</para>
+        </step>
+      </procedure>
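The JSON dump can also be read from a script instead of a browser. The sketch below makes several assumptions that are not stated in the text above: that the Metrics Dump link resolves to the `/jmx` path of the web UI, that the document has a top-level `beans` array, and that each bean carries a `name` field. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_metrics_dump(base_url: str, timeout: float = 10.0) -> str:
    """Download the raw metrics dump from a web UI such as http://localhost:60030.

    ASSUMPTION: the 'Metrics Dump' link points at <base_url>/jmx.
    """
    with urlopen(base_url.rstrip("/") + "/jmx", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def beans_by_name(dump_text: str) -> dict:
    """Index the beans of a metrics dump by their 'name' field.

    ASSUMPTION: the dump looks like {"beans": [{"name": ..., ...}, ...]}.
    Beans without a name are skipped.
    """
    doc = json.loads(dump_text)
    return {bean["name"]: bean for bean in doc.get("beans", []) if "name" in bean}
```

For example, `beans_by_name(fetch_metrics_dump("http://localhost:60030"))` would return one dictionary per bean for a region server on the default port; pointing it at port 60010 would do the same for the Master. As noted above, the description field shown in the JMX view is not expected in this output.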
+      <procedure>
+        <title>Browse the JMX Output of Available Metrics</title>
+        <para>You can use many different tools to view JMX content by browsing MBeans. This
+          procedure uses <command>jvisualvm</command>, which is an application usually available in the JDK.
+            </para>
+        <step>
+          <para>Start HBase, if it is not already running.</para>
+        </step>
+        <step>
+          <para>Run the <command>jvisualvm</command> command on a host with a GUI display.
+            You can launch it from the command line or another method appropriate for your
+            system.</para>
+        </step>
+        <step>
+          <para>Be sure the <guilabel>VisualVM-MBeans</guilabel> plugin is installed. Browse to <menuchoice>
+              <guimenu>Tools</guimenu>
+              <guimenuitem>Plugins</guimenuitem>
+            </menuchoice>. Click <guilabel>Installed</guilabel> and check whether the plugin is
+            listed. If not, click <guilabel>Available Plugins</guilabel>, select it, and click
+              <guibutton>Install</guibutton>. When finished, click
+            <guibutton>Close</guibutton>.</para>
+        </step>
+        <step>
+          <para>To view details for a given HBase process, double-click the process in the
+              <guilabel>Local</guilabel> sub-tree in the left-hand panel. A detailed view opens in
+            the right-hand panel. Click the <guilabel>MBeans</guilabel> tab at the top of the
+            right-hand panel.</para>
+        </step>
+        <step>
+          <para>To access the HBase metrics, navigate to the appropriate sub-bean:</para>
+          <itemizedlist>
+            <listitem>
+              <para>Master: <menuchoice>
+                  <guimenu>Hadoop</guimenu>
+                  <guisubmenu>HBase</guisubmenu>
+                  <guisubmenu>Master</guisubmenu>
+                  <guisubmenu>Server</guisubmenu>
+                </menuchoice></para>
+            </listitem>
+            <listitem>
+              <para>RegionServer: <menuchoice>
+                  <guimenu>Hadoop</guimenu>
+                  <guisubmenu>HBase</guisubmenu>
+                  <guisubmenu>RegionServer</guisubmenu>
+                  <guisubmenu>Server</guisubmenu>
+                </menuchoice></para>
+            </listitem>
+          </itemizedlist>
+        </step>
+        <step>
+          <para>The name of each metric and its current value are displayed in the
+              <guilabel>Attributes</guilabel> tab. For a view which includes more details, including
+            the description of each attribute, click the <guilabel>Metadata</guilabel> tab.</para>
+        </step>
+      </procedure>
-    <section
-      xml:id="rs_metrics_other">
-      <title>Other RegionServer Metrics</title>
-      <section
-        xml:id="hbase.regionserver.blockCacheCount">
-        <title><varname>blockCacheCount</varname></title>
-        <para>Point in time block cache item count in memory. This is the number of blocks of
-          StoreFiles (HFiles) in the cache.</para>
-      </section>
-      <section
-        xml:id="hbase.regionserver.blockCacheEvictedCount">
-        <title><varname>blockCacheEvictedCount</varname></title>
-        <para>Number of blocks that had to be evicted from the block cache due to heap
-          constraints by RegionServer since startup.</para>
-      </section>
-      <section
-        xml:id="hbase.regionserver.blockCacheFree">
-        <title><varname>blockCacheFreeMB</varname></title>
-        <para>Point in time block cache memory available (MB).</para>
-      </section>
-      <section
-        xml:id="hbase.regionserver.blockCacheHitCount">
-        <title><varname>blockCacheHitCount</varname></title>
-        <para>Number of blocks of StoreFiles (HFiles) read from the cache by RegionServer since
-          startup.</para>
-      </section>
-      <section
-        xml:id="hbase.regionserver.blockCacheHitRatio">
-        <title><varname>blockCacheHitRatio</varname></title>
-        <para>Block cache hit ratio (0 to 100) from RegionServer startup. Includes all read
-          requests, although those with cacheBlocks=false will always read from disk and be counted
-          as a "cache miss", which means that full-scan MapReduce jobs can affect this metric
-          significantly.</para>
-      </section>
-      <section
-        xml:id="hbase.regionserver.blockCacheMissCount">
-        <title><varname>blockCacheMissCount</varname></title>
-        <para>Number of blocks of StoreFiles (HFiles) requested but not read from the cache from
-          RegionServer startup.</para>
-      </section>
-      <section
-        xml:id="hbase.regionserver.blockCacheSize">
-        <title><varname>blockCacheSizeMB</varname></title>
-        <para>Point in time block cache size in memory (MB). i.e., memory in use by
-          BlockCache</para>
-      </section>
-      <section
-        xml:id="hbase.regionserver.fsPreadLatency">
-        <title><varname>fsPreadLatency*</varname></title>
-        <para>There are several filesystem positional read latency (ms) metrics, all measured from
-          RegionServer startup.</para>
-      </section>
-      <section
-        xml:id="hbase.regionserver.fsReadLatency">
-        <title><varname>fsReadLatency*</varname></title>
-        <para>There are several filesystem read latency (ms) metrics, all measured from RegionServer
-          startup. The issue with interpretation is that ALL reads go into this metric (e.g.,
-          single-record Gets, full table Scans), including reads required for compactions. This
-          metric is only interesting "over time" when comparing major releases of HBase or your own
-          code.</para>
-      </section>
-      <section
-        xml:id="hbase.regionserver.fsWriteLatency">
-        <title><varname>fsWriteLatency*</varname></title>
-        <para>There are several filesystem write latency (ms) metrics, all measured from
-          RegionServer startup. The issue with interpretation is that ALL writes go into this metric
-          (e.g., single-record Puts, full table re-writes due to compaction). This metric is only
-          interesting "over time" when comparing major releases of HBase or your own code.</para>
-      </section>
-      <section
-        xml:id="hbase.regionserver.stores">
-        <title><varname>NumberOfStores</varname></title>
-        <para>Point in time number of Stores open on the RegionServer. A Store corresponds to a
-          ColumnFamily. For example, if a table (which contains the column family) has 3 regions on
-          a RegionServer, there will be 3 stores open for that column family. </para>
-      </section>
-      <section
-        xml:id="hbase.regionserver.storeFiles">
-        <title><varname>NumberOfStorefiles</varname></title>
-        <para>Point in time number of StoreFiles open on the RegionServer. A store may have more
-          than one StoreFile (HFile).</para>
-      </section>
-      <section
-        xml:id="hbase.regionserver.requests">
-        <title><varname>requestsPerSecond</varname></title>
-        <para>Point in time number of read and write requests. Requests correspond to RegionServer
-          RPC calls, thus a single Get will result in 1 request, but a Scan with caching set to 1000
-          will result in 1 request for each 'next' call (i.e., not each row). A bulk-load request
-          will constitute 1 request per HFile. This metric is less interesting than
-          readRequestsCount and writeRequestsCount in terms of measuring activity due to this metric
-          being periodic. </para>
-      </section>
-      <section
-        xml:id="hbase.regionserver.storeFileIndexSizeMB">
-        <title><varname>storeFileIndexSizeMB</varname></title>
-        <para>Point in time sum of all the StoreFile index sizes in this RegionServer (MB).</para>
-      </section>
+    <section xml:id="rs_metrics">
+      <title>Most Important RegionServer Metrics</title>
+      <para>Previously, this section contained a list of the most important RegionServer metrics.
+        However, the list was extremely out of date. In some cases, the name of a given metric has
+        changed. In other cases, the metric seems to no longer be exposed. An effort is underway to
+        create automatic documentation for each metric based upon information pulled from its
+        implementation.</para>
-  </section>
+  </section>      

