hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (HBASE-11062) htop
Date Fri, 21 Aug 2015 17:31:47 GMT

     [ https://issues.apache.org/jira/browse/HBASE-11062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Andrew Purtell reassigned HBASE-11062:

    Assignee: Andrew Purtell

Consider something like NNtop (HDFS-6982) and YARN top (YARN-3348).

Features in common:
- Command line utility
- Unix top-like presentation (curses-like interface)
- Cluster health in display header
- Summary utilization metrics
- Windowed data collection
- Cache data for short periods of time

I think our approach would look a lot like HDFS's. As you might expect, all of the information
needed for NNtop is collected and exported by a singleton process. We can do something
similar with our master. All regionservers already periodically report load statistics to
the master; this is what populates the data returned by Admin#getClusterStatus. We'd augment the
regionserver reports with top-K usage stats. Like HDFS-6982, we'd collect and manage the information
as a MetricsSource implementation, thus exposing it by JMX and HTTP for use by a CLI tool.
See the patch on HDFS-6982 for a sketch of what a patch for our master might look like.
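
To make the "windowed data collection" and "top-K usage stats" ideas concrete, here is a minimal sketch of what a regionserver-side or master-side collector could look like. It is not taken from the HDFS-6982 patch or from any HBase code: the class name WindowedTopK, the bucket layout, and the method names are all hypothetical, and the sketch assumes timestamps arrive in non-decreasing order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: per-key access counts in fixed time buckets, top K over a rolling window. */
final class WindowedTopK {
  private static final class Bucket {
    final long startMs;
    final Map<String, Long> counts = new HashMap<>();
    Bucket(long startMs) { this.startMs = startMs; }
  }

  private final long bucketMs;   // width of one bucket
  private final int numBuckets;  // window length = bucketMs * numBuckets
  private final Deque<Bucket> buckets = new ArrayDeque<>();

  WindowedTopK(long bucketMs, int numBuckets) {
    this.bucketMs = bucketMs;
    this.numBuckets = numBuckets;
  }

  /** Record one access for the key (a table, user, client, ...) at the given time. */
  synchronized void record(String key, long nowMs) {
    long start = nowMs - (nowMs % bucketMs);
    Bucket last = buckets.peekLast();
    if (last == null || last.startMs != start) {
      last = new Bucket(start);
      buckets.addLast(last);
    }
    last.counts.merge(key, 1L, Long::sum);
    expire(nowMs);
  }

  /** Drop buckets that have fallen out of the window ending at nowMs. */
  private void expire(long nowMs) {
    long oldestAllowed = nowMs - (nowMs % bucketMs) - bucketMs * (numBuckets - 1);
    while (!buckets.isEmpty() && buckets.peekFirst().startMs < oldestAllowed) {
      buckets.removeFirst();
    }
  }

  /** Top k keys by total count across the live window, highest first; ties broken by key. */
  synchronized List<Map.Entry<String, Long>> topK(int k, long nowMs) {
    expire(nowMs);
    Map<String, Long> totals = new HashMap<>();
    for (Bucket b : buckets) {
      b.counts.forEach((key, c) -> totals.merge(key, c, Long::sum));
    }
    List<Map.Entry<String, Long>> sorted = new ArrayList<>(totals.entrySet());
    sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed()
        .thenComparing(Map.Entry.<String, Long>comparingByKey()));
    return sorted.subList(0, Math.min(k, sorted.size()));
  }
}
```

Under these assumptions each regionserver would only need to ship the output of topK with its periodic report, and the master would merge the per-server lists before exporting them.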

Views to be presented by the CLI tool:
* Status header: master uptime, live servers, dead servers, aggregate ops/sec
* Default (Table oriented)
** By table, drill down to region
** By user identity
** By client location
* Namespace
** By namespace, drill down to table
** By user identity
** By client location
* Region
** By column family, drill down to CF
** By key
** By operation type
** By user identity
** By client location
* Column family
** By key
** By operation type
** By user identity
** By client location

Columns in the views:
* Primary sort order
* Secondary sort order
* Total access count per second
* Summary access latency in ms (avg, p75, p90, p95, p99, max; adjustable with keypress)
* Data volume (display unit adjustable with keypress)
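
As an illustration of the latency column, the summary statistics listed above could be computed from raw samples roughly as follows. This is a sketch only: the class name LatencySummary is hypothetical, and it uses the simple nearest-rank percentile definition over a full sorted sample, whereas a real metrics implementation would more likely use a histogram or reservoir.

```java
import java.util.Arrays;

/** Hypothetical sketch: avg, p75, p90, p95, p99 and max over latency samples in ms. */
final class LatencySummary {
  final double avg;
  final long p75, p90, p95, p99, max;

  LatencySummary(long[] latenciesMs) {
    if (latenciesMs.length == 0) {
      throw new IllegalArgumentException("no samples");
    }
    long[] sorted = latenciesMs.clone();  // leave the caller's array untouched
    Arrays.sort(sorted);
    avg = Arrays.stream(sorted).average().getAsDouble();
    p75 = percentile(sorted, 75);
    p90 = percentile(sorted, 90);
    p95 = percentile(sorted, 95);
    p99 = percentile(sorted, 99);
    max = sorted[sorted.length - 1];
  }

  /** Nearest-rank percentile: the smallest sample with at least p percent of samples at or below it. */
  static long percentile(long[] sorted, int p) {
    int rank = (int) Math.ceil(p * sorted.length / 100.0);
    return sorted[Math.max(rank, 1) - 1];
  }
}
```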

Sort ordering:
||View||Primary sort||Secondary sort||
|Table (default view)|Table|Region|
|Table by user|User|Table|
|Table by client|Client|Table|
|Namespace by user|User|Namespace|
|Namespace by client|Client|Namespace|
|Region by CF (default region view)|Region|CF|
|Region by key|Key|Region|
|Region by operation|Op type|Region|
|Region by user|User|Region|
|Region by client|Client|Region|
|CF by key (default CF view)|Key|CF|
|CF by operation|Op type|CF|
|CF by user|User|CF|
|CF by client|Client|CF|
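
Each row of the table above amounts to a two-level comparator over the rows of a view. A minimal sketch of that ordering, with hypothetical names (ViewRow, ViewSort) and a made-up opsPerSec field standing in for the metric columns:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical row of a view: the two sort fields plus one metric column. */
final class ViewRow {
  final String primary;    // e.g. the table name in the default view
  final String secondary;  // e.g. the region name in the default view
  final double opsPerSec;

  ViewRow(String primary, String secondary, double opsPerSec) {
    this.primary = primary;
    this.secondary = secondary;
    this.opsPerSec = opsPerSec;
  }
}

final class ViewSort {
  /** Order by the primary sort field, then by the secondary sort field. */
  static final Comparator<ViewRow> PRIMARY_THEN_SECONDARY =
      Comparator.comparing((ViewRow r) -> r.primary).thenComparing(r -> r.secondary);

  /** Return a sorted copy so the caller's list is left untouched. */
  static List<ViewRow> sorted(List<ViewRow> rows) {
    List<ViewRow> copy = new ArrayList<>(rows);
    copy.sort(PRIMARY_THEN_SECONDARY);
    return copy;
  }
}
```

Switching views would then only change which attributes are mapped onto the primary and secondary fields.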

Where not sorting by operation type, we should separate op counts and latencies for read and
write operations into their own columns.

In most views the contents of the secondary sort field won't change.

Interesting future ideas:
* Monitored tasks (HBASE-4349)
* Read replica awareness

What else?

> htop
> ----
>                 Key: HBASE-11062
>                 URL: https://issues.apache.org/jira/browse/HBASE-11062
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
> A top-like monitor could be useful for testing, debugging, operations of clusters of
> moderate size, and possibly for diagnosing issues in large clusters.
> Consider a curses interface like the one presented by atop (http://www.atoptool.nl/images/screenshots/genericw.png)
> - with aggregate metrics collected over a monitoring interval in the upper portion of the
> pane, and a listing of discrete measurements sorted and filtered by various criteria in the
> bottom part of the pane. One might imagine a cluster overview with cluster aggregate metrics
> above and a list of regionservers sorted by utilization below; and a regionserver view with
> process metrics above and a list of metrics by operation type below, or a list of client connections,
> or a list of threads, sorted by utilization, throughput, or latency.
> Generically 'htop' is taken but would be distinctive in the HBase context, a utility
> No need necessarily for a curses interface. Could be an external monitor with a web front
> end as has been discussed before. I do like the idea of a process that runs in a terminal
> because I interact with dev and test HBase clusters exclusively by SSH.

This message was sent by Atlassian JIRA
