lucene-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <>
Subject [jira] [Commented] (SOLR-9898) Documentation for metrics collection and /admin/metrics
Date Wed, 28 Dec 2016 12:32:58 GMT


Andrzej Bialecki  commented on SOLR-9898:

h1. Overview
Solr 6.4 adds a developer API and instrumentation for collecting detailed performance-oriented
metrics throughout the life-cycle of the Solr service and its various components. Internally it
uses the Dropwizard Metrics API, which provides the following classes
of metrics to measure events:
* *counters* - simply count events. They provide a single long value, e.g. the number of requests.
* *meters* - additionally compute rates of events. They provide a count (as above) and 1-, 5-,
and 15-minute exponentially decaying rates, similar to the Unix system load average.
* *histograms* - calculate an approximate distribution of events according to their values. They provide
the following approximate statistics, with exponential decay similar to that of meters: mean (arithmetic
average), median, maximum, minimum, standard deviation, and the 75th, 95th, 98th, 99th and
99.9th percentiles.
* *timers* - measure the number and duration of events. They provide a count and histogram
of timings.
* *gauges* - offer an instantaneous reading of a current value, e.g. current queue depth, current
number of active connections, free heap size.
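
To make the "exponentially decaying rates" of a meter more concrete, here is a minimal Python sketch of an exponentially weighted moving average, similar in spirit to the Unix load average. It is not Solr or Dropwizard code: the class names, the 5-second tick interval and the smoothing formula are assumptions chosen for illustration.

```python
import math

TICK_SECONDS = 5  # assumed interval between rate updates (illustrative)


class DecayingRate:
    """Exponentially weighted moving average of an event rate, in events/second."""

    def __init__(self, window_minutes):
        # Smoothing factor: longer windows react more slowly to changes.
        self.alpha = 1.0 - math.exp(-TICK_SECONDS / 60.0 / window_minutes)
        self.rate = 0.0
        self.initialized = False
        self.uncounted = 0  # events seen since the last tick

    def mark(self, n=1):
        self.uncounted += n

    def tick(self):
        # Fold the rate observed during the last interval into the average.
        instant = self.uncounted / TICK_SECONDS
        self.uncounted = 0
        if self.initialized:
            self.rate += self.alpha * (instant - self.rate)
        else:
            self.rate = instant
            self.initialized = True


class Meter:
    """A total count plus 1-, 5- and 15-minute decaying rates."""

    def __init__(self):
        self.count = 0
        self.rates = {m: DecayingRate(m) for m in (1, 5, 15)}

    def mark(self, n=1):
        self.count += n
        for r in self.rates.values():
            r.mark(n)

    def tick(self):
        for r in self.rates.values():
            r.tick()


meter = Meter()
# Simulate one minute at 10 events/second, then one minute with no events.
for _ in range(12):
    meter.mark(50)
    meter.tick()
for _ in range(12):
    meter.tick()

for minutes, r in meter.rates.items():
    print(f"{minutes}-minute rate: {r.rate:.2f} events/s")
```

After the quiet minute the 1-minute rate has dropped the most and the 15-minute rate the least, while the count still reflects every event marked.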

A group of related metrics with unique names is managed in a *metric registry*. Solr maintains
several such registries, each corresponding to a high-level group such as {{jvm, jetty, http,
node, core}} (see below). Metrics are maintained and accumulated throughout the life-cycle of
components, from the start of the process until its shutdown - e.g. metrics for a particular
SolrCore are tracked through possibly several load / unload / rename operations, and deleted
only when a core is explicitly deleted. However, metrics are not persisted across process
restarts - restarting Solr will discard all collected metrics.
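
The life-cycle rule above (metrics survive unload / reload, and are dropped only on explicit deletion) can be sketched as a name-keyed lookup. This is only an illustration in Python: the {{MetricRegistries}} class, its methods, and the registry name {{solr.core.example}} are hypothetical and do not come from Solr's code.

```python
class MetricRegistries:
    """Hypothetical holder of named registries; each maps metric name -> value."""

    def __init__(self):
        self._registries = {}

    def registry(self, name):
        # Return the existing registry, or create it on first use, so values
        # accumulated earlier survive a component being unloaded and reloaded.
        return self._registries.setdefault(name, {})

    def increment(self, registry_name, metric_name, n=1):
        reg = self.registry(registry_name)
        reg[metric_name] = reg.get(metric_name, 0) + n

    def remove(self, registry_name):
        # Called only when the component is explicitly deleted.
        self._registries.pop(registry_name, None)


registries = MetricRegistries()

# Core is loaded and serves a request.
registries.increment("solr.core.example", "requests")

# Core is unloaded and loaded again; the registry is found by name,
# so the earlier count is still there.
registries.increment("solr.core.example", "requests")
count_after_reload = registries.registry("solr.core.example")["requests"]

# Core is explicitly deleted: its registry is discarded.
registries.remove("solr.core.example")
count_after_delete = registries.registry("solr.core.example").get("requests", 0)
```

Because the registries live only in process memory in this sketch, a restart would start from an empty holder, which matches the note that metrics are not persisted across restarts.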

For each group (and/or for each registry) there can be several *reporters* - components responsible
for sending metrics from selected registries to external systems. Currently implemented
reporters support emitting metrics via JMX, Ganglia, Graphite and SLF4J. There is also a dedicated
{{/admin/metrics}} handler that can be queried to report all or a subset of the current metrics
from multiple registries.
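
As an illustration of querying the {{/admin/metrics}} handler for a subset of metrics, the Python sketch below only builds a request URL. The host and port, the helper name {{metrics_url}}, and the {{group}} and {{type}} parameter names are assumptions not stated in this message; check them against the handler's documentation for your Solr version.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def metrics_url(base_url, groups=(), types=()):
    """Build a URL for the /admin/metrics handler (parameter names assumed)."""
    params = {"wt": "json"}
    if groups:
        params["group"] = ",".join(groups)   # assumed parameter: registries to report
    if types:
        params["type"] = ",".join(types)     # assumed parameter: metric classes to report
    return base_url.rstrip("/") + "/admin/metrics?" + urlencode(params)


# Assumed local Solr address; adjust to your installation.
url = metrics_url("http://localhost:8983/solr",
                  groups=["jvm", "node"], types=["counter"])
print(url)

# Parse the query string back to confirm what would be sent.
query = parse_qs(urlsplit(url).query)
```

Omitting both arguments yields a URL with no filters, which corresponds to asking the handler for all current metrics.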

h2. Metric groups
These are the major groups of metrics that are collected:

h3. JVM level ({{solr.jvm}} registry):
* direct and mapped buffer pools
* class loading / unloading
* OS memory, CPU time, file descriptors, swap, system load
* GC count and time
* heap, non-heap memory and GC pools
* number of threads, their states and deadlocks

h3. Node / CoreContainer level ({{solr.node}} registry):
* handler requests (count, timing): collections, info, admin, configSets, etc.
* number of cores (loaded, lazy, unloaded)

h3. Core (SolrCore) level ({{solr.core.<collection>...}} registries, one for each core):
* all common RequestHandler-s report: request timers / counters, timeouts, errors.
* index-level events (in progress - SOLR-9854): meters for minor / major merges, number of
merged docs, number of deleted docs, gauges for currently running merges and their size.
* directory-level IO (in progress - SOLR-9854): total read / write meters, histograms for read / write operations and
their size, optionally split per index file (e.g. field data, term dictionary, docValues, etc.)
* shard replication and transaction log replay on replicas (TBD, SOLR-9856)
* TBD: caches, update handler details, and other relevant SolrInfoMBean-s

h3. HTTP level ({{solr.http}} registry):
* open / available / pending connections for shard handler and update handler

h3. Jetty level ({{solr.jetty}} registry):
* threads and pools
* connection and request timers
* meters for responses by HTTP class (1xx, 2xx, etc.)

h3. Shard leader (TBD)
* aggregated metrics from each replica (SOLR-9857)

h3. Overseer (TBD)
* aggregated metrics from shard leaders and cluster nodes (SOLR-9858)

> Documentation for metrics collection and /admin/metrics
> -------------------------------------------------------
>                 Key: SOLR-9898
>                 URL:
>             Project: Solr
>          Issue Type: Task
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: master (7.0), 6.4
>            Reporter: Andrzej Bialecki 
> Draft documentation follows.
