hbase-user mailing list archives

From Sterfield <sterfi...@gmail.com>
Subject Re: Spikes in JMX metrics gathered on Region Servers
Date Tue, 16 Aug 2016 16:40:50 GMT
> What are the graphs about? Can you say what the metric is, put it in
> context with others such as number of ops and something like gc and then
> do it for a longer time?

Sorry about that, you are right, I gave very little context.

I'm currently benchmarking OpenTSDB, pushing 150k data points per second to
HBase for hours, in order to see how the system behaves.

I'm grabbing JMX metrics on the RegionServers using collectd, with the
GenericJMX plugin [1], every 10 seconds. I'm currently gathering:

   - From "JvmMetrics" :
      - MemHeapUsedM
      - MemHeapMaxM
      - GcTime
      - GcCount
   - From "RegionServer, sub=IPC"
      - numCallsInGeneralQueue
   - From "RegionServer, sub=Server"
      - regionCount
      - storeFileCount
      - writeRequestCount
      - readRequestCount
      - compactionQueueLength
      - flushQueueLength
      - memStoreSize
      - flushedCellsSize
      - FlushTime_mean
      - FlushTime_num_ops

From time to time, I'm seeing some huge spikes, most of the time on
"readRequestCount" and "writeRequestCount". Those spikes are not caused by
a counter wrapping around. As shown in the first graphs, the value is
suddenly much lower than the previous one (hence a huge spike in Grafana).
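
To illustrate why a drop in a cumulative counter can show up as a huge spike, here is a minimal Python sketch. It is not what collectd or Grafana actually run. The sample values, the 64-bit counter width, and both function names are assumptions made up for the example. It compares a rate calculation that treats any decrease as a wrap-around with one that treats a decrease as a reset and skips that sample.

```python
# Assumed counter width; the real width depends on the collector.
COUNTER_MAX = 2**64


def rate_assuming_wrap(prev, curr, interval_s):
    """Per-second rate where any decrease is treated as a counter wrap-around."""
    delta = curr - prev
    if delta < 0:
        # The counter is assumed to have overflowed and restarted from zero.
        delta += COUNTER_MAX
    return delta / interval_s


def rate_assuming_reset(prev, curr, interval_s):
    """Per-second rate where any decrease is treated as a reset and skipped."""
    if curr < prev:
        # No meaningful rate can be derived across a reset.
        return None
    return (curr - prev) / interval_s


# Hypothetical writeRequestCount samples taken every 10 seconds.
# The fourth value is much lower than the third, as in the graphs.
samples = [1_000_000, 2_500_000, 4_000_000, 150_000, 1_650_000]
INTERVAL_S = 10

pairs = list(zip(samples, samples[1:]))
wrap_rates = [rate_assuming_wrap(a, b, INTERVAL_S) for a, b in pairs]
reset_rates = [rate_assuming_reset(a, b, INTERVAL_S) for a, b in pairs]

for (prev, curr), wrapped, reset in zip(pairs, wrap_rates, reset_rates):
    print(f"{prev:>9} -> {curr:>9}  wrap={wrapped:.3e}  reset={reset}")
```

With these made-up samples, only the interval containing the drop differs: the wrap interpretation yields an enormous rate, while the reset interpretation leaves a gap for that one sample. Whether the collector or the dashboard applies the wrap logic depends on how the metric type is declared, so this is only a way to reason about the graphs.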

> On the start/stop of metrics system, it is not pretty, but that is the only
> recourse in hadoop metrics for clearing out metrics that are no longer
> being updated; in particular, when a region goes away either because it
> split or was removed, the stop/start operation is how associated metrics
> are removed.
> St.Ack


[1] : https://collectd.org/wiki/index.php/Plugin:GenericJMX
