hbase-issues mailing list archives

From "Michael Smith (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-12097) Blocked threads on hbase, slowly increasing, appears to be updating metrics
Date Thu, 25 Sep 2014 18:44:34 GMT

     [ https://issues.apache.org/jira/browse/HBASE-12097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Smith updated HBASE-12097:
    Attachment: total_blocks_versus_threads-week.png

The attached graph shows the increasing number of blocked threads. The sharp drops occur
when we do rolling restarts of all the regionservers. After each restart, the number of
blocked threads slowly starts to grow again.

We determine the number of blocked threads by hitting the regionserver web interface,
i.e. <host:60030>/dump, and counting the threads whose entry reads 'State: BLOCKED'.
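
The counting step described above can be sketched roughly as follows. This is a minimal
sketch, not the script actually used for the graph: the function names are made up for
illustration, and it assumes the dump marks each thread with a "State: ..." line, as in
the thread entry quoted in the issue description. The port and /dump path are the ones
mentioned above; the host is a placeholder.

```python
import re
import urllib.request
from collections import Counter

# Assumption: each thread entry in the dump has a line such as "  State: BLOCKED".
STATE_LINE = re.compile(r"^[ \t]*State:[ \t]*(\w+)", re.MULTILINE)


def count_thread_states(dump_text):
    """Return a Counter mapping each thread state (e.g. 'BLOCKED') to its count."""
    return Counter(STATE_LINE.findall(dump_text))


def fetch_dump(host, port=60030):
    """Fetch the regionserver dump page as text (host is a placeholder)."""
    url = f"http://{host}:{port}/dump"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling count_thread_states(fetch_dump("<host>"))["BLOCKED"] at a regular interval would
give one data point per sample for a graph like the attached one.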

Analysis of the blocked threads shows that they are blocked on updating metrics.

> Blocked threads on hbase, slowly increasing, appears to be updating metrics
> ---------------------------------------------------------------------------
>                 Key: HBASE-12097
>                 URL: https://issues.apache.org/jira/browse/HBASE-12097
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.6
>         Environment: RHEL 6.2, CDH
>            Reporter: Michael Smith
>         Attachments: total_blocks_versus_threads-week.png
> HBase shows an increasing number of IPC threads in the BLOCKED state.
> There are hundreds of these, with more appearing over a period of hours. Performance
> degrades until a regionserver restart is required to restore it.
> Thread:
> Thread 421 (IPC Server handler 368 on 60201):
>   State: BLOCKED
>   Blocked count: 19314
>   Waited count: 322565
>   Blocked on org.apache.hadoop.metrics.util.MetricsIntValue@1ec5ca55
>   Blocked by 236 (IPC Server handler 183 on 60201)
>   Stack:
>     org.apache.hadoop.metrics.util.MetricsIntValue.set(MetricsIntValue.java:73)
>     org.apache.hadoop.hbase.ipc.HBaseServer.updateCallQueueLenMetrics(HBaseServer.java:1360)
> I don't actually know how to troubleshoot this much further. Happy to take suggestions.
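
One possible next troubleshooting step is to tally which lock objects and which blocking
threads show up most often in the dump. The sketch below is only an illustration under
assumptions: the function name is hypothetical, and it relies on the "Blocked on ..." and
"Blocked by ..." lines having the same layout as the thread entry quoted above.

```python
import re
from collections import Counter

# Assumption: lines look like
#   "  Blocked on org.apache.hadoop.metrics.util.MetricsIntValue@1ec5ca55"
#   "  Blocked by 236 (IPC Server handler 183 on 60201)"
BLOCKED_ON = re.compile(r"^\s*Blocked on (\S+)")
BLOCKED_BY = re.compile(r"^\s*Blocked by (\d+)")


def summarize_blocked(dump_text):
    """Count blocked threads per lock object and per blocking thread id."""
    locks = Counter()
    blockers = Counter()
    for line in dump_text.splitlines():
        on = BLOCKED_ON.match(line)
        if on:
            locks[on.group(1)] += 1
            continue
        by = BLOCKED_BY.match(line)
        if by:
            blockers[by.group(1)] += 1
    return locks, blockers
```

If most blocked threads report the same "Blocked on" object, contention is concentrated
on a single monitor. If the "Blocked by" thread changes between dumps, the handlers are
more likely taking turns on a hot lock than waiting on one stuck thread.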

This message was sent by Atlassian JIRA
