hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15160) Put back HFile's HDFS op latency sampling code and add metrics for monitoring
Date Wed, 02 Mar 2016 23:37:19 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176718#comment-15176718 ]

Enis Soztutar commented on HBASE-15160:

bq. Yes, already made the change in the latest patch. 
Ok, I was looking at the following for why we are not using a histogram for this: 
+  private static final BlockingQueue<Long> fsReadLatenciesNanos =
+      new ArrayBlockingQueue<Long>(LATENCY_BUFFER_SIZE);
+  private static final BlockingQueue<Long> fsWriteLatenciesNanos =
+      new ArrayBlockingQueue<Long>(LATENCY_BUFFER_SIZE);

For every RPC and for every operation (get, etc.), we already increment counters or histograms
directly inline, rather than buffering individual data points, as the patch does, and
bulk-updating the histograms periodically. Since the number of gets should, in theory, exceed
the number of fs operations, doing the updates inline should not be a perf regression. This
should of course be verified if possible.
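
To make the "inline" option concrete, here is a minimal sketch of a histogram that is updated
directly on the calling thread. The class name, the power-of-two bucket layout, and the method
names are assumptions made for illustration only; this is not HBase's FastLongHistogram or any
code from the patch.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical inline-updated latency histogram (illustration only, not an HBase class). */
final class InlineLatencyHistogram {
  // Bucket 0 holds the value 0; bucket i (i >= 1) holds values in [2^(i-1), 2^i).
  private final LongAdder[] buckets = new LongAdder[64];
  private final LongAdder count = new LongAdder();

  InlineLatencyHistogram() {
    for (int i = 0; i < buckets.length; i++) {
      buckets[i] = new LongAdder();
    }
  }

  /** Called directly on the read/write path: no queue, no lock, no later bulk update. */
  void update(long latencyNanos) {
    long v = Math.max(0L, latencyNanos);              // clamp negative inputs to 0
    int idx = 64 - Long.numberOfLeadingZeros(v);      // 0 for v == 0, at most 63 otherwise
    buckets[idx].increment();
    count.increment();
  }

  long getCount() {
    return count.sum();
  }

  long getBucketCount(int idx) {
    return buckets[idx].sum();
  }
}
```

Each update costs two LongAdder increments, which is the kind of cost the RPC and get paths
already pay for their own metrics.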

One other thing is that instead of updating the histogram inline (which is based on
FastLongHistogram / Counters and high-perf counters), we are using a BlockingQueue, which is
guarded by a lock (a single ReentrantLock in ArrayBlockingQueue) and is in theory more costly.
So doing this the indirect way may be even worse than doing inline updates.
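
For comparison, the buffered approach can be sketched roughly as follows. The field name and
LATENCY_BUFFER_SIZE come from the patch excerpt above; the capacity value, the class name, and
the record/drain methods are assumptions for illustration, since the rest of the patch is not
shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch of queue-based latency sampling (not the actual patch code). */
final class BufferedLatencySampler {
  // Assumed capacity for illustration; the patch's real value is not shown in this message.
  static final int LATENCY_BUFFER_SIZE = 4;

  private final BlockingQueue<Long> fsReadLatenciesNanos =
      new ArrayBlockingQueue<Long>(LATENCY_BUFFER_SIZE);

  /** Hot path: offer() acquires the queue's lock and returns false if the buffer is full. */
  boolean record(long latencyNanos) {
    return fsReadLatenciesNanos.offer(latencyNanos);
  }

  /** Periodic path: remove all buffered samples so they can be fed to a histogram in bulk. */
  List<Long> drain() {
    List<Long> samples = new ArrayList<Long>();
    fsReadLatenciesNanos.drainTo(samples);
    return samples;
  }
}
```

In this sketch every recorded sample goes through the queue's lock, and samples that arrive
while the buffer is full are dropped, so the resulting histogram only reflects a subset of
operations.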

> Put back HFile's HDFS op latency sampling code and add metrics for monitoring
> -----------------------------------------------------------------------------
>                 Key: HBASE-15160
>                 URL: https://issues.apache.org/jira/browse/HBASE-15160
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0, 1.1.2
>            Reporter: Yu Li
>            Assignee: Yu Li
>         Attachments: HBASE-15160.patch, HBASE-15160_v2.patch, HBASE-15160_v3.patch
> In HBASE-11586 all HDFS op latency sampling code, including fsReadLatency, fsPreadLatency
> and fsWriteLatency, was removed. There was some discussion about putting it back in
> a new JIRA, but that never happened. In our experience, these metrics are useful for judging
> whether an issue lies in HDFS when slow requests occur, so we propose to put them back in this
> JIRA, and add the metrics for monitoring as well.

This message was sent by Atlassian JIRA
