hbase-issues mailing list archives

From "Yu Li (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-15160) Put back HFile's HDFS op latency sampling code and add metrics for monitoring
Date Fri, 02 Jun 2017 08:22:04 GMT

     [ https://issues.apache.org/jira/browse/HBASE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yu Li updated HBASE-15160:
--------------------------
    Attachment: hbase-15160_v7.patch

Confirmed that with {{System#currentTimeMillis}} the performance regression disappeared.
|| Case || Throughput (ops/s) || Average latency (us) ||
| w/o patch | 122079.26 | 26019.93 |
| w/ patch v7 | 121693.28 | 26688.72 |

Although this might only happen when using a fast disk such as a PCIe SSD, I think we should still
make the change. What's more, milliseconds should be enough to monitor spikes. Below is the
metrics data from the test with a PCIe SSD:
{noformat}
    "FsPReadTime_num_ops" : 21828053,
    "FsPReadTime_min" : 0,
    "FsPReadTime_max" : 103,
    "FsPReadTime_mean" : 3,
    "FsPReadTime_25th_percentile" : 0,
    "FsPReadTime_median" : 0,
    "FsPReadTime_75th_percentile" : 5,
    "FsPReadTime_90th_percentile" : 7,
    "FsPReadTime_95th_percentile" : 9,
    "FsPReadTime_98th_percentile" : 17,
    "FsPReadTime_99th_percentile" : 91,
    "FsPReadTime_99.9th_percentile" : 98,
    "FsPReadTime_TimeRangeCount_0-1" : 26267,
    "FsPReadTime_TimeRangeCount_1-3" : 455,
    "FsPReadTime_TimeRangeCount_3-10" : 8366,
    "FsPReadTime_TimeRangeCount_10-30" : 661,
    "FsPReadTime_TimeRangeCount_30-100" : 705,
    "FsPReadTime_TimeRangeCount_100-300" : 15,
    "FsPReadTime_TimeRangeCount_600000-inf" : 21791593,
{noformat}
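
To illustrate the kind of millisecond-granularity sampling discussed above, here is a minimal sketch. It is not the code from the attached patch: the class {{PreadLatencySampler}}, its methods, and the rule that a range includes its lower bound and excludes its upper bound are all assumptions made for illustration. Only the first six range boundaries are taken from the metrics output above; everything larger falls into a single overflow bucket in this sketch.

```java
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.function.Supplier;

/** Hypothetical sketch of millisecond latency sampling; not the HBASE-15160 patch. */
class PreadLatencySampler {
    // Upper bounds in ms of the first six ranges seen in the metrics output
    // (0-1, 1-3, 3-10, 10-30, 30-100, 100-300). Larger ranges are collapsed here.
    private static final long[] UPPER_BOUNDS_MS = {1, 3, 10, 30, 100, 300};

    // One counter per range, plus one overflow counter for >= 300 ms.
    private final AtomicLongArray counts = new AtomicLongArray(UPPER_BOUNDS_MS.length + 1);

    /** Returns the bucket index for a latency; assumes [lower, upper) ranges. */
    static int bucketFor(long latencyMs) {
        for (int i = 0; i < UPPER_BOUNDS_MS.length; i++) {
            if (latencyMs < UPPER_BOUNDS_MS[i]) {
                return i;
            }
        }
        return UPPER_BOUNDS_MS.length;
    }

    /** Records one sample. The wall clock can step backwards, so clamp at zero. */
    void record(long latencyMs) {
        counts.incrementAndGet(bucketFor(Math.max(0L, latencyMs)));
    }

    /** Runs the operation and records its wall-clock duration in milliseconds. */
    <T> T timed(Supplier<T> op) {
        long start = System.currentTimeMillis();
        try {
            return op.get();
        } finally {
            record(System.currentTimeMillis() - start);
        }
    }

    long count(int bucket) {
        return counts.get(bucket);
    }

    public static void main(String[] args) {
        PreadLatencySampler sampler = new PreadLatencySampler();
        // Stand-in for a positional read; a real caller would wrap the HDFS call.
        byte[] data = sampler.timed(() -> new byte[4096]);
        System.out.println("read " + data.length + " bytes, 0-1 ms count = " + sampler.count(0));
    }
}
```

Because {{System#currentTimeMillis}} is not monotonic, a sample can come out negative if the clock is adjusted during the operation, which is why the sketch clamps at zero before bucketing.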

> Put back HFile's HDFS op latency sampling code and add metrics for monitoring
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-15160
>                 URL: https://issues.apache.org/jira/browse/HBASE-15160
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0, 1.1.2
>            Reporter: Yu Li
>            Assignee: Yu Li
>            Priority: Critical
>         Attachments: HBASE-15160.patch, HBASE-15160_v2.patch, HBASE-15160_v3.patch, hbase-15160_v4.patch,
hbase-15160_v5.patch, hbase-15160_v6.patch, hbase-15160_v7.patch
>
>
> In HBASE-11586 all HDFS op latency sampling code, including fsReadLatency, fsPreadLatency
and fsWriteLatency, was removed. There was some discussion about putting it back in
a new JIRA, but that never happened. In our experience, these metrics are useful for judging
whether an issue lies in HDFS when a slow request occurs, so we propose to put them back in this
JIRA and add the metrics for monitoring as well.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
