hbase-issues mailing list archives

From "Yu Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15160) Put back HFile's HDFS op latency sampling code and add metrics for monitoring
Date Wed, 27 Jan 2016 03:03:39 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118531#comment-15118531 ]

Yu Li commented on HBASE-15160:
-------------------------------

Thanks for checking here, [~eclark].
bq. So one of the reasons that the metrics in the deep read paths were turned off is that
they were really expensive in the tight loops of readers.
From HBASE-11586, it seems the HDFS op latency sampling code was removed because we found it
was no longer used or referenced. I'm not sure whether we did any performance comparison before
and after HBASE-11586, but IMHO every metric affects performance slightly. Since the HDFS op
latency can help us judge whether an issue lies in HDFS, I think it's worthwhile to add the
sampling back.
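To illustrate how per-op sampling can stay cheap on the read path, here is a minimal Java sketch of a latency sampler. It is not the code removed in HBASE-11586 nor the HBASE-15160 patch: the class `HdfsLatencySampler`, its `Op` enum and all of its methods are hypothetical. The assumption is that the I/O path calls `record` once per HDFS op, and that a metrics reporter periodically calls `drainSamples` and `meanNanos`.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch of per-operation HDFS latency sampling.
 * Names are illustrative only; this is not HBase's HFile code.
 */
final class HdfsLatencySampler {

  /** One bucket per op type mentioned in HBASE-15160 (read, pread, write). */
  enum Op { READ, PREAD, WRITE }

  // Maps are filled once in the constructor and only read afterwards.
  private final Map<Op, BlockingQueue<Long>> samples = new EnumMap<>(Op.class);
  private final Map<Op, AtomicLong> opCount = new EnumMap<>(Op.class);
  private final Map<Op, AtomicLong> totalNanos = new EnumMap<>(Op.class);

  HdfsLatencySampler(int bufferSize) {
    for (Op op : Op.values()) {
      samples.put(op, new ArrayBlockingQueue<>(bufferSize));
      opCount.put(op, new AtomicLong());
      totalNanos.put(op, new AtomicLong());
    }
  }

  /** Called on the I/O path: constant time and never blocks. */
  void record(Op op, long latencyNanos) {
    opCount.get(op).incrementAndGet();
    totalNanos.get(op).addAndGet(latencyNanos);
    // offer() returns false instead of blocking when the buffer is full,
    // so the sample is dropped rather than stalling the reader.
    samples.get(op).offer(latencyNanos);
  }

  /** Total number of ops recorded, including ones whose sample was dropped. */
  long count(Op op) {
    return opCount.get(op).get();
  }

  /** Called by the metrics reporter: removes and returns buffered samples. */
  List<Long> drainSamples(Op op) {
    List<Long> out = new ArrayList<>();
    samples.get(op).drainTo(out);
    return out;
  }

  /**
   * Mean latency over all recorded ops, or 0 if none. Count and total are
   * read separately, so the value may be slightly off while writers are active.
   */
  double meanNanos(Op op) {
    long n = opCount.get(op).get();
    return n == 0 ? 0.0 : (double) totalNanos.get(op).get() / n;
  }
}
```

The hot path costs two atomic adds and a non-blocking `offer`. Because full buffers drop samples, the drained samples may under-represent bursts, while the counters still cover every op.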

bq. Those errors/hangs seems more than normal on an apache run
Agreed, the errors/hangs seem abnormal, but I observed similar errors in the unit tests (UT)
of HBASE-15163, whose patch changes should be safe enough. Let me resubmit the patch and check
the result here.

bq. Do we have any before and after metrics on reads ?
AFAICS, the get metrics are updated at the end of RSRpcServices#get, but nothing deeper in the
read path is measured. For writes, we have the syncTime metric, which is updated in postSync.
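Where timing in tight read loops is a concern, one common mitigation is to time only a fraction of calls. The Java sketch below is hypothetical (`SampledTimer`, `IoCall` and `run` are illustrative names, not HBase APIs): it wraps an I/O call such as a positional read, times one call out of every N with `System.nanoTime()`, and records the latency even when the call throws.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

/**
 * Hypothetical sketch: time one call out of every N so that the cost of
 * System.nanoTime() is only paid on sampled calls.
 */
final class SampledTimer {

  /** An I/O call that may throw, e.g. a read from an HDFS stream. */
  @FunctionalInterface
  interface IoCall<T> {
    T call() throws IOException;
  }

  private final int sampleEvery;                 // N: time one call in every N
  private final AtomicLong calls = new AtomicLong();
  private final LongConsumer sink;               // receives latency in nanoseconds

  SampledTimer(int sampleEvery, LongConsumer sink) {
    if (sampleEvery < 1) {
      throw new IllegalArgumentException("sampleEvery must be >= 1");
    }
    this.sampleEvery = sampleEvery;
    this.sink = sink;
  }

  <T> T run(IoCall<T> op) throws IOException {
    // Unsampled calls cost one atomic increment and a modulo.
    if (calls.getAndIncrement() % sampleEvery != 0) {
      return op.call();
    }
    long start = System.nanoTime();
    try {
      return op.call();
    } finally {
      // Record even if the call throws, so slow failures are visible too.
      sink.accept(System.nanoTime() - start);
    }
  }
}
```

The shared counter can itself become a contention point under many reader threads; a per-thread counter or `ThreadLocalRandom`-based sampling are alternatives, at the cost of a less exact 1-in-N rate.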

> Put back HFile's HDFS op latency sampling code and add metrics for monitoring
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-15160
>                 URL: https://issues.apache.org/jira/browse/HBASE-15160
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0, 1.1.2
>            Reporter: Yu Li
>            Assignee: Yu Li
>         Attachments: HBASE-15160.patch, HBASE-15160_v2.patch
>
>
> In HBASE-11586, all HDFS op latency sampling code, including fsReadLatency, fsPreadLatency
> and fsWriteLatency, was removed. There was some discussion about putting it back in a new
> JIRA, but that never happened. In our experience, these metrics are useful for judging whether
> an issue lies in HDFS when a slow request occurs, so we propose to put them back in this JIRA
> and add the metrics for monitoring as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
