Date: Fri, 10 Mar 2017 10:36:04 +0000 (UTC)
From: "Chuan Jin (JIRA)"
To: issues@ambari.apache.org
Reply-To: dev@ambari.apache.org
Subject: [jira] [Commented] (AMBARI-20392) Get aggregate metric records from HBase encounters performance issues

    [ https://issues.apache.org/jira/browse/AMBARI-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15904867#comment-15904867 ]

Chuan Jin commented on AMBARI-20392:
------------------------------------

Below are my queries:

{code:sql}
0: jdbc:phoenix:my-zk > select count(1)
. . . . . . . . . . . > FROM METRIC_AGGREGATE
. . . . . . . . . . . > WHERE METRIC_NAME IN ('pkts_out','pkts_in','cpu_wio', 'cpu_idle', 'cpu_nice','cpu_user', 'cpu_system','mem_total','mem_free', 'yarn.NodeManagerMetrics.ContainersCompleted', 'yarn.NodeManagerMetrics.ContainersRunning', 'yarn.NodeManagerMetrics.ContainersFailed', 'yarn.NodeManagerMetrics.ContainersLaunched', 'yarn.NodeManagerMetrics.ContainersKilled', 'yarn.NodeManagerMetrics.ContainersIniting')
. . . . . . . . . . . > AND APP_ID = 'nodemanager'
. . . . . . . . . . . > AND SERVER_TIME >= 1489121698000
. . . . . . . . . . . > AND SERVER_TIME < 1489125298000;
+-----------+
| COUNT(1)  |
+-----------+
| 1800      |
+-----------+
1 row selected (37.821 seconds)
{code}

I split it into four queries:

{code:sql}
0: jdbc:phoenix:my-zk > SELECT count(1)
. . . . . . . . . . . > FROM METRIC_AGGREGATE
. . . . . . . . . . . > WHERE METRIC_NAME IN ('pkts_out','pkts_in')
. . . . . . . . . . . > AND APP_ID = 'nodemanager'
. . . . . . . . . . . > AND SERVER_TIME >= 1489121698000
. . . . . . . . . . . > AND SERVER_TIME < 1489125298000;
+-----------+
| COUNT(1)  |
+-----------+
| 240       |
+-----------+
1 row selected (0.142 seconds)

0: jdbc:phoenix:my-zk > SELECT count(1)
. . . . . . . . . . . > FROM METRIC_AGGREGATE
. . . . . . . . . . . > WHERE METRIC_NAME IN ('cpu_wio', 'cpu_idle', 'cpu_nice','cpu_user', 'cpu_system')
. . . . . . . . . . . > AND APP_ID = 'nodemanager'
. . . . . . . . . . . > AND SERVER_TIME >= 1489121698000
. . . . . . . . . . . > AND SERVER_TIME < 1489125298000;
+-----------+
| COUNT(1)  |
+-----------+
| 600       |
+-----------+
1 row selected (0.266 seconds)

0: jdbc:phoenix:my-zk > SELECT count(1)
. . . . . . . . . . . > FROM METRIC_AGGREGATE
. . . . . . . . . . . > WHERE METRIC_NAME IN ('mem_total','mem_free')
. . . . . . . . . . . > AND APP_ID = 'nodemanager'
. . . . . . . . . . . > AND SERVER_TIME >= 1489121698000
. . . . . . . . . . . > AND SERVER_TIME < 1489125298000;
+-----------+
| COUNT(1)  |
+-----------+
| 240       |
+-----------+
1 row selected (0.12 seconds)

0: jdbc:phoenix:my-zk > SELECT count(1)
. . . . . . . . . . . > FROM METRIC_AGGREGATE
. . . . . . . . . . . > WHERE METRIC_NAME IN ('yarn.NodeManagerMetrics.ContainersCompleted', 'yarn.NodeManagerMetrics.ContainersRunning', 'yarn.NodeManagerMetrics.ContainersFailed', 'yarn.NodeManagerMetrics.ContainersLaunched', 'yarn.NodeManagerMetrics.ContainersKilled', 'yarn.NodeManagerMetrics.ContainersIniting')
. . . . . . . . . . . > AND APP_ID = 'nodemanager'
. . . . . . . . . . . > AND SERVER_TIME >= 1489121698000
. . . . . . . . . . . > AND SERVER_TIME < 1489125298000;
+-----------+
| COUNT(1)  |
+-----------+
| 720       |
+-----------+
1 row selected (0.154 seconds)
{code}

> Get aggregate metric records from HBase encounters performance issues
> ---------------------------------------------------------------------
>
>                 Key: AMBARI-20392
>                 URL: https://issues.apache.org/jira/browse/AMBARI-20392
>             Project: Ambari
>          Issue Type: Improvement
>          Components: ambari-metrics
>    Affects Versions: 2.4.2
>            Reporter: Chuan Jin
>
> I have a mini cluster (~6 nodes) managed by Ambari, and use a distributed HBase (~3 nodes) to hold the metrics collected from these nodes. After I deployed the YARN service, I noticed that some widgets (Cluster Memory, Cluster Disk, ...) cannot display properly on the YARN service dashboard page. Ambari Server also logs continuous timeout exceptions, complaining that it cannot get timeline metrics because the connection was refused.
> The request timeout parameter is 5s, which means the query that gets metrics from HBase takes longer than that. I then logged in with the Phoenix shell and ran the same query against HBase, and it took nearly 30s to finish. But if I split the big query into small pieces, that is, use fewer values for the "metric_name" field in the WHERE ... IN clause, then the results return within 1s after several small queries.
> Query performance in HBase depends heavily on the rowkey design and on using it properly. When getting aggregate metrics, the AMS collector queries the METRIC_AGGREGATE table in a way that may cause the coprocessor to scan several regions across different RegionServers (RS). If we add more metrics to the service dashboard, this situation will get worse.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
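
For reference, the four split queries return 240 + 600 + 240 + 720 = 1800 rows, the same count as the single query, and take about 0.68 seconds in total (0.142 + 0.266 + 0.12 + 0.154) against 37.821 seconds. Below is a minimal Python sketch of that batching idea, not the AMS collector code. The helper names `chunk` and `build_count_queries`, the batch size of 5, and the `?` bind-parameter style are all assumptions made for illustration; the sketch only builds the query strings and does not connect to Phoenix.

```python
from typing import Iterator, List, Sequence, Tuple

# Hypothetical batch size; the issue does not recommend a specific value.
BATCH_SIZE = 5


def chunk(names: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `names`, each with at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(names), size):
        yield list(names[start:start + size])


def build_count_queries(
    metric_names: Sequence[str],
    app_id: str,
    start_ms: int,
    end_ms: int,
    batch_size: int = BATCH_SIZE,
) -> List[Tuple[str, list]]:
    """Return one (sql, params) pair per batch of metric names."""
    queries = []
    for batch in chunk(metric_names, batch_size):
        # One placeholder per metric name in this batch (assumed '?' style).
        placeholders = ", ".join("?" for _ in batch)
        sql = (
            "SELECT COUNT(1) FROM METRIC_AGGREGATE "
            f"WHERE METRIC_NAME IN ({placeholders}) "
            "AND APP_ID = ? AND SERVER_TIME >= ? AND SERVER_TIME < ?"
        )
        queries.append((sql, [*batch, app_id, start_ms, end_ms]))
    return queries


# Metric names and time range taken from the queries in the comment.
METRIC_NAMES = [
    "pkts_out", "pkts_in",
    "cpu_wio", "cpu_idle", "cpu_nice", "cpu_user", "cpu_system",
    "mem_total", "mem_free",
    "yarn.NodeManagerMetrics.ContainersCompleted",
    "yarn.NodeManagerMetrics.ContainersRunning",
    "yarn.NodeManagerMetrics.ContainersFailed",
    "yarn.NodeManagerMetrics.ContainersLaunched",
    "yarn.NodeManagerMetrics.ContainersKilled",
    "yarn.NodeManagerMetrics.ContainersIniting",
]

queries = build_count_queries(
    METRIC_NAMES, "nodemanager", 1489121698000, 1489125298000
)
for sql, params in queries:
    print(len(params) - 3, "metric names:", sql)
```

The per-batch counts would then be summed by the caller. Whether a fixed batch size or a grouping by metric family (as in the manual split) performs better is not something the measurements in this comment settle.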