Date: Thu, 7 May 2015 04:49:29 +0000 (UTC)
From: Jayesh Thakrar <j_thakrar@yahoo.com>
To: user@ambari.apache.org
Subject: Re: Kafka broker metrics not appearing in REST API

Thanks Sid.
I had trouble restarting the metrics collector.
Somehow it was complaining about not being able to connect to the HBase embedded ZooKeeper on port 61181.
Anyway, after a few tries I was able to bring it up.
I had brought down the whole metrics collection system and just completed a rolling restart of the metrics collector.
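
(For the ZooKeeper connection complaint, a quick check that can be run on the Metrics Collector host - assuming the embedded ZooKeeper still answers the standard four-letter commands on port 61181, which is taken from the error above:)

echo ruok | nc localhost 61181        # expect "imok" if the embedded ZooKeeper is answering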

After all the above I am back to the same reproducible situation of no Kafka metrics.

However, I examined the servers on the cluster and apparently there is not sufficient free RAM - or should I say a lot of it is used by filesystem buffer/caching (15 GB or 30%). This is not surprising, as both Flume and Kafka are heavy on (sequential) I/O.

I will need to "resolve" this situation before I can increase memory for HBase.

But all the same, thanks for the pointers - this has given me enough things to look into.

[root@dtord01flm01p ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         48251      47657        593         32          4      32417
-/+ buffers/cache:      15235      33015
Swap:         8191          0       8191
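
(A note on reading the output above: the "-/+ buffers/cache" row nets the page cache out of the used column, i.e. 47657 - 4 - 32417 ≈ 15235 MB held by processes, with roughly 33015 MB free or reclaimable from cache.)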

From: Siddharth Wagle <swagle@hortonworks.com>
To: "user@ambari.apache.org" <user@ambari.apache.org>
Sent: Wednesday, May 6, 2015 11:31 PM
Subject: Re: Kafka broker metrics not appearing in REST API

Is this or can this be writing to its own disk?

Can you look at the HBase RegionServer web UI? In your browser, key in http://metric-collector-host:61310.

This is the HBase Master info port.
Click on the link to the RegionServer. Look at the queues and the block cache stats.

Queues should be empty the majority of the time, say when you refresh the page a few times. If not, this points to a disk I/O bottleneck.
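
(If it is easier from a shell, a rough reachability check against that port - assuming the stock HBase info-server pages /master-status and /jmx are unchanged in the AMS-embedded HBase:)

curl -s -o /dev/null -w '%{http_code}\n' http://metric-collector-host:61310/master-status
curl -s http://metric-collector-host:61310/jmx | head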

The cache hit ratio I have observed is around 70%; the higher the better.

Do you have available physical memory on that box? If yes, the HBase Master heap size in ams-hbase-env and the Metrics Collector heap size in ams-env should be bumped up. The default is 512m in both cases.
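
(A sketch of that bump, written in the same config-listing style used later in this thread; the key names metrics_collector_heapsize and hbase_master_heapsize are what I believe Ambari 2.0 uses for these two settings, so verify them in the Ambari config UI, and pick a value that fits the box, e.g. doubling the 512m default:)

ams-env:::
metrics_collector_heapsize = 1024m

ams-hbase-env:::
hbase_master_heapsize = 1024m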

Sid

Sent by Outlook for Android


On Wed, May 6, 2015 at 9:16 PM -0700, "Jayesh Thakrar" wrote:

Here's the embedded HBase data dir size.

[jthakrar@dtord01flm03p data]$ pwd
/localpart0/ambari-metrics-collector/hbase/data

[jthakrar@dtord01flm03p data]$ du -xhs *
14G     default
72K     hbase
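
(To see which tables account for the 14G under default/ - per the log excerpts further down, METRIC_RECORD is the table being hit - a follow-up listing from the same path would be:)

du -sh /localpart0/ambari-metrics-collector/hbase/data/default/*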

From: Siddharth Wagle <swagle@hortonworks.com>
To: "j_thakrar@yahoo.com" <j_thakrar@yahoo.com>; "user@ambari.apache.org" <user@ambari.apache.org>
Sent: Wednesday, May 6, 2015 11:00 PM
Subject: Re: Kafka broker metrics not appearing in REST API

We have tested the embedded mode to work with up to a 400-node cluster and multiple services running on it.

You can change the hbase.rootdir in ams-hbase-site and possibly write to a partition with a separate disk mount.

And copy over the data from the existing location. It would be good to know the size of the data written to hbase.rootdir, to get an idea of what kind of write volume we are looking at.
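
(A rough sketch of that data move, assuming the local-FS layout shown earlier in the thread, a hypothetical new mount at /new-mount, and the Metrics Collector stopped in Ambari before copying:)

rsync -a /localpart0/ambari-metrics-collector/hbase/ /new-mount/ambari-metrics-collector/hbase/

Then point hbase.rootdir in ams-hbase-site at the new path, written in the same URI/path form as the current value, and restart the collector.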

Sid

Sent by Outlook for Android


On Wed, May 6, 2015 at 8:52 PM -0700, "Jayesh Thakrar" <j_thakrar@yahoo.com> wrote:

We have a 30-node cluster.
Unfortunately, this is also our production cluster and there's no HDFS, as it is a dedicated Flume cluster.
We have installed Ambari + Storm + Kafka (HDP) on a cluster on which we have production data being flumed.
The Flume data is being sent to an HDFS cluster which is a little overloaded, so we want to send the Flume data to Kafka and then "throttle" the data being loaded into the HDFS cluster.

But you have given me an idea - maybe I can set up a new HBase file location so that I can do away with HBase data corruption, if any.

It will take me some time to do that; will let you know once I have tried it out.

Thanks,
Jayesh

From: Siddharth Wagle <swagle@hortonworks.com>
To: "user@ambari.apache.org" <user@ambari.apache.org>; Jayesh Thakrar <j_thakrar@yahoo.com>
Sent: Wednesday, May 6, 2015 10:42 PM
Subject: Re: Kafka broker metrics not appearing in REST API

How big is your cluster in terms of number of nodes?
You can tune settings for HBase based on cluster size.

Following are the instructions for writing metrics to HDFS instead of the local FS.

ams-site:::
timeline.metrics.service.operation.mode = distributed

ams-hbase-site:::
hbase.rootdir = hdfs://<namenode-host>:8020/amshbase
hbase.cluster.distributed = true
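
(If you prefer to script the change, and assuming your Ambari 2.0 server still ships the configs.sh helper under /var/lib/ambari-server/resources/scripts, a sketch would be the following - AMBARI_HOST and the admin credentials are placeholders, and the cluster name is the one used elsewhere in this thread:)

cd /var/lib/ambari-server/resources/scripts
./configs.sh -u admin -p admin set AMBARI_HOST ord_flume_kafka_prod ams-site timeline.metrics.service.operation.mode distributed
./configs.sh -u admin -p admin set AMBARI_HOST ord_flume_kafka_prod ams-hbase-site hbase.cluster.distributed true
./configs.sh -u admin -p admin set AMBARI_HOST ord_flume_kafka_prod ams-hbase-site hbase.rootdir 'hdfs://<namenode-host>:8020/amshbase'

Restart the Ambari Metrics service afterwards so the new values take effect.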

-Sid

From: Jayesh Thakrar <j_thakrar@yahoo.com>
Sent: Wednesday, May 06, 2015 8:30 PM
To: user@ambari.apache.org; Siddharth Wagle; Jayesh Thakrar
Subject: Re: Kafka broker metrics not appearing in REST API

More info....

I was doing some "stress-testing" and, interestingly, the Metrics Collector crashed 2 times and I had to restart it (I don't like a file-based HBase for the metrics collector, but I'm not very confident about configuring the system to point to an existing HBase cluster).

Also, after this email thread, I looked up the metrics collector logs and see errors like this -

METRIC_RECORD' at region=METRIC_RECORD,,1429966316307.947cfa22f884d035c09fe804b1f5402c., hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103, seqNum=243930
13:09:37,619  INFO [phoenix-1-thread-349921] RpcRetryingCaller:129 - Call exception, tries=11, retries=35, started=835564 ms ago, cancelled=false, msg=row 'kafka.network.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate^@dtord01flm27p.dc.dotomi.net^@^@^@^AL��:�kafka_broker' on table 'METRIC_RECORD' at region=METRIC_RECORD,kafkark.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate\x00dtord01flm27p.dc.dotomi.net\x00\x00\x00\x01L\xED\xED:\xE5kafka_broker,1429966316307.d488f5e58d54c3251cb81fdfa475dd45., hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103, seqNum=243931
13:10:58,082  INFO [phoenix-1-thread-349920] RpcRetryingCaller:129 - Call exception, tries=12, retries=35, started=916027 ms ago, cancelled=false, msg=row '' on table 'METRIC_RECORD' at region=METRIC_RECORD,,1429966316307.947cfa22f884d035c09fe804b1f5402c., hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103, seqNum=243930
13:10:58,082  INFO [phoenix-1-thread-349921] RpcRetryingCaller:129 - Call exception, tries=12, retries=35, started=916027 ms ago, cancelled=false, msg=row 'kafka.network.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate^@dtord01flm27p.dc.dotomi.net^@^@^@^AL��:�kafka_broker' on table 'METRIC_RECORD' at region=METRIC_RECORD,kafkark.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate\x00dtord01flm27p.dc.dotomi.net\x00\x00\x00\x01L\xED\xED:\xE5kafka_broker,1429966316307.d488f5e58d54c3251cb81fdfa475dd45., hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103, seqNum=243931
13:10:58,112 ERROR [Thread-25] TimelineMetricAggregator:221 - Exception during aggregating metrics.
org.apache.phoenix.exception.PhoenixIOException: org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=36, exceptions:
Sat Apr 25 13:10:58 UTC 2015, null, java.net.SocketTimeoutException: callTimeout=900000, callDuration=938097: row '' on table 'METRIC_RECORD' at region=METRIC_RECORD,,1429966316307.947cfa22f884d035c09fe804b1f5402c., hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103, seqNum=243930
        at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:107)
        at org.apache.phoenix.iterate.ParallelIterators.getIterators(ParallelIterators.java:527)
        at org.apache.phoenix.iterate.MergeSortResultIterator.getIterators(MergeSortResultIterator.java:48)
        at org.apache.phoenix.iterate.MergeSortResultIterator.minIterator(MergeSortResultIterator.java:63)
        at org.apache.phoenix.iterate.MergeSortResultIterator.next(MergeSortResultIterator.java:90)
        at org.apache.phoenix.iterate.MergeSortTopNResultIterator.next(MergeSortTopNResultIterator.java:87)
        at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:739)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.TimelineMetricAggregator.aggregateMetricsFromResultSet(TimelineMetricAggregator.java:104)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.TimelineMetricAggregator.aggregate(TimelineMetricAggregator.java:72)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.AbstractTimelineAggregator.doWork(AbstractTimelineAggregator.java:217)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.AbstractTimelineAggregator.runOnce(AbstractTimelineAggregator.java:94)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.AbstractTimelineAggregator.run(AbstractTimelineAggregator.java:70)

From: Jayesh Thakrar <j_thakrar@yahoo.com>
To: Siddharth Wagle <swagle@hortonworks.com>; "user@ambari.apache.org" <user@ambari.apache.org>
Sent: Wednesday, May 6, 2015 10:07 PM
Subject: Re: Kafka broker metrics not appearing in REST API

Hi Siddharth,

Yes, I am using Ambari 2.0 with the Ambari Metrics service.
The interesting thing is that I got them for some time and not anymore.
And I also know that the metrics are being collected, since I can see them on the dashboard.
Any pointers for troubleshooting?

And by the way, it would be nice to have a count of messages received and not a computed metric count/min.
TSDB does a good job of giving me cumulative and rate-per-sec graphs and numbers.
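
(One way to check whether the collector itself still has the Kafka data, independent of the Ambari REST rollup, is to query the AMS timeline API directly. This is a sketch that assumes the default collector port 6188 and the /ws/v1/timeline/metrics endpoint, reusing a metric name, appId, and broker hostname from the log excerpt above; without a time range it should return only the most recent points:)

curl -s 'http://<metric-collector-host>:6188/ws/v1/timeline/metrics?metricNames=kafka.network.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate&appId=kafka_broker&hostname=dtord01flm27p.dc.dotomi.net'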

Thanks in advance,
Jayesh

From: Siddharth Wagle <swagle@hortonworks.com>
To: "user@ambari.apache.org" <user@ambari.apache.org>; Jayesh Thakrar <j_thakrar@yahoo.com>
Sent: Wednesday, May 6, 2015 10:03 PM
Subject: Re: Kafka broker metrics not appearing in REST API

Hi Jayesh,

Are you using Ambari 2.0 with Ambari Metrics service?

BR,
Sid

From: Jayesh Thakrar <j_thakrar@yahoo.com>
Sent: Wednesday, May 06, 2015 7:53 PM
To: user@ambari.apache.org
Subject: Kafka broker metrics not appearing in REST API

Hi,

I have installed 2 clusters with Ambari, Storm, and Kafka.
After the install, I was able to get metrics for both Storm and Kafka via the REST API.
This worked fine for a week, but for the past 2 days I have not been getting Kafka metrics.

I need the metrics to push to an OpenTSDB cluster.
I do get host metrics and Nimbus metrics, but not KAFKA_BROKER metrics.

I did have maintenance mode turned on for some time, but maintenance is turned off now.

[jthakrar@dtord01hdp0101d ~]$ curl --user admin:admin 'http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/NIMBUS?fields=metrics'
{
  "href" : "http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/NIMBUS?fields=metrics",
  "ServiceComponentInfo" : {
    "cluster_name" : "ord_flume_kafka_prod",
    "component_name" : "NIMBUS",
    "service_name" : "STORM"
  },
  "metrics" : {
    "storm" : {
      "nimbus" : {
        "freeslots" : 54.0,
        "supervisors" : 27.0,
        "topologies" : 0.0,
        "totalexecutors" : 0.0,
        "totalslots" : 54.0,
        "totaltasks" : 0.0,
        "usedslots" : 0.0
      }
    }
  }
}

[jthakrar@dtord01hdp0101d ~]$ curl --user admin:admin 'http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/KAFKA_BROKER?fields=metrics'
{
  "href" : "http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/KAFKA_BROKER?fields=metrics",
  "ServiceComponentInfo" : {
    "cluster_name" : "ord_flume_kafka_prod",
    "component_name" : "KAFKA_BROKER",
    "service_name" : "KAFKA"
  }
}

[jthakrar@dtord01hdp0101d ~]$ curl --user admin:admin 'http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/SUPERVISOR?fields=metrics'
{
  "href" : "http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/SUPERVISOR?fields=metrics",
  "ServiceComponentInfo" : {
    "cluster_name" : "ord_flume_kafka_prod",
    "component_name" : "SUPERVISOR",
    "service_name" : "STORM"
  }
}
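
(A possible follow-up check to narrow this down: Ambari also exposes per-host component metrics, so querying one broker host directly shows whether the gap is only in the component-level rollup. <broker-host> is a placeholder for a host that runs KAFKA_BROKER, e.g. one of the dtord01flm* nodes named in the logs above:)

curl --user admin:admin 'http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/hosts/<broker-host>/host_components/KAFKA_BROKER?fields=metrics'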