ambari-user mailing list archives

From Siddharth Wagle <swa...@hortonworks.com>
Subject Re: Kafka broker metrics not appearing in REST API
Date Thu, 07 May 2015 16:45:52 GMT
What is the memory setting in ams-hbase-env for the master and regionserver?


Could you post

/var/log/ambari-metrics-collector/hbase-ams-master-enhanced-dashboard-3.log

and

/var/log/ambari-metrics-collector/ambari-metrics-collector.log

on this thread?


BR,

Sid


________________________________
From: Jayesh Thakrar <j_thakrar@yahoo.com>
Sent: Thursday, May 07, 2015 8:33 AM
To: user@ambari.apache.org; Jayesh Thakrar
Subject: Re: Kafka broker metrics not appearing in REST API

Hi Harsha and Sid,

So I was able to increase the memory for the metrics-collector and for the hbase master from
512 MB to 1500 MB.
The metrics collector then restarted without any issues (I think that may have fixed any instability
I had earlier).
Immediately after the restart, I was able to get Kafka metrics a few times - and then they
disappeared again.

Can you point me to the code path/logs involved in satisfying the REST API for components?
I looked at the logs in /var/log/ambari-server/ambari-server.log, but there was nothing that
could help with this issue.

Greatly appreciate your help,
Thanks,
Jayesh


________________________________
From: Jayesh Thakrar <j_thakrar@yahoo.com>
To: "user@ambari.apache.org" <user@ambari.apache.org>
Sent: Wednesday, May 6, 2015 11:49 PM
Subject: Re: Kafka broker metrics not appearing in REST API

Thanks Sid.
I had trouble restarting the metrics collector.
Somehow it was complaining about not being able to connect to the HBase embedded zookeeper
on port 61181.
Anyway, after a few tries I was able to bring it up.
I had brought down the whole metrics collection system - and just completed a rolling restart
of the metrics collector.

After all the above I am back to the same reproducible situation of no Kafka metrics.

However, I examined the servers on the cluster and apparently there is not sufficient free
RAM - or should I say, a lot of it is used by the filesystem buffer/cache (15 GB, or 30%). This
is not surprising, as both Flume and Kafka are heavy on (sequential) I/O.

I will need to "resolve" this situation before I can increase memory for HBase.

But all the same, thanks for the pointers - this has given me enough things to look into.

[root@dtord01flm01p ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         48251      47657        593         32          4      32417
-/+ buffers/cache:      15235      33015
Swap:         8191          0       8191
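The number that matters in the output above is the "free" column of the "-/+ buffers/cache" row, since buffer/cache pages are reclaimable and are effectively available to applications. A minimal sketch of reading that value (a hypothetical helper, and it assumes the older procps `free -m` format shown above, which newer versions replace with an "available" column):

```python
def available_mb(free_output: str) -> int:
    """Return the memory (MB) actually available to applications,
    i.e. the 'free' column of the '-/+ buffers/cache' row, which
    counts reclaimable buffer/cache pages as free."""
    for line in free_output.splitlines():
        if line.startswith("-/+ buffers/cache:"):
            # Row layout: "-/+ buffers/cache:   <used>   <free>"
            used, free = line.split(":")[1].split()
            return int(free)
    raise ValueError("no '-/+ buffers/cache' row (newer free(1) formats differ)")

sample = """\
             total       used       free     shared    buffers     cached
Mem:         48251      47657        593         32          4      32417
-/+ buffers/cache:      15235      33015
Swap:         8191          0       8191
"""
print(available_mb(sample))  # 33015 MB available once buffers/cache are discounted
```

So despite the 593 MB "free" figure, roughly 33 GB is reclaimable, which leaves room to grow the HBase heap.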





________________________________
From: Siddharth Wagle <swagle@hortonworks.com>
To: "user@ambari.apache.org" <user@ambari.apache.org>; "user@ambari.apache.org" <user@ambari.apache.org>
Sent: Wednesday, May 6, 2015 11:31 PM
Subject: Re: Kafka broker metrics not appearing in REST API

Is this, or can this be, writing to its own disk?

Can you look at the HBase regionserver web UI? In your browser, key in http://metric-collector-host:61310.

This is the HBase master info port.
Click on the link to the regionserver. Look at the queues and the block cache stats.

The queues should be empty the majority of the time, let's say when you refresh the page a few times.
If not, this points to a disk I/O bottleneck.

The cache hit ratio, as I have observed, is around 70% - the higher the better.

Do you have available physical memory on that box? If yes, the HBase master heap size in ams-hbase-env
and the metric collector heap size in ams-env should be bumped up. The default is 512m in both
cases.

Sid


Sent by Outlook<http://taps.io/outlookmobile> for Android





On Wed, May 6, 2015 at 9:16 PM -0700, "Jayesh Thakrar" <j_thakrar@yahoo.com>
wrote:

Here's the embedded HBase data dir size.

[jthakrar@dtord01flm03p data]$ pwd
/localpart0/ambari-metrics-collector/hbase/data

[jthakrar@dtord01flm03p data]$ du -xhs *
14G     default
72K     hbase




________________________________
From: Siddharth Wagle <swagle@hortonworks.com>
To: "j_thakrar@yahoo.com" <j_thakrar@yahoo.com>; "user@ambari.apache.org" <user@ambari.apache.org>
Sent: Wednesday, May 6, 2015 11:00 PM
Subject: Re: Kafka broker metrics not appearing in REST API

We have tested the embedded mode to work with up to a 400-node cluster and multiple services
running on it.

You can change hbase.rootdir in ams-hbase-site and possibly write to a partition with a separate
disk mount, and copy over the data from the existing location.

It would also be good to know the size of the data written to hbase.rootdir, to get an idea of
what kind of write volume we are looking at.

Sid


Sent by Outlook<http://taps.io/outlookmobile> for Android





On Wed, May 6, 2015 at 8:52 PM -0700, "Jayesh Thakrar" <j_thakrar@yahoo.com>
wrote:

We have a 30-node cluster.
Unfortunately, this is also our production cluster and there's no HDFS as it is a dedicated
Flume cluster.
We have installed Ambari + Storm + Kafka (HDP) on a cluster on which we have production data
being flumed.
The flume data is being sent to an HDFS cluster which is a little overloaded, so we want to
send flume data to Kafka and then "throttle" the data being loaded into the HDFS cluster.

But you have given me an idea - maybe I can set up a new HBase file location, so that I can
do away with any HBase data corruption.

It will take me some time to do that; I will let you know once I have tried it out.

Thanks,
jayesh


________________________________
From: Siddharth Wagle <swagle@hortonworks.com>
To: "user@ambari.apache.org" <user@ambari.apache.org>; Jayesh Thakrar <j_thakrar@yahoo.com>
Sent: Wednesday, May 6, 2015 10:42 PM
Subject: Re: Kafka broker metrics not appearing in REST API

How big is your cluster in terms of number of nodes?
You can tune settings for HBase based on cluster size.

Following are the instructions for writing metrics to HDFS instead of local FS.

ams-site:::
timeline.metrics.service.operation.mode = distributed

ams-hbase-site:::
hbase.rootdir = hdfs://<namenode-host>:8020/amshbase
hbase.cluster.distributed = true
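For reference, the same two changes could also be applied through Ambari's REST API rather than the UI. A minimal sketch that only builds the PUT request body for `/api/v1/clusters/<cluster>` (it does not contact a server; the namenode host and the config tag below are placeholder values), assuming the usual `Clusters/desired_config` payload shape:

```python
import json

def desired_config_payload(config_type: str, properties: dict, tag: str) -> str:
    """Build the JSON body for updating one config type via
    PUT /api/v1/clusters/<cluster> (Ambari's desired_config mechanism)."""
    body = {
        "Clusters": {
            "desired_config": {
                "type": config_type,
                "tag": tag,          # must be a tag not already in use
                "properties": properties,
            }
        }
    }
    return json.dumps(body)

# The two changes from the instructions above:
ams_site = desired_config_payload(
    "ams-site",
    {"timeline.metrics.service.operation.mode": "distributed"},
    tag="version2",
)
ams_hbase_site = desired_config_payload(
    "ams-hbase-site",
    {
        # namenode.example.com is a placeholder for the real namenode host
        "hbase.rootdir": "hdfs://namenode.example.com:8020/amshbase",
        "hbase.cluster.distributed": "true",
    },
    tag="version2",
)
```

Note that updating a config type this way replaces all of its properties, so in practice the existing properties would be fetched first and merged with the changes.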

-Sid



________________________________
From: Jayesh Thakrar <j_thakrar@yahoo.com>
Sent: Wednesday, May 06, 2015 8:30 PM
To: user@ambari.apache.org; Siddharth Wagle; Jayesh Thakrar
Subject: Re: Kafka broker metrics not appearing in REST API

More info....

I was doing some "stress-testing" and, interestingly, the Metrics Collector crashed 2 times
and I had to restart it (I don't like a file-based HBase for the metrics collector, but I'm not
very confident about configuring the system to point to an existing HBase cluster).

Also, after this email thread, I looked up the metrics collector logs and see errors like
this -

METRIC_RECORD' at region=METRIC_RECORD,,1429966316307.947cfa22f884d035c09fe804b1f5402c., hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103,
seqNum=243930
13:09:37,619  INFO [phoenix-1-thread-349921] RpcRetryingCaller:129 - Call exception, tries=11,
retries=35, started=835564 ms ago, cancelled=false, msg=row 'kafka.network.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate^@dtord01flm27p.dc.dotomi.net^@^@^@^AL��:�kafka_broker'
on table 'METRIC_RECORD' at region=METRIC_RECORD,kafkark.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate\x00dtord01flm27p.dc.dotomi.net\x00\x00\x00\x01L\xED\xED:\xE5kafka_broker,1429966316307.d488f5e58d54c3251cb81fdfa475dd45.,
hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103, seqNum=243931
13:10:58,082  INFO [phoenix-1-thread-349920] RpcRetryingCaller:129 - Call exception, tries=12,
retries=35, started=916027 ms ago, cancelled=false, msg=row '' on table 'METRIC_RECORD' at
region=METRIC_RECORD,,1429966316307.947cfa22f884d035c09fe804b1f5402c., hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103,
seqNum=243930
13:10:58,082  INFO [phoenix-1-thread-349921] RpcRetryingCaller:129 - Call exception, tries=12,
retries=35, started=916027 ms ago, cancelled=false, msg=row 'kafka.network.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate^@dtord01flm27p.dc.dotomi.net^@^@^@^AL��:�kafka_broker'
on table 'METRIC_RECORD' at region=METRIC_RECORD,kafkark.RequestMetrics.Metadata-RequestsPerSec.1MinuteRate\x00dtord01flm27p.dc.dotomi.net\x00\x00\x00\x01L\xED\xED:\xE5kafka_broker,1429966316307.d488f5e58d54c3251cb81fdfa475dd45.,
hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103, seqNum=243931
13:10:58,112 ERROR [Thread-25] TimelineMetricAggregator:221 - Exception during aggregating
metrics.
org.apache.phoenix.exception.PhoenixIOException: org.apache.phoenix.exception.PhoenixIOException:
Failed after attempts=36, exceptions:
Sat Apr 25 13:10:58 UTC 2015, null, java.net.SocketTimeoutException: callTimeout=900000, callDuration=938097:
row '' on table 'METRIC_RECORD' at region=METRIC_RECORD,,1429966316307.947cfa22f884d035c09fe804b1f5402c.,
hostname=dtord01flm03p.dc.dotomi.net,60455,1429737430103, seqNum=243930

        at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:107)
        at org.apache.phoenix.iterate.ParallelIterators.getIterators(ParallelIterators.java:527)
        at org.apache.phoenix.iterate.MergeSortResultIterator.getIterators(MergeSortResultIterator.java:48)
        at org.apache.phoenix.iterate.MergeSortResultIterator.minIterator(MergeSortResultIterator.java:63)
        at org.apache.phoenix.iterate.MergeSortResultIterator.next(MergeSortResultIterator.java:90)
        at org.apache.phoenix.iterate.MergeSortTopNResultIterator.next(MergeSortTopNResultIterator.java:87)
        at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:739)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.TimelineMetricAggregator.aggregateMetricsFromResultSet(TimelineMetricAggregator.java:104)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.TimelineMetricAggregator.aggregate(TimelineMetricAggregator.java:72)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.AbstractTimelineAggregator.doWork(AbstractTimelineAggregator.java:217)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.AbstractTimelineAggregator.runOnce(AbstractTimelineAggregator.java:94)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.AbstractTimelineAggregator.run(AbstractTimelineAggregator.java:70)
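The RpcRetryingCaller lines above are the telling part: the retry counter keeps climbing toward the maximum of 35 while elapsed time passes 900 seconds, which is exactly the scan-stuck-then-timeout pattern behind the SocketTimeoutException. A small sketch (a hypothetical helper, not part of Ambari) for pulling those counters out of a log so the trend is easy to watch:

```python
import re

# Matches the wrapped RpcRetryingCaller lines quoted above and pulls out
# the retry counter and elapsed time, to spot scans that never complete.
RETRY_RE = re.compile(
    r"RpcRetryingCaller:\d+ - Call exception, tries=(\d+),\s*"
    r"retries=(\d+), started=(\d+) ms ago"
)

def parse_retries(log_text: str):
    """Return (tries, max_retries, elapsed_ms) tuples for each retry line."""
    return [tuple(map(int, m)) for m in RETRY_RE.findall(log_text)]

sample = (
    "13:09:37,619  INFO [phoenix-1-thread-349921] RpcRetryingCaller:129 - "
    "Call exception, tries=11,\nretries=35, started=835564 ms ago, cancelled=false\n"
    "13:10:58,082  INFO [phoenix-1-thread-349920] RpcRetryingCaller:129 - "
    "Call exception, tries=12,\nretries=35, started=916027 ms ago, cancelled=false\n"
)
for tries, max_retries, elapsed in parse_retries(sample):
    print(f"attempt {tries}/{max_retries} after {elapsed / 1000:.0f}s")
```

If `tries` is steadily increasing against the same region while `elapsed` grows, the scan is blocked rather than merely slow.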



________________________________
From: Jayesh Thakrar <j_thakrar@yahoo.com>
To: Siddharth Wagle <swagle@hortonworks.com>; "user@ambari.apache.org" <user@ambari.apache.org>
Sent: Wednesday, May 6, 2015 10:07 PM
Subject: Re: Kafka broker metrics not appearing in REST API

Hi Siddharth,

Yes, I am using Ambari 2.0 with Ambari Metrics service.
The interesting thing is that I got them for some time, but not anymore.
And I also know that the metrics are being collected, since I can see them on the dashboard.
Any pointers for troubleshooting?

And btw, it would be nice to have a count of messages received, and not a computed metric
count/min.
TSDB does a good job of giving me cumulative and rate-per-sec graphs and numbers.

Thanks in advance,
Jayesh



________________________________
From: Siddharth Wagle <swagle@hortonworks.com>
To: "user@ambari.apache.org" <user@ambari.apache.org>; Jayesh Thakrar <j_thakrar@yahoo.com>
Sent: Wednesday, May 6, 2015 10:03 PM
Subject: Re: Kafka broker metrics not appearing in REST API

Hi Jayesh,

Are you using Ambari 2.0 with Ambari Metrics service?

BR,
Sid


________________________________
From: Jayesh Thakrar <j_thakrar@yahoo.com>
Sent: Wednesday, May 06, 2015 7:53 PM
To: user@ambari.apache.org
Subject: Kafka broker metrics not appearing in REST API

Hi,

I have installed 2 clusters with Ambari, Storm, and Kafka.
After the install, I was able to get metrics for both Storm and Kafka via the REST API.
This worked fine for a week, but for the past 2 days I have not been getting Kafka metrics.

I need to push the metrics to an OpenTSDB cluster.
I do get host metrics and Nimbus metrics, but not KAFKA_BROKER metrics.

I did have maintenance turned on for some time, but maintenance is turned off now.

[jthakrar@dtord01hdp0101d ~]$ curl --user admin:admin 'http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/NIMBUS?fields=metrics'
{
  "href" : "http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/NIMBUS?fields=metrics",
  "ServiceComponentInfo" : {
    "cluster_name" : "ord_flume_kafka_prod",
    "component_name" : "NIMBUS",
    "service_name" : "STORM"
  },
  "metrics" : {
    "storm" : {
      "nimbus" : {
        "freeslots" : 54.0,
        "supervisors" : 27.0,
        "topologies" : 0.0,
        "totalexecutors" : 0.0,
        "totalslots" : 54.0,
        "totaltasks" : 0.0,
        "usedslots" : 0.0
      }
    }
  }
}

[jthakrar@dtord01hdp0101d ~]$ curl --user admin:admin 'http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/KAFKA_BROKER?fields=metrics'
{
  "href" : "http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/KAFKA_BROKER?fields=metrics",
  "ServiceComponentInfo" : {
    "cluster_name" : "ord_flume_kafka_prod",
    "component_name" : "KAFKA_BROKER",
    "service_name" : "KAFKA"
  }
}

[jthakrar@dtord01hdp0101d ~]$ curl --user admin:admin 'http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/SUPERVISOR?fields=metrics'
{
  "href" : "http://dtord01flm01p:8080/api/v1/clusters/ord_flume_kafka_prod/components/SUPERVISOR?fields=metrics",
  "ServiceComponentInfo" : {
    "cluster_name" : "ord_flume_kafka_prod",
    "component_name" : "SUPERVISOR",
    "service_name" : "STORM"
  }
