ambari-user mailing list archives

From Vijaya Narayana Reddy Bhoomi Reddy <vijaya.bhoomire...@whishworks.com>
Subject Re: Issue with Ambari Metrics Collector - Distributed mode
Date Fri, 23 Oct 2015 15:15:07 GMT

Thanks Jonathan.

Regards
Vijay

> On 23 Oct 2015, at 16:10, Jonathan Hurley <jhurley@hortonworks.com> wrote:
> 
> First, you need to get the ID of the alert definition in your system:
> GET api/v1/clusters/<cluster>/alert_definitions?AlertDefinition/name=ambari_agent_disk_usage
> 
> Once you have the ID, you can do a PUT:
> PUT api/v1/clusters/<cluster>/alert_definitions/<id>
> 
> {
>   "AlertDefinition" : {
>     "source" : {
>       "parameters" : [
>         {
>           "name" : "minimum.free.space",
>           "display_name" : "Minimum Free Space",
>           "units" : "bytes",
>           "value" : 5.0E9,
>           "description" : "The overall amount of free disk space left before an alert
is triggered.",
>           "type" : "NUMERIC"
>         },
>         {
>           "name" : "percent.used.space.warning.threshold",
>           "display_name" : "Warning",
>           "units" : "%",
>           "value" : 0.8,
>           "description" : "The percent of disk space consumed before a warning is triggered.",
>           "type" : "PERCENT"
>         },
>         {
>           "name" : "percent.free.space.critical.threshold",
>           "display_name" : "Critical",
>           "units" : "%",
>           "value" : 0.9,
>           "description" : "The percent of disk space consumed before a critical alert
is triggered.",
>           "type" : "PERCENT"
>         }
>       ],
>       "path" : "alert_disk_space.py",
>       "type" : "SCRIPT"
>     }
>   }
> }
> 
> This changes the thresholds to 80% for warning and 90% for critical.
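For anyone who wants to script the two calls above, below is a minimal sketch using Python's
requests library. The Ambari URL, credentials and cluster name are placeholders, and the way the
definition ID is pulled out of the GET response assumes the usual Ambari "items" collection shape;
the PUT payload mirrors the AlertDefinition body shown above.

    # Minimal sketch of the GET + PUT described above, using the requests library.
    # The Ambari URL, credentials and cluster name below are placeholders.
    import requests

    BASE = "http://ambari-host:8080/api/v1/clusters/MyCluster"
    AUTH = ("admin", "admin")
    HEADERS = {"X-Requested-By": "ambari"}  # Ambari expects this header on modifying requests

    # 1. Look up the numeric ID of the alert definition by name.
    r = requests.get(BASE + "/alert_definitions?AlertDefinition/name=ambari_agent_disk_usage",
                     auth=AUTH, headers=HEADERS)
    definition_id = r.json()["items"][0]["AlertDefinition"]["id"]

    # 2. PUT the updated source block (same structure as the JSON in the message above).
    payload = {
        "AlertDefinition": {
            "source": {
                "parameters": [
                    {"name": "minimum.free.space", "display_name": "Minimum Free Space",
                     "units": "bytes", "value": 5.0e9, "type": "NUMERIC",
                     "description": "The overall amount of free disk space left before an alert is triggered."},
                    {"name": "percent.used.space.warning.threshold", "display_name": "Warning",
                     "units": "%", "value": 0.8, "type": "PERCENT",
                     "description": "The percent of disk space consumed before a warning is triggered."},
                    {"name": "percent.free.space.critical.threshold", "display_name": "Critical",
                     "units": "%", "value": 0.9, "type": "PERCENT",
                     "description": "The percent of disk space consumed before a critical alert is triggered."}
                ],
                "path": "alert_disk_space.py",
                "type": "SCRIPT"
            }
        }
    }
    r = requests.put("%s/alert_definitions/%s" % (BASE, definition_id),
                     auth=AUTH, headers=HEADERS, json=payload)
    r.raise_for_status()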
> 
>> On Oct 23, 2015, at 10:45 AM, Vijaya Narayana Reddy Bhoomi Reddy <vijaya.bhoomireddy@whishworks.com> wrote:
>> 
>> Thanks Jonathan for your reply.
>> 
>> Can you please let me know the API call for modifying the threshold values?
>> 
>> Regards
>> Vijay
>> 
>> 
>>> On 23 Oct 2015, at 15:24, Jonathan Hurley <jhurley@hortonworks.com> wrote:
>>> 
>>> The Ambari disk usage alerts are meant to check two things: that you have enough
>>> total free space and enough percent free space in /usr/hdp for data created by
>>> Hadoop and for installing versioned RPMs. Total free space alerts are something
>>> that you’ll probably want to fix, since they mean you have less than a certain
>>> amount of total free space left.
>>> 
>>> It seems like you’re talking about percent free space. Those can be changed via
>>> the thresholds that the script uses. You can’t do this through the Ambari Web
>>> Client. You have two options:
>>> 
>>> - Use the Ambari APIs to adjust the threshold values - this command is rather
>>> long; let me know if you want to try this and I can paste the code to do it.
>>> 
>>> - Edit the script directly and set the defaults to higher limits:
>>> https://github.com/apache/ambari/blob/branch-2.1/ambari-server/src/main/resources/host_scripts/alert_disk_space.py#L36-L37
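For a sense of what that script is checking, it boils down to a percent-used and
minimum-free-space calculation on the mount point. The snippet below is only an illustrative
sketch of that kind of check, not the actual alert_disk_space.py source; the thresholds simply
mirror the values discussed in this thread.

    # Illustrative sketch of a disk-usage check of this kind -- not the Ambari source.
    import os

    def disk_usage(path):
        st = os.statvfs(path)
        total = st.f_blocks * st.f_frsize
        free = st.f_bfree * st.f_frsize
        return total, free, float(total - free) / total

    total, free, used_percent = disk_usage("/usr/hdp")

    # Hypothetical defaults mirroring the thresholds discussed above.
    if free < 5.0e9 or used_percent > 0.90:
        print("CRITICAL: {0:.0%} used, {1} bytes free".format(used_percent, free))
    elif used_percent > 0.80:
        print("WARNING: {0:.0%} used".format(used_percent))
    else:
        print("OK")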
>>> 
>>> 
>>>> On Oct 23, 2015, at 9:26 AM, Vijaya Narayana Reddy Bhoomi Reddy <vijaya.bhoomireddy@whishworks.com> wrote:
>>>> 
>>>> 
>>>> Siddharth,
>>>> 
>>>> Thanks for your response. As ours was a 4-node cluster, I changed it from
>>>> distributed mode to embedded mode and it is working fine. However, I am facing
>>>> another issue with regard to Ambari agent disk usage alerts. Earlier, I had three
>>>> alerts for three machines where /usr/hdp is utilised more than 50%.
>>>> 
>>>> Initially, when I set up the cluster, I had multiple mount points listed under
>>>> yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs, and /usr/hdp was one
>>>> among them. Later, I changed these values so that only one value is present for
>>>> each (/export/hadoop/yarn/local and /export/hadoop/yarn/log) and restarted the
>>>> required components.
>>>> 
>>>> However, I am still seeing the Ambari disk usage alert for /usr/hdp. Can you
>>>> please let me know how to get rid of these alerts?
>>>> 
>>>> Thanks 
>>>> Vijay
>>>> 
>>>> 
>>>>> On 22 Oct 2015, at 19:02, Siddharth Wagle <swagle@hortonworks.com> wrote:
>>>>> 
>>>>> Hi Vijaya,
>>>>> 
>>>>> Please make sure all of the configs are accurate
>>>>> (https://cwiki.apache.org/confluence/display/AMBARI/AMS+-+distributed+mode).
>>>>> 
>>>>> Can you attach your ams-site.xml and /etc/ams-hbase/conf/hbase-site.xml?
>>>>> 
>>>>> - Sid
>>>>> 
>>>>> ________________________________________
>>>>> From: Vijaya Narayana Reddy Bhoomi Reddy <vijaya.bhoomireddy@whishworks.com>
>>>>> Sent: Thursday, October 22, 2015 8:36 AM
>>>>> To: user@ambari.apache.org
>>>>> Subject: Issue with Ambari Metrics Collector - Distributed mode
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I am facing an issue while setting up Ambari Metrics in distributed mode. I am
>>>>> setting up HDP 2.3.x using Ambari 2.1.x. Initially, when I was setting up the
>>>>> cluster, I was shown a warning message that the volume / directory for the
>>>>> metrics service is the same as the one used by the datanode, and hence I was
>>>>> recommended to change it. So I went ahead and pointed it to HDFS, trying to set
>>>>> up the metrics service in distributed mode.
>>>>> 
>>>>> However, the Ambari Metrics service was not set up properly and it timed out
>>>>> while setting up the cluster, showing a warning that the Ambari Metrics service
>>>>> hasn’t started. I restarted the Metrics Collector service multiple times, but it
>>>>> would stop again in a few seconds.
>>>>> 
>>>>> On further observation, I realised that in the ams-site.xml file,
>>>>> timeline.metrics.service.operation.mode was still pointing to “embedded”,
>>>>> whereas hbase-site.xml had all the required properties set correctly. So I
>>>>> changed the timeline.metrics.service.operation.mode property to “distributed”
>>>>> and restarted the required services as recommended by Ambari. However, the
>>>>> restart process got stuck at 68% and eventually timed out. It is not able to
>>>>> restart the Metrics Collector service; however, all the metrics monitor services
>>>>> restarted without any issues.
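For reference, once that property has been switched, the relevant entry in ams-site.xml should
look roughly like the following (the rest of the file is omitted; the property name and value
come straight from the description above):

    <property>
      <name>timeline.metrics.service.operation.mode</name>
      <value>distributed</value>
    </property>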
>>>>> 
>>>>> Can anyone please throw some light on why this is happening and what the
>>>>> solution is to fix this?
>>>>> 
>>>>> Thanks
>>>>> Vijay
>>>> 
>>>> 
>>> 
>> 
>> 
> 


