incubator-ambari-dev mailing list archives

From "Tom Beerbower (JIRA)" <>
Subject [jira] [Commented] (AMBARI-3013) Powering off RM node increases API latency by a factor of 6
Date Wed, 04 Sep 2013 18:38:52 GMT


Tom Beerbower commented on AMBARI-3013:

For solution 1, if the server is down, the request can never return faster than the timeout.
If we make the timeout too small, we risk timing out when we shouldn't.

I like solution 2. We should expose the heartbeat status for all hosts so that any provider
can make the check up front. I think we can assume that if a host has no heartbeat, any
request to that host will fail.
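
To make the up-front check concrete, here is a rough sketch of what I have in mind. It is only an illustration: HeartbeatGate, recordHeartbeat, isAlive, fetchIfAlive and the staleness threshold are all hypothetical names and values, not existing Ambari classes or settings. The idea is that a provider asks whether the host has sent a heartbeat recently and skips the remote call entirely if it has not, so a powered-off host never costs a connection timeout.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch; none of these names are Ambari APIs. */
class HeartbeatGate {
    // Time of the last heartbeat seen per host, in milliseconds.
    private final Map<String, Long> lastHeartbeatMillis = new ConcurrentHashMap<>();
    // Assumed threshold: a host with no heartbeat for this long is treated as down.
    private final long staleAfterMillis;

    HeartbeatGate(long staleAfterMillis) {
        this.staleAfterMillis = staleAfterMillis;
    }

    /** Called whenever a heartbeat arrives from a host. */
    void recordHeartbeat(String host, long nowMillis) {
        lastHeartbeatMillis.put(host, nowMillis);
    }

    /** True only if the host has reported a heartbeat within the threshold. */
    boolean isAlive(String host, long nowMillis) {
        Long last = lastHeartbeatMillis.get(host);
        return last != null && nowMillis - last <= staleAfterMillis;
    }

    /**
     * Runs the remote call only when the host looks alive. For a host without
     * a recent heartbeat it returns empty immediately and never invokes the call.
     */
    <T> Optional<T> fetchIfAlive(String host, long nowMillis, Function<String, T> remoteCall) {
        if (!isAlive(host, nowMillis)) {
            return Optional.empty();
        }
        return Optional.ofNullable(remoteCall.apply(host));
    }
}
```

The trade-off is that a host which is up but has missed heartbeats would be skipped as well, which matches the assumption above that no heartbeat means the request would fail anyway.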
> Powering off RM node increases API latency by a factor of 6
> -----------------------------------------------------------
>                 Key: AMBARI-3013
>                 URL:
>             Project: Ambari
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 1.4.0
>            Reporter: Srimanth Gunturi
>            Assignee: Mahadev konar
>              Labels: perfomance
>             Fix For: 1.4.0
>         Attachments: Response Time Graph_conn_timeout1000.png, Response Time Graph_conn_timeout5000.png,
> On a 4 node cluster I was testing the below API call.
> {noformat}
> /api/v1/clusters/${cluster}/services?fields=components/ServiceComponentInfo,components/host_components,components/host_components/HostRoles,components/host_components/metrics/jvm/memHeapUsedM,components/host_components/metrics/jvm/memHeapCommittedM,components/host_components/metrics/mapred/jobtracker/trackers_decommissioned,components/host_components/metrics/cpu/cpu_wio,components/host_components/metrics/rpc/RpcQueueTime_avg_time,components/host_components/metrics/flume/flume,components/host_components/metrics/yarn/Queue
> {noformat}
> When everything was working the latency was ~500ms.
> I then powered off the RM node, and immediately the call latency spiked to 30 times
> the original (~15000ms). After some time it dropped, but was still 6 times the original
> latency (~3000ms). When the machine came back online, the call latency fell back to its
> original ~500ms.
> Images attached.
