ambari-issues mailing list archives

From "Jonathan Hurley (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AMBARI-16913) Web Client Requests Handled By Jetty Should Not Be Blocked By JMX Property Providers
Date Thu, 26 May 2016 20:13:13 GMT
Jonathan Hurley created AMBARI-16913:
----------------------------------------

             Summary: Web Client Requests Handled By Jetty Should Not Be Blocked By JMX Property Providers
                 Key: AMBARI-16913
                 URL: https://issues.apache.org/jira/browse/AMBARI-16913
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
    Affects Versions: 2.0.0
            Reporter: Jonathan Hurley
            Assignee: Jonathan Hurley
            Priority: Blocker
             Fix For: 2.4.0


Incoming requests from the web client (or from any REST API consumer) are eventually routed to
the property provider / subresource framework. It is here that any JMX data is queried,
within the context of the REST request. In large clusters, these requests can back up quite
easily (even with a massive thread pool), degrading the user experience in the web client:

{code}
Thread [qtp-ambari-client-38]
	JMXPropertyProvider(ThreadPoolEnabledPropertyProvider).populateResources(Set<Resource>, Request, Predicate) line: 168
	JMXPropertyProvider.populateResources(Set<Resource>, Request, Predicate) line: 156
	StackDefinedPropertyProvider.populateResources(Set<Resource>, Request, Predicate) line: 200
	ClusterControllerImpl.populateResources(Type, Set<Resource>, Request, Predicate) line: 155
	QueryImpl.queryForResources() line: 407
	QueryImpl.execute() line: 217
	ReadHandler.handleRequest(Request) line: 69
	GetRequest(BaseRequest).process() line: 145
{code}

Consider one of the calls made by the web client:
{code}
GET api/v1/clusters/c1/components/?
ServiceComponentInfo/category=MASTER&
fields=
ServiceComponentInfo/service_name,
host_components/HostRoles/display_name,
host_components/HostRoles/host_name,
host_components/HostRoles/state,
host_components/HostRoles/maintenance_state,
host_components/HostRoles/stale_configs,
host_components/HostRoles/ha_state,
host_components/HostRoles/desired_admin_state,
host_components/metrics/jvm/memHeapUsedM,
host_components/metrics/jvm/HeapMemoryMax,
host_components/metrics/jvm/HeapMemoryUsed,
host_components/metrics/jvm/memHeapCommittedM,
host_components/metrics/mapred/jobtracker/trackers_decommissioned,
host_components/metrics/cpu/cpu_wio,
host_components/metrics/rpc/client/RpcQueueTime_avg_time,
host_components/metrics/dfs/FSNamesystem/*,
host_components/metrics/dfs/namenode/Version,
host_components/metrics/dfs/namenode/LiveNodes,
host_components/metrics/dfs/namenode/DeadNodes,
host_components/metrics/dfs/namenode/DecomNodes,
host_components/metrics/dfs/namenode/TotalFiles,
host_components/metrics/dfs/namenode/UpgradeFinalized,
host_components/metrics/dfs/namenode/Safemode,
host_components/metrics/runtime/StartTime
{code}

This query essentially says: for every {{MASTER}} component, fetch these metrics. The problem
is that a large cluster could have 100 masters, yet the requested metrics apply only to the
NameNode. As a result, the JMX endpoints for all 100 masters are queried, *live*, as part
of the request.

There are two inherent flaws with this approach:

- Even with millisecond JMX response times, multiplying them across hundreds of endpoints and
then adding parsing overhead causes a noticeable delay in the web client, because the federated
requests block the main UX request

- Although there is a thread pool that scales up to service these requests, it only really
works for a single user. With multiple users logged in, you would need many hundreds of threads,
all pulling in the same JMX data

This data should never be queried directly as part of an incoming REST request. Instead,
an autonomous pool of threads should constantly retrieve these point-in-time metrics
and update a cache. The cache is then used to service all live REST requests.
- On the first request to a resource, a cache miss occurs and no data is returned. I think
this is acceptable since metrics take a few moments to populate anyway right now. As the web
client polls, the next request should pick up the newly cached metrics.
- Only URLs which are being asked for by incoming REST requests should be considered for retrieval.
After some time, if they haven't been requested, the headless thread pool can stop trying
to update their data
- All JMX data will be parsed and stored in-memory, in an expiring cache
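
The three points above could be sketched roughly as follows. This is only an illustration of the proposal, not Ambari's actual code: the class name {{JmxMetricCacheSketch}}, the {{fetcher}} function, the idle timeout, and the refresh period are all hypothetical, and timestamps are passed in explicitly so the behavior is easy to follow.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical sketch of a request-driven, expiring JMX cache. */
public class JmxMetricCacheSketch {
    // Parsed JMX payload per URL (kept as a String here for simplicity).
    private final Map<String, String> cached = new ConcurrentHashMap<>();
    // Last time a REST request asked for each URL.
    private final Map<String, Long> lastRequested = new ConcurrentHashMap<>();
    // Assumption: something that performs the real JMX call and parsing.
    private final Function<String, String> fetcher;
    private final long idleTimeoutMillis;

    public JmxMetricCacheSketch(Function<String, String> fetcher, long idleTimeoutMillis) {
        this.fetcher = fetcher;
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    /** Runs on the REST request thread; records interest and never calls JMX. */
    public Optional<String> get(String url, long nowMillis) {
        lastRequested.put(url, nowMillis);
        return Optional.ofNullable(cached.get(url)); // empty on a cache miss
    }

    /** Runs on the background pool; refreshes requested URLs, drops idle ones. */
    public void refreshAll(long nowMillis) {
        for (Map.Entry<String, Long> e : lastRequested.entrySet()) {
            String url = e.getKey();
            Long seen = e.getValue();
            if (nowMillis - seen > idleTimeoutMillis) {
                // Remove only if no newer request arrived in the meantime.
                if (lastRequested.remove(url, seen)) {
                    cached.remove(url);
                }
            } else {
                cached.put(url, fetcher.apply(url));
            }
        }
    }

    /** Starts a single headless refresher thread; the caller shuts it down. */
    public ScheduledExecutorService startRefreshing(long periodMillis) {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        pool.scheduleWithFixedDelay(
            () -> refreshAll(System.currentTimeMillis()),
            0, periodMillis, TimeUnit.MILLISECONDS);
        return pool;
    }
}
```

In this sketch the first {{get}} for a URL returns nothing and merely registers interest; the next background pass fetches the data, and a URL that has not been requested within the idle timeout is dropped from both maps. A real implementation would likely use more than one refresher thread and a proper cache library.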



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
