ambari-dev mailing list archives

From "Jonathan Hurley" <jhur...@hortonworks.com>
Subject Re: Review Request 35945: Memory Exhausted During Upgrade Of Large Cluster
Date Sat, 27 Jun 2015 02:34:08 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35945/
-----------------------------------------------------------

(Updated June 26, 2015, 10:34 p.m.)


Review request for Ambari, Mahadev Konar, Nate Cole, Sumit Mohanty, and Tom Beerbower.


Bugs: AMBARI-12178
    https://issues.apache.org/jira/browse/AMBARI-12178


Repository: ambari


Description (updated)
-------

During an upgrade of a large cluster, the memory used by Ambari grows until it is fully consumed.
This only happens while the Upgrade Dialog page is open; if that popup is closed, memory usage
stays relatively constant. Based on heap dumps, the largest offenders are StageEntity instances
and, as a result, byte[] arrays.

In short, ActionDBAccessorImpl has a cache that holds onto entities. Because of this, the
underlying UnitOfWork map is never released and keeps every StageEntity instance reachable.
Items are eventually purged from this cache, but not fast enough to free up memory.
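To make the retention argument concrete, here is a toy reachability model in Java. Session, Entity and reachablePayloadBytes are hypothetical names, and the assumption that every cached entity can reach a shared per-session map of all loaded entities is a simplification for illustration only; the review does not describe the exact reference chain inside the persistence layer. The sketch walks references the way a tracing garbage collector would and sums the payload bytes that stay reachable from the cache.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Toy reachability model; all names are hypothetical, not Ambari or EclipseLink classes. */
public final class RetentionSketch {

    /** Stand-in for a per-session identity map that registers every loaded entity. */
    static final class Session {
        final List<Entity> registered = new ArrayList<>();
    }

    /** Stand-in for a loaded entity with an eagerly loaded payload. */
    static final class Entity {
        // Assumption for illustration: every entity links back to its session.
        final Session session;
        final byte[] payload;

        Entity(Session session, int payloadBytes) {
            this.session = session;
            this.payload = new byte[payloadBytes];
            session.registered.add(this);
        }
    }

    /** Sums payload bytes reachable from the given roots (e.g. the entries of a cache). */
    static long reachablePayloadBytes(List<Entity> roots) {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Entity> work = new ArrayDeque<>(roots);
        long total = 0;
        while (!work.isEmpty()) {
            Entity e = work.pop();
            if (!seen.add(e)) {
                continue; // already counted
            }
            total += e.payload.length;
            if (seen.add(e.session)) {
                // Reaching the session makes every entity it registered reachable too.
                work.addAll(e.session.registered);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Session session = new Session();
        List<Entity> loaded = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            loaded.add(new Entity(session, 10_000));
        }
        // A cache that keeps just one entity still pins the whole session.
        List<Entity> cache = Collections.singletonList(loaded.get(0));
        System.out.println(reachablePayloadBytes(cache));
    }
}
```

With 1,000 loaded entities of 10,000 bytes each, keeping a single one in the cache keeps all 10,000,000 payload bytes reachable, which is the pattern the heap dumps point to.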

Without ripping apart Ambari or making risky changes to cache references, the easiest solution
was to make the fields that cause StageEntity to be large lazily loaded, since most of the time
these entities just sit idle in the EntityManager.
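The fix relies on lazy loading: a large field is not materialized until something reads it, so an entity that just sits in the EntityManager stays small. The framework-free Java sketch below shows the idea with a memoizing holder; Lazy, StageRecord and commandParams are hypothetical names, not Ambari's classes. In JPA the usual way to request this for a column is `@Basic(fetch = FetchType.LAZY)`, which the specification treats as a hint to the provider; the review itself does not show which mappings the patch changed.

```java
import java.util.function.Supplier;

/** Framework-free illustration of lazy loading; all names are hypothetical. */
public final class LazyStageSketch {

    /** Memoizing holder: runs its loader at most once, on first access (not thread-safe). */
    static final class Lazy<T> {
        private Supplier<T> loader;
        private T value;

        Lazy(Supplier<T> loader) {
            this.loader = loader;
        }

        boolean isLoaded() {
            return loader == null;
        }

        T get() {
            if (loader != null) {
                value = loader.get();
                loader = null; // release the loader once the value is cached
            }
            return value;
        }
    }

    /** Hypothetical entity: the id is always present, the large payload is deferred. */
    static final class StageRecord {
        final long stageId;
        final Lazy<byte[]> commandParams;

        StageRecord(long stageId, Supplier<byte[]> payloadLoader) {
            this.stageId = stageId;
            this.commandParams = new Lazy<>(payloadLoader);
        }
    }

    public static void main(String[] args) {
        int[] loaderCalls = {0};
        StageRecord stage = new StageRecord(42L, () -> {
            loaderCalls[0]++;
            return new byte[1024 * 1024]; // stands in for reading a large column
        });
        System.out.println("loaded before first access: " + stage.commandParams.isLoaded());
        int size = stage.commandParams.get().length;
        stage.commandParams.get(); // second read is served from the cached value
        System.out.println("payload bytes: " + size + ", loader calls: " + loaderCalls[0]);
    }
}
```

The holder is deliberately minimal and not thread-safe; an instance shared across threads would need synchronization.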

Before change
```
Class Name                                       | Objects | Shallow Heap | Retained Heap
------------------------------------------------------------------------------------------
org.apache.ambari.server.orm.entities.StageEntity| 292,356 |   18,466,176 | 3,575,693,136
------------------------------------------------------------------------------------------
```

First patch
```
Class Name                                       | Objects | Shallow Heap | Retained Heap
------------------------------------------------------------------------------------------
org.apache.ambari.server.orm.entities.StageEntity| 193,715 |   15,716,640 |   255,318,392
------------------------------------------------------------------------------------------
```

Second patch
```
Class Name                                       | Objects | Shallow Heap | Retained Heap
------------------------------------------------------------------------------------------
org.apache.ambari.server.orm.entities.StageEntity|   4,410 |      423,360 |    61,598,736
------------------------------------------------------------------------------------------
```
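Dividing each row's retained heap by its object count gives a rough per-instance figure, which helps separate the two effects in the tables above. The Java snippet below only reuses the numbers from those three tables; HeapStats and averageRetainedBytes are illustrative names, and integer division truncates to whole bytes.

```java
/** Derives per-instance averages from the StageEntity heap-dump rows quoted in the review. */
public final class HeapStats {

    /** Average retained bytes per object, truncated by integer division. */
    static long averageRetainedBytes(long retainedHeap, long objects) {
        return retainedHeap / objects;
    }

    public static void main(String[] args) {
        String[] labels = {"Before change", "First patch", "Second patch"};
        long[] objects = {292_356L, 193_715L, 4_410L};
        long[] retained = {3_575_693_136L, 255_318_392L, 61_598_736L};
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%-14s %,7d objects, ~%,d bytes retained per object%n",
                    labels[i], objects[i], averageRetainedBytes(retained[i], objects[i]));
        }
        // Ratio of total retained heap before vs. after the second patch.
        System.out.printf("Total retained heap shrank about %dx%n", retained[0] / retained[2]);
    }
}
```

It reports about 12,230 bytes per instance before the change, 1,318 after the first patch and 13,967 after the second, with total retained heap about 58x smaller. Read this way, the first patch mostly shrinks each StageEntity, while the second mostly reduces how many instances stay reachable (292,356 down to 4,410).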


Diffs
-----

  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/HostRoleCommand.java 20ec9ea
  ambari-server/src/main/java/org/apache/ambari/server/controller/internal/StageResourceProvider.java 664fae3
  ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeGroupResourceProvider.java eb34d63
  ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java b354841
  ambari-server/src/main/java/org/apache/ambari/server/orm/entities/HostEntity.java 9f3f70c
  ambari-server/src/main/java/org/apache/ambari/server/orm/entities/StageEntity.java c2b97d6
  ambari-server/src/main/java/org/apache/ambari/server/topology/HostRequest.java f63ba3f
  ambari-server/src/main/java/org/apache/ambari/server/topology/TopologyManager.java 31363b4


Diff: https://reviews.apache.org/r/35945/diff/


Testing
-------

Performed a cluster upgrade and monitored memory usage. 200,000 StageEntity instances used to
occupy 3.5 GB of heap; they now take up about 150 MB.

Tests run: 3099, Failures: 0, Errors: 0, Skipped: 28

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 24:13 min
[INFO] Finished at: 2015-06-26T21:32:27-04:00
[INFO] Final Memory: 46M/1414M
[INFO] ------------------------------------------------------------------------


Thanks,

Jonathan Hurley

