atlas-dev mailing list archives

From Ashutosh Mestry <ames...@hortonworks.com>
Subject Review Request 57495: Export API: Memory usage optimization
Date Fri, 10 Mar 2017 04:09:15 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57495/
-----------------------------------------------------------

Review request for atlas, Madhan Neethiraj and Sarath Subramanian.


Bugs: ATLAS-1646
    https://issues.apache.org/jira/browse/ATLAS-1646


Repository: atlas


Description
-------

**Background**
The existing implementation of the Export REST API uses *ByteArrayOutputStream* while creating
the output zip file. The entire export is held in memory, which puts pressure on memory when
handling large data. Also, data transfer does not start until the entire export is done, which
hurts performance.

**Solution**
- Pass *ServletOutputStream* to *ZipSink*.
  - This improves memory usage, as memory is no longer held up by *ByteArrayOutputStream*.
  - Removes the additional copy from *ByteArrayOutputStream* to *ServletOutputStream*.
  - Simplifies *ZipSink*.
- Clear internal data structures after operation completion.
  - This helps free the memory used. The gain is small, with some improvement in large
transfers.
- *ExportService.ExportContext.guidsToProcess* changed from *List* to *Set*, removing the
sequential lookup.
- Data transfer from server to client starts much sooner. The client is able to interrupt the
transfer if needed.
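
The approach above can be sketched roughly as follows. This is a minimal illustration, not the code under review: *StreamingZipSketch* and *PendingGuids* are hypothetical stand-ins for *ZipSink* and the GUID bookkeeping in *ExportContext*, the entry naming and JSON payloads are made up, and a *ByteArrayOutputStream* stands in for the servlet stream only so the example runs on its own.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical stand-in for ZipSink: writes each entry directly to the
// stream it is given instead of buffering the whole archive in memory.
class StreamingZipSketch implements AutoCloseable {
    private final ZipOutputStream zip;

    // Assumption: in the web application, 'target' would be the
    // response's ServletOutputStream.
    StreamingZipSketch(OutputStream target) {
        this.zip = new ZipOutputStream(target);
    }

    void add(String guid, String json) throws IOException {
        zip.putNextEntry(new ZipEntry(guid + ".json")); // entry name is illustrative
        zip.write(json.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
        zip.flush(); // hand the finished entry to the underlying stream
    }

    @Override
    public void close() throws IOException {
        zip.finish(); // write the zip central directory; the target stream stays open
        zip.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        PendingGuids pending = new PendingGuids();
        pending.offer("guid-1");
        pending.offer("guid-2");
        pending.offer("guid-1"); // duplicate, rejected by the set lookup

        try (StreamingZipSketch sink = new StreamingZipSketch(target)) {
            String guid;
            while ((guid = pending.poll()) != null) {
                sink.add(guid, "{\"guid\":\"" + guid + "\"}");
            }
        }
        pending.clear(); // release bookkeeping once the operation completes
        System.out.println("zip bytes written: " + target.size());
    }
}

// Hypothetical stand-in for the pending-GUID bookkeeping: the Set gives a
// hash-based membership test in place of a sequential scan of a List.
class PendingGuids {
    private final Deque<String> queue = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();

    // Returns false if the GUID was already queued or processed.
    boolean offer(String guid) {
        if (!seen.add(guid)) {
            return false;
        }
        queue.add(guid);
        return true;
    }

    // Returns the next GUID to process, or null when none remain.
    String poll() {
        return queue.poll();
    }

    void clear() {
        queue.clear();
        seen.clear();
    }
}
```

Because each entry is written to the target stream as soon as it is complete, the only data held in memory at any time is the entry currently being written, and a client reading the response can start receiving bytes before the export finishes.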


Diffs
-----

  intg/src/main/java/org/apache/atlas/model/impexp/AtlasExportResult.java e6a967e 
  webapp/src/main/java/org/apache/atlas/web/resources/AdminResource.java 31a4cf9 
  webapp/src/main/java/org/apache/atlas/web/resources/ExportService.java c1891e0 
  webapp/src/main/java/org/apache/atlas/web/resources/ZipSink.java 2e4cb01 


Diff: https://reviews.apache.org/r/57495/diff/1/


Testing
-------

Profiled using *jmap* & *Eclipse MAT*, verified using *YourKit*.

Verified with *FetchTypes* *full* and *connected*.

Memory usage: Stays constant under prolonged use. Verified over ~3 hrs of continuous runs using
medium and large database exports.

Performance improvement:
Date | File Size | No. of Entities | Duration (mins) |
-----|-----------|-----------------|-----------------|
3/08 |   180 MB  |          202930 |              22 |
3/09 |   180 MB  |          202930 |              19 |

About 14% improvement (22 mins down to 19 mins).


Thanks,

Ashutosh Mestry

