atlas-dev mailing list archives

From Ashutosh Mestry <ames...@hortonworks.com>
Subject Review Request 66184: Migration Utility: Branch 0.8: Performance Improvement
Date Tue, 20 Mar 2018 23:14:01 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66184/
-----------------------------------------------------------

Review request for atlas, Madhan Neethiraj, Ruchi Solani, and Sarath Subramanian.


Bugs: ATLAS-2461
    https://issues.apache.org/jira/browse/ATLAS-2461


Repository: atlas


Description
-------

**Background** 
The migration utility committed earlier has a couple of shortcomings:
- Relies on Export service.
  - Needs _export-options.json_ to be specified.
  - Exporting everything means meticulously updating the options file. It is likely that some
specification will be missed, which would result in less data being migrated.
- Suffers from performance problems for large data sets.

**Approach**
The new approach uses _Titan's_ _GraphSON_ writer. This is configured to export all data in
_EXTENDED_ format.

The _EXTENDED_ format separates _vertices_ and _edges_. This opens up other interesting avenues
for import.
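
One avenue that the separation might enable is a two-pass import: load every vertex first, then attach edges by looking up their endpoints. The sketch below only illustrates that idea. It is not part of this patch, and every name in it (`TwoPassImportSketch`, `VertexRecord`, `EdgeRecord`, `load`) is hypothetical. It also assumes vertex names are unique and that the records have already been parsed from the exported file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Atlas or Titan.
public class TwoPassImportSketch {

    // Minimal stand-in for one entry read from the exported vertices section.
    static final class VertexRecord {
        final long id;
        final String name;

        VertexRecord(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Minimal stand-in for one entry read from the exported edges section.
    static final class EdgeRecord {
        final long outV;    // id of the source vertex
        final long inV;     // id of the target vertex
        final String label;

        EdgeRecord(long outV, long inV, String label) {
            this.outV = outV;
            this.inV = inV;
            this.label = label;
        }
    }

    // Builds an adjacency map: vertex name -> list of "label->targetName" entries.
    static Map<String, List<String>> load(List<VertexRecord> vertices, List<EdgeRecord> edges) {
        Map<Long, String> nameById = new LinkedHashMap<>();
        Map<String, List<String>> adjacency = new LinkedHashMap<>();

        // Pass 1: register every vertex so that any edge can be resolved later.
        for (VertexRecord v : vertices) {
            nameById.put(v.id, v.name);
            adjacency.put(v.name, new ArrayList<>());
        }

        // Pass 2: attach edges; both endpoints must already be known.
        for (EdgeRecord e : edges) {
            String from = nameById.get(e.outV);
            String to = nameById.get(e.inV);
            if (from == null || to == null) {
                throw new IllegalStateException(
                        "Edge refers to an unknown vertex: " + e.outV + " -> " + e.inV);
            }
            adjacency.get(from).add(e.label + "->" + to);
        }
        return adjacency;
    }

    public static void main(String[] args) {
        List<VertexRecord> vertices = new ArrayList<>();
        vertices.add(new VertexRecord(1L, "table_a"));
        vertices.add(new VertexRecord(2L, "process"));
        vertices.add(new VertexRecord(3L, "table_b"));

        List<EdgeRecord> edges = new ArrayList<>();
        edges.add(new EdgeRecord(1L, 2L, "inputs"));
        edges.add(new EdgeRecord(2L, 3L, "outputs"));

        System.out.println(load(vertices, edges));
    }
}
```

Because all vertices exist before any edge is processed, the edge pass never has to handle forward references, and a dangling endpoint surfaces as an explicit error.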

**Implementation**
- Modified _Exporter_ to use _AtlasTypeRegistry_ and _GraphSONWriter_.
- Produced files: 
   - _atlas-typedef.json_: Contains type definitions of all types.
   - _atlas-migration-data.json_: Contains data from the database.


Diffs
-----

  tools/atlas-migration-exporter/pom.xml 5c6c61ee 
  tools/atlas-migration-exporter/src/main/java/org/apache/atlas/migration/Exporter.java a9873df0



Diff: https://reviews.apache.org/r/66184/diff/1/


Testing
-------

**Functional tests**
Export from repositories with:
- Custom types.
- Complex lineages.
- Hive entities created via beeline.
- Imported data.

**Gremlin Shell**
- Used _Gremlin_ shell to perform export operation.


Thanks,

Ashutosh Mestry

