atlas-dev mailing list archives

From Ashutosh Mestry <ames...@hortonworks.com>
Subject Re: Review Request 66253: Migration: GraphSON-based Import
Date Wed, 04 Apr 2018 04:23:42 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66253/
-----------------------------------------------------------

(Updated April 4, 2018, 4:23 a.m.)


Review request for atlas, Apoorv Naik, Madhan Neethiraj, Ruchi Solani, Sarath Subramanian,
and Vishal Suvagia.


Changes
-------

Updates include: Metrics from large import.


Bugs: ATLAS-2460
    https://issues.apache.org/jira/browse/ATLAS-2460


Repository: atlas


Description
-------

**Background**
This implementation deals with the 'import into' part of the data migration process. 

It assumes:
- The export from the older cluster is complete.
- The generated file has been moved to the newer cluster.

**Implementation**

During _Atlas_ server startup, the configuration parameter is checked. If that parameter exists,
all services except _DataMigrationService_ are started, and then migration is started. The Atlas
server is available in _MIGRATION_ mode, in which it processes only REST calls made to the
_AdminResource_.

Here are the updates:
- New configuration parameter has been added:
    _atlas.migration.mode.filename=<name of the file to be imported>_
  This configuration parameter is set by Ambari as part of its migration orchestration. 
- _DataMigrationService_: New service that performs async migration as soon as Atlas server
starts up.
- _MigrationProgressService_: New service that reports the progress of the import.
- _AdminResource.getStatus()_: Now supplies additional status about migration.
- _ServiceState_: Modified to carry the additional status _MIGRATION_. This status is set by
looking at the configuration parameter above.
- _Services_: Modified for special handling of _DataMigrationService_.
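
As a rough illustration of the startup check described above, the following Python sketch parses properties-style text and picks a mode. It is not the Atlas implementation (which is Java). The function names and the `ACTIVE` fallback label are assumptions made for illustration; only the property name _atlas.migration.mode.filename_ and the _MIGRATION_ status come from this request.

```python
MIGRATION_FILE_PROPERTY = "atlas.migration.mode.filename"  # name taken from this request


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def resolve_mode(props):
    """Return ("MIGRATION", filename) when the property is set, else ("ACTIVE", None).

    "ACTIVE" is a placeholder label for normal operation (an assumption here).
    """
    filename = props.get(MIGRATION_FILE_PROPERTY)
    if filename:
        return "MIGRATION", filename
    return "ACTIVE", None
```

For example, `resolve_mode(parse_properties("atlas.migration.mode.filename=/root/atlas-data"))` selects migration mode with the given file name.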


**CURL**
Check status using:
```
curl -X GET -u admin:admin -H "Content-Type: application/json" -H "Cache-Control: no-cache" \
  http://localhost:21000/api/atlas/admin/status
```

**Migration Status**
The above URL in migration mode yields JSON like:
```
{"Status":"MIGRATION","MigrationStatus":{"operationStatus":"SUCCESS","startTime":1521738357947,"endTime":1521738359272,"currentIndex":48544}}
```
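
A small sketch of how a client might read this response, written in Python for illustration. It assumes that _startTime_ and _endTime_ are epoch timestamps in milliseconds, which the sample values suggest but this request does not state. The function name and the keys of the returned dictionary are made up for this example.

```python
import json


def summarize_status(raw):
    """Extract the main fields from the admin status JSON shown above."""
    doc = json.loads(raw)
    migration = doc.get("MigrationStatus") or {}
    start_ms = migration.get("startTime")
    end_ms = migration.get("endTime")
    # Elapsed time is only meaningful when both timestamps are present.
    elapsed_ms = None
    if start_ms is not None and end_ms is not None:
        elapsed_ms = end_ms - start_ms
    return {
        "status": doc.get("Status"),
        "operation_status": migration.get("operationStatus"),
        "current_index": migration.get("currentIndex"),
        "elapsed_ms": elapsed_ms,
    }
```

Applied to the sample response above, the elapsed time works out to 1325 ms (1521738359272 - 1521738357947) under the milliseconds assumption.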


Diffs
-----

  common/src/main/java/org/apache/atlas/AtlasConstants.java f5de1df3 
  common/src/main/java/org/apache/atlas/repository/Constants.java 310dddb4 
  common/src/main/java/org/apache/atlas/service/Services.java 1267dc92 
  graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasGraph.java 31d20855 
  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusGraph.java 6820a93c
  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusGraphDatabase.java a0060200
  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/migration/AtlasGraphSONReader.java PRE-CREATION
  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/migration/GraphSONTokensTP2.java PRE-CREATION
  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/migration/GraphSONUtility.java PRE-CREATION
  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/migration/JsonNodeParsers.java PRE-CREATION
  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/migration/JsonNodeProcessManager.java PRE-CREATION
  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/migration/MappedElementCache.java PRE-CREATION
  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/migration/PostProcessManager.java PRE-CREATION
  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/migration/ReaderStatusManager.java PRE-CREATION
  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/migration/RelationshipTypeCache.java PRE-CREATION
  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/migration/pc/WorkItemBuilder.java PRE-CREATION
  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/migration/pc/WorkItemConsumer.java PRE-CREATION
  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/migration/pc/WorkItemManager.java PRE-CREATION
  graphdb/titan0/src/main/java/org/apache/atlas/repository/graphdb/titan0/Titan0Graph.java 44090097
  intg/src/main/java/org/apache/atlas/model/impexp/AtlasExportResult.java 1ea961d8 
  intg/src/main/java/org/apache/atlas/model/impexp/MigrationStatus.java PRE-CREATION 
  intg/src/main/java/org/apache/atlas/store/AtlasTypeDefStore.java c63dc24a 
  pom.xml bfbb9535 
  repository/pom.xml b1d6b1f9 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexer.java 5672d9dc
  repository/src/main/java/org/apache/atlas/repository/impexp/MigrationProgressService.java PRE-CREATION
  repository/src/main/java/org/apache/atlas/repository/migration/DataMigrationService.java PRE-CREATION
  repository/src/main/java/org/apache/atlas/repository/migration/RelationshipCacheGenerator.java PRE-CREATION
  repository/src/main/java/org/apache/atlas/repository/store/bootstrap/AtlasTypeDefStoreInitializer.java 66762001
  repository/src/main/java/org/apache/atlas/repository/store/graph/v1/AtlasEntityStoreV1.java 5bec16ed
  repository/src/main/java/org/apache/atlas/repository/store/graph/v1/AtlasTypeDefGraphStoreV1.java 1a04418a
  repository/src/test/java/org/apache/atlas/repository/migration/MigrationServiceTest.java PRE-CREATION
  repository/src/test/java/org/apache/atlas/repository/migration/RelationshipMappingTest.java PRE-CREATION
  repository/src/test/java/org/apache/atlas/repository/store/graph/v1/AtlasEntityStoreV1Test.java 8257faa1
  repository/src/test/java/org/apache/atlas/repository/store/graph/v1/AtlasRelationshipStoreV1Test.java ac35860d
  repository/src/test/resources/stocks-2-0.8-extended-tag.json PRE-CREATION 
  repository/src/test/resources/stocks-2.zip PRE-CREATION 
  repository/src/test/resources/stocks-2/atlas-export-info.json PRE-CREATION 
  repository/src/test/resources/stocks-2/atlas-export-order.json PRE-CREATION 
  repository/src/test/resources/stocks-2/atlas-typesdef.json PRE-CREATION 
  webapp/src/main/java/org/apache/atlas/web/filters/ActiveServerFilter.java 6681a372 
  webapp/src/main/java/org/apache/atlas/web/resources/AdminResource.java 1b3f2c86 
  webapp/src/main/java/org/apache/atlas/web/security/AtlasSecurityConfig.java f1760e7f 
  webapp/src/main/java/org/apache/atlas/web/service/ServiceState.java 3fe8d18c 
  webapp/src/test/java/org/apache/atlas/web/resources/AdminResourceTest.java aab2bb8f 


Diff: https://reviews.apache.org/r/66253/diff/12/


Testing (updated)
-------

**Unit tests**
Unit tests for _AtlasGraphSONReader_ added.

**Functional tests**
Steps to test file-based import:
- Place the exported file at, say, _/root/atlas-data_.
- In Ambari, add the following to the _Atlas_ custom properties:
    _atlas.migration.mode.filename=/root/atlas-data_
- Ambari will prompt for a restart. Restart Atlas.
- On the server, follow the progress in the logs using: _tail -f /var/log/atlas/application.log_
- Use the CURL call mentioned above to view the status and the progress of the import.

Steps to test directory-based import:
- Place the exported files in, say, _/root/atlas-data_.
- In Ambari, add the following to the _Atlas_ custom properties:
    _atlas.migration.mode.filename=/root/atlas-data_
- Ambari will prompt for a restart. Restart Atlas.
- On the server, follow the progress in the logs using: _tail -f /var/log/atlas/application.log_
- Use the CURL call mentioned above to view the status and the progress of the import.
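
The last step of either procedure could be scripted instead of repeating the CURL call by hand. The following Python sketch polls the status URL from the CURL section with the same _admin:admin_ credentials. The function names, the polling interval, and the poll limit are assumptions for illustration. Only the _SUCCESS_ value of _operationStatus_ appears in this request, so the sketch treats any other value as "still running".

```python
import base64
import json
import time
import urllib.request

STATUS_URL = "http://localhost:21000/api/atlas/admin/status"  # from the CURL section


def fetch_status(url=STATUS_URL, user="admin", password="admin"):
    """GET the admin status endpoint with HTTP basic auth and return parsed JSON."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def wait_for_migration(fetch=fetch_status, interval_s=30, max_polls=100, sleep=time.sleep):
    """Poll until operationStatus is "SUCCESS" or the poll limit is reached.

    Returns the last response seen. Status values other than "SUCCESS" are not
    documented in the review request, so they are all treated as "keep waiting".
    """
    last = None
    for _ in range(max_polls):
        last = fetch()
        migration = last.get("MigrationStatus") or {}
        print("currentIndex:", migration.get("currentIndex"))
        if migration.get("operationStatus") == "SUCCESS":
            return last
        sleep(interval_s)
    return last
```

The `fetch` and `sleep` parameters are injectable so the loop can be exercised without a running server.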

**Performance Tests**
Single threaded average commits: 10K per minute (~ 600K per hour).

Configuration:
  - Number of worker threads: 8
  - Batch size: 3000
  - Number of nodes in cluster: 5 (Solr & Atlas servers on same node)

Total duration: ~11 hrs (2018-04-03 16:20:52 to 2018-04-04 03:03:12)

Vertex processing: ~6 hrs (2018-04-03 16:20:57 to 2018-04-03 22:30:39)
  - Total vertices: 8.7 million (8700227)
  - Rate: ~1.45 million per hour

Edge processing: ~2 hrs (2018-04-03 22:30:40 to 2018-04-04 00:17:54)
  - Total edges: 17 million (17149967)
  - Rate: 8.5 million per hour

Post processing: ~3 hrs (2018-04-04 00:18:16 to 2018-04-04 03:03:12)
  - Total vertices: 8.7 million
  - Rate: ~3 million per hour
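
For reference, the per-phase rates can be recomputed from the exact timestamps and counts listed above. The Python sketch below does that; the function name and the phase labels are only illustrative. The post-processing phase is assumed to cover the same 8700227 vertices as the vertex phase.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def hourly_rate(count, start, end):
    """Items processed per hour between two timestamps in FMT."""
    elapsed = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return count / (elapsed.total_seconds() / 3600)


# Counts and timestamps copied from the performance figures above.
phases = {
    "vertices": (8700227, "2018-04-03 16:20:57", "2018-04-03 22:30:39"),
    "edges": (17149967, "2018-04-03 22:30:40", "2018-04-04 00:17:54"),
    "post-processing": (8700227, "2018-04-04 00:18:16", "2018-04-04 03:03:12"),
}

for name, (count, start, end) in phases.items():
    print(f"{name}: {hourly_rate(count, start, end) / 1e6:.2f} million per hour")
```

With exact durations this gives roughly 1.4, 9.6, and 3.2 million per hour. The quoted rates are based on rounded hour counts, so they differ somewhat, most visibly for edges.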


Thanks,

Ashutosh Mestry

