atlas-dev mailing list archives

From "Ashutosh Mestry (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (ATLAS-1665) Export API: Improve Generated ZIP File Using AtlasEntityWithExtInfo
Date Fri, 17 Mar 2017 18:45:42 GMT

     [ https://issues.apache.org/jira/browse/ATLAS-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ashutosh Mestry updated ATLAS-1665:
-----------------------------------
    Attachment: ATLAS-1665.patch

> Export API: Improve Generated ZIP File Using AtlasEntityWithExtInfo
> -------------------------------------------------------------------
>
>                 Key: ATLAS-1665
>                 URL: https://issues.apache.org/jira/browse/ATLAS-1665
>             Project: Atlas
>          Issue Type: Improvement
>          Components:  atlas-core
>    Affects Versions: 0.9-incubating
>            Reporter: Ashutosh Mestry
>            Assignee: Ashutosh Mestry
>             Fix For: trunk
>
>         Attachments: ATLAS-1665.patch
>
>
> h5.Background
> The existing Export API implementation adds one *.json* file per entity to the generated
ZIP file. This makes ZIP file creation inefficient: the ZIP files are 75% larger than
what would be possible with fewer *.json* file entries.
> h5.Solution
> The implementation uses the new v2 API *AtlasEntityWithExtInfo* representation instead
of *AtlasEntity*. This format combines an entity and its related entities into one object. E.g. a
*hive_table* will contain all the *hive_columns* that it is made up of. (See example section below.)
> This significantly reduces the number of generated *JSON* files, which in turn reduces
the size of the generated *ZIP* file.
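
To illustrate why fewer entries can shrink the archive, here is a minimal sketch using only java.util.zip. It is not the Atlas ZipSink code: the class name, the guid strings, and the JSON layout (an "entity" plus a "referredEntities" map) are assumptions made for illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch; names and JSON shapes are illustrative, not Atlas code.
class ZipEntryCountSketch {

    // Writes each map entry as one ZIP entry named "<key>.json" and returns the archive bytes.
    static byte[] zip(Map<String, String> jsonByName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> e : jsonByName.entrySet()) {
                zos.putNextEntry(new ZipEntry(e.getKey() + ".json"));
                zos.write(e.getValue().getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    // One file per entity: the table and every column get their own ZIP entry.
    static Map<String, String> perEntity(int columns) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("table-guid", "{\"typeName\":\"hive_table\"}");
        for (int i = 0; i < columns; i++) {
            files.put("column-guid-" + i, "{\"typeName\":\"hive_column\"}");
        }
        return files;
    }

    // One file per top-level entity: the columns are nested inside the table's JSON.
    static Map<String, String> combined(int columns) {
        StringBuilder json = new StringBuilder(
            "{\"entity\":{\"typeName\":\"hive_table\"},\"referredEntities\":{");
        for (int i = 0; i < columns; i++) {
            if (i > 0) json.append(',');
            json.append("\"column-guid-").append(i)
                .append("\":{\"typeName\":\"hive_column\"}");
        }
        json.append("}}");
        Map<String, String> files = new LinkedHashMap<>();
        files.put("table-guid", json.toString());
        return files;
    }

    public static void main(String[] args) {
        System.out.println("per-entity bytes: " + zip(perEntity(50)).length);
        System.out.println("combined bytes:   " + zip(combined(50)).length);
    }
}
```

Each ZIP entry carries its own header and central-directory record, and small files compress poorly on their own, so the combined layout should come out smaller in this toy case. The actual savings in Atlas depend on the real entity payloads.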
> h5.Implementation Details
> *Export API*
> - Modified the *Gremlin* query used to fetch connected entities so that it returns each *guid*
with a *boolean* indicating whether the entity is a process.
> - _ExportService_ Modified implementation to fetch *AtlasEntityWithExtInfo* instead of
*AtlasEntity*. Modified bookkeeping to save *process* (lineage) entities after all non-process
entities are saved.
> - _ZipSink_ Minor modification to serialize *AtlasEntityWithExtInfo*.
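
The bookkeeping change described above (process entities saved after all non-process entities) can be sketched as follows. This is an assumption-laden illustration, not the ExportService code: the class name, the method name, and the use of a map from guid to an is-process flag are all hypothetical stand-ins for the guid/boolean pairs the query returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the save ordering; not the Atlas ExportService.
class ExportOrderSketch {

    // Returns guids with all non-process entities first, followed by process
    // (lineage) entities, preserving the map's iteration order within each group.
    static List<String> saveOrder(Map<String, Boolean> isProcessByGuid) {
        List<String> nonProcess = new ArrayList<>();
        List<String> process = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : isProcessByGuid.entrySet()) {
            if (e.getValue()) {
                process.add(e.getKey());
            } else {
                nonProcess.add(e.getKey());
            }
        }
        List<String> ordered = new ArrayList<>(nonProcess);
        ordered.addAll(process);
        return ordered;
    }
}
```

Passing a LinkedHashMap keeps the order in which guids were discovered, so only the process/non-process split changes the final sequence.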
> *Import API*
> - _ZipSource_ Modified to source *AtlasEntityWithExtInfo*.
> - _EntityImportStream_ Modified to source *AtlasEntityWithExtInfo*.
> - _AtlasEntityStreamForImport.getGuid_ Modified to look up requested entities first in the
stored *AtlasEntityWithExtInfo* object, and to request them from the stream only if not found.
> - _AtlasEntityStoreV1.bulkImport_ Minor modification to use the new changes to stream.
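
The lookup order described for the import stream (stored object first, stream only on a miss) might look roughly like the sketch below. Everything here is hypothetical: the class, its fields, and the use of plain strings in place of entity objects are assumptions for illustration and do not reflect the actual AtlasEntityStreamForImport signatures.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of "stored object first, stream second"; not Atlas code.
class GuidLookupSketch {
    // Entities already held in the current combined object, keyed by guid.
    private final Map<String, String> stored = new HashMap<>();
    // Fallback that fetches an entity from the underlying stream by guid.
    private final Function<String, String> streamLookup;
    // Counts how often the fallback was needed (for demonstration only).
    int streamRequests = 0;

    GuidLookupSketch(Map<String, String> current, Function<String, String> streamLookup) {
        this.stored.putAll(current);
        this.streamLookup = streamLookup;
    }

    // Returns the entity for the guid, consulting the stream only on a miss.
    String getByGuid(String guid) {
        String hit = stored.get(guid);
        if (hit != null) {
            return hit;
        }
        streamRequests++;
        return streamLookup.apply(guid);
    }
}
```

Because related entities travel together in one object, most lookups during import of a table and its columns would be served from the stored object in a design like this.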



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
