atlas-dev mailing list archives

From "Apoorv Naik (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ATLAS-2816) Allow ignoring relationship in EntityGraphRetriever for FullTextMapperV2
Date Fri, 10 Aug 2018 04:10:00 GMT

    [ https://issues.apache.org/jira/browse/ATLAS-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575740#comment-16575740 ]

Apoorv Naik commented on ATLAS-2816:
------------------------------------

One suggestion: use the followReferences flag instead of hardcoding the ignoreRelationship
param. This would make it easier to toggle if a deployment wants relationship details
to be captured in the entityText.
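
To make the suggestion concrete, here is a minimal sketch of deriving the retriever setting from a followReferences-style flag rather than a hardcoded constant. It is not the Atlas code: RetrieverSketch, fromFollowReferences and mapEntity are hypothetical stand-ins, and treating "ignore relationships" as the inverse of followReferences is an assumption about how the two would be wired together.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a graph retriever; not the Atlas class.
final class RetrieverSketch {
    private final boolean ignoreRelationships;

    RetrieverSketch(boolean ignoreRelationships) {
        this.ignoreRelationships = ignoreRelationships;
    }

    // Assumption: when references are followed, relationship details are
    // wanted in the entity text, so relationships must not be ignored.
    static RetrieverSketch fromFollowReferences(boolean followReferences) {
        return new RetrieverSketch(!followReferences);
    }

    // Builds a simplified entity view; relationship data is attached only
    // when the retriever was not told to ignore it.
    Map<String, Object> mapEntity(String name, List<String> relatedNames) {
        Map<String, Object> entity = new LinkedHashMap<>();
        entity.put("name", name);
        if (!ignoreRelationships) {
            entity.put("relationships", relatedNames);
        }
        return entity;
    }
}
```

With this shape, a deployment that wants relationship details only has to flip the flag; the caller never passes a literal true or false for the ignore parameter.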

 

HTH

> Allow ignoring relationship in EntityGraphRetriever for FullTextMapperV2
> ------------------------------------------------------------------------
>
>                 Key: ATLAS-2816
>                 URL: https://issues.apache.org/jira/browse/ATLAS-2816
>             Project: Atlas
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Chengbing Liu
>            Priority: Major
>         Attachments: ATLAS-2816.01.patch
>
>
> We encountered a problem when using the Hive bridge in production. One database has 5000+
> tables. Importing the first table takes only tens of milliseconds, but imports become slower
> as more tables are added. In the end, it takes 1~2 seconds to import one table.
> After investigation, we realized that it is not necessary for {{FullTextMapperV2}}
> to retrieve all the relationships of the database each time a table is imported. The time
> complexity of importing a whole database actually grows to O(n^2) (n is the number of tables).
> We propose adding a parameter to the constructor of {{EntityGraphRetriever}}: {{ignoreRelationship}}.
> When set to true, {{mapVertexToAtlasEntity}} will skip the {{mapRelationshipAttributes}} call.
> Since {{FullTextMapperV2}} does not use the relationship attributes of the entity, this can save
> plenty of time when importing entities with a large number of relations.
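
The quadratic growth described in the quoted report can be illustrated with a toy model. This is only a sketch under simplifying assumptions: it counts one "visit" per database-to-table edge each time the database entity is re-read, and ImportCostSketch and relationshipVisits are hypothetical names, not part of Atlas.

```java
import java.util.ArrayList;
import java.util.List;

// Toy cost model, not Atlas code: every name here is hypothetical.
final class ImportCostSketch {

    // Counts relationship edges walked while importing tableCount tables
    // into a single database, one table at a time.
    static long relationshipVisits(int tableCount, boolean ignoreRelationships) {
        List<String> databaseTables = new ArrayList<>(); // database -> table edges
        long visits = 0;
        for (int i = 0; i < tableCount; i++) {
            databaseTables.add("table_" + i);
            // Each import re-reads the database entity for full-text mapping.
            if (!ignoreRelationships) {
                // Walks every table edge accumulated so far: 1 + 2 + ... + n.
                visits += databaseTables.size();
            }
        }
        return visits;
    }

    public static void main(String[] args) {
        System.out.println(relationshipVisits(5000, false)); // prints 12502500
        System.out.println(relationshipVisits(5000, true));  // prints 0
    }
}
```

In this model the total work without the skip is n(n+1)/2 edge visits, which matches the O(n^2) behaviour reported; with the skip enabled, the per-table cost no longer depends on how many tables the database already has.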



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
