atlas-dev mailing list archives

From "Apoorv Naik (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ATLAS-2816) Allow ignoring relationship in EntityGraphRetriever for FullTextMapperV2
Date Fri, 10 Aug 2018 04:13:00 GMT

    [ https://issues.apache.org/jira/browse/ATLAS-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575740#comment-16575740 ]

Apoorv Naik edited comment on ATLAS-2816 at 8/10/18 4:12 AM:
-------------------------------------------------------------

One suggestion: use the followReferences flag instead of hardcoding the ignoreRelationship
param. This would make it easier to toggle if a certain deployment scenario wants the
relationship details to be captured in the entityText. Also, please follow these guidelines
for patch creation:

 # Work on a local branch
 # Commit the patch on the local branch
 # Generate the patch using "git format-patch origin/master" (this way you get credit, since the author info is included in the patch)
 # Attach the patch to the JIRA issue

 

HTH


was (Author: apoorvnaik):
One suggestion: use the followReferences flag instead of hardcoding the ignoreRelationship
param. This would make it easier to toggle if a certain deployment scenario wants the
relationship details to be captured in the entityText.

 

HTH
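The followReferences suggestion in the comment can be sketched as follows. This is a hypothetical illustration, not the Atlas implementation: the property key `fulltext.followReferences` and the `Retriever` and `FullTextMapper` classes are assumptions made for the example. The point is only that the retriever's ignoreRelationship value is derived from a configurable flag rather than hardcoded to true.

```java
import java.util.Properties;

public class FollowReferencesSketch {

    // Hypothetical property key, NOT a documented Atlas configuration name.
    static final String FOLLOW_REFERENCES_KEY = "fulltext.followReferences";

    /** Stand-in for EntityGraphRetriever: only the proposed flag matters here. */
    static final class Retriever {
        final boolean ignoreRelationship;

        Retriever(boolean ignoreRelationship) {
            this.ignoreRelationship = ignoreRelationship;
        }
    }

    /** Stand-in for FullTextMapperV2: builds its retriever from followReferences. */
    static final class FullTextMapper {
        final boolean followReferences;
        final Retriever retriever;

        FullTextMapper(Properties config) {
            // Default to false so full-text mapping skips relationships unless a deployment opts in.
            this.followReferences =
                    Boolean.parseBoolean(config.getProperty(FOLLOW_REFERENCES_KEY, "false"));
            // Ignore relationships exactly when references are not followed, instead of hardcoding true.
            this.retriever = new Retriever(!followReferences);
        }
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        System.out.println(new FullTextMapper(defaults).retriever.ignoreRelationship); // prints true

        Properties optIn = new Properties();
        optIn.setProperty(FOLLOW_REFERENCES_KEY, "true");
        System.out.println(new FullTextMapper(optIn).retriever.ignoreRelationship); // prints false
    }
}
```

Defaulting the flag to false keeps the behavior of hardcoding ignoreRelationship to true, while a deployment that wants relationship details in the entityText can opt in through configuration.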

> Allow ignoring relationship in EntityGraphRetriever for FullTextMapperV2
> ------------------------------------------------------------------------
>
>                 Key: ATLAS-2816
>                 URL: https://issues.apache.org/jira/browse/ATLAS-2816
>             Project: Atlas
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Chengbing Liu
>            Assignee: Apoorv Naik
>            Priority: Major
>         Attachments: ATLAS-2816.01.patch
>
>
> We encountered a problem when using the Hive bridge in production. One database has 5000+ tables.
> Importing the first table takes only tens of milliseconds, but each subsequent import is slower;
> eventually, importing a single table takes 1 to 2 seconds.
> After investigation, we realized that it is not necessary for {{FullTextMapperV2}} to retrieve
> all the relationships of the database each time a table is imported. Because each import
> re-reads every table already attached to the database, the time complexity of importing a whole
> database actually grows to O(n^2), where n is the number of tables.
> We propose adding an {{ignoreRelationship}} parameter to the constructor of {{EntityGraphRetriever}}.
> When it is set to true, {{mapVertexToAtlasEntity}} skips the {{mapRelationshipAttributes}} call.
> Since {{FullTextMapperV2}} does not use the relationship attributes of the entity, this can save
> a lot of time when importing entities with a large number of relations.
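To make the cost argument in the description concrete, here is a minimal, self-contained Java model of the proposal. It is a hypothetical sketch, not Atlas code: `Vertex`, `Retriever`, `mapVertexToEntity` and `importTables` are stand-ins invented for illustration, and the model assumes that indexing each new table also maps the database entity, whose relationship attributes list every table imported so far. Counting vertex visits, the total with relationships grows as 3n + n(n+1)/2, while skipping them keeps it at 2n.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the ATLAS-2816 proposal; these classes are NOT the Atlas API. */
public class IgnoreRelationshipSketch {

    /** A toy vertex (database or table); "related" plays the role of relationship edges. */
    static final class Vertex {
        final String name;
        final List<Vertex> related = new ArrayList<>();

        Vertex(String name) {
            this.name = name;
        }
    }

    /** Stand-in for EntityGraphRetriever with the proposed ignoreRelationship constructor flag. */
    static final class Retriever {
        private final boolean ignoreRelationship;
        long verticesVisited = 0; // work counter used to compare the two modes

        Retriever(boolean ignoreRelationship) {
            this.ignoreRelationship = ignoreRelationship;
        }

        /** Stand-in for mapVertexToAtlasEntity: map the vertex, then optionally its relationships. */
        void mapVertexToEntity(Vertex v) {
            verticesVisited++;                // mapping the entity's own attributes
            if (!ignoreRelationship) {
                mapRelationshipAttributes(v); // the call the proposal would skip
            }
        }

        private void mapRelationshipAttributes(Vertex v) {
            verticesVisited += v.related.size(); // one visit per related vertex
        }
    }

    /** Imports n tables into one database, mapping the table and its database after each import. */
    static long importTables(int n, boolean ignoreRelationship) {
        Retriever retriever = new Retriever(ignoreRelationship);
        Vertex db = new Vertex("db");
        for (int i = 1; i <= n; i++) {
            Vertex table = new Vertex("table_" + i);
            db.related.add(table);
            table.related.add(db);
            retriever.mapVertexToEntity(table);
            retriever.mapVertexToEntity(db); // with relationships, this walks all i tables
        }
        return retriever.verticesVisited;
    }

    public static void main(String[] args) {
        int n = 5000;
        System.out.println("with relationships:     " + importTables(n, false)); // 12517500
        System.out.println("ignoring relationships: " + importTables(n, true));  // 10000
    }
}
```

With n = 5000 the first call returns 12517500 and the second 10000: in this model the quadratic term comes entirely from the relationship walk on the database vertex, which is what skipping {{mapRelationshipAttributes}} removes.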



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
