atlas-dev mailing list archives

From "Suma Shivaprasad (JIRA)" <>
Subject [jira] [Commented] (ATLAS-751) Add support for primary key constraint on class types
Date Wed, 08 Jun 2016 20:37:21 GMT


Suma Shivaprasad commented on ATLAS-751:

Changes completed:

1. Type system changes to add primary key constraints on class types, including serialization.
Issues encountered in type serialization along the way have been fixed.
2. Entity deduplication based on primary key attributes, which supports Gremlin queries on
primitive and class attributes, plus an additional post-processing check to handle array references.
3. Model changes in the UT models, Hive hook and Storm to add primary key constraints.
4. Removed unique key attributes across the models: Hive hook, UTs, ITs in webapp, Storm, HDFS.
5. Added APIs to get, update and delete entities by primary key.
6. Changed the Hive hook to use the new APIs for update and delete.
7. Added support for evaluating a primary key expression attribute, to maintain backward
compatibility by returning qualifiedName during entity searches and gets. This allows users
to specify the display format of the attribute; the value is evaluated at runtime using
the attributes of the instance mapped from the graph.
8. Added tests in GraphBackedDiscoveryTest to make sure reference searches work, e.g. searches
with multiple 'and' clauses.
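Completed items 1, 2, 5 and 7 can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the Atlas implementation: `ClassType`, `Entity`, `EntityStore`, `create_or_update`, `get_by_primary_key`, `delete_by_primary_key`, `display_name` and the `{attr}` display-format syntax are all hypothetical names. It assumes deduplication keys on the tuple of primary key attribute values, with a class-reference attribute contributing the referenced entity's guid, and a display string computed at read time rather than stored.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class ClassType:
    """Hypothetical class type definition carrying a primary key constraint."""
    name: str
    attributes: Tuple[str, ...]
    primary_key: Tuple[str, ...]          # attributes whose values identify an instance
    display_format: Optional[str] = None  # e.g. "{db}.{name}", evaluated on read


@dataclass
class Entity:
    type_name: str
    values: Dict[str, Any]
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))


class EntityStore:
    """Toy in-memory store that deduplicates entities on primary key values."""

    def __init__(self) -> None:
        self.types: Dict[str, ClassType] = {}
        self.by_key: Dict[Tuple[str, Tuple[Any, ...]], Entity] = {}

    def register(self, t: ClassType) -> None:
        # Reject a constraint that names attributes the type does not declare.
        missing = set(t.primary_key) - set(t.attributes)
        if missing:
            raise ValueError(f"undeclared primary key attributes: {sorted(missing)}")
        self.types[t.name] = t

    def _key(self, type_name: str, values: Dict[str, Any]) -> Tuple[str, Tuple[Any, ...]]:
        parts = []
        for attr in self.types[type_name].primary_key:
            v = values[attr]
            # A class reference contributes the referenced entity's guid.
            parts.append(v.guid if isinstance(v, Entity) else v)
        return type_name, tuple(parts)

    def create_or_update(self, type_name: str, values: Dict[str, Any]) -> Entity:
        key = self._key(type_name, values)
        existing = self.by_key.get(key)
        if existing is not None:
            # Same primary key: update the existing entity instead of creating one.
            existing.values.update(values)
            return existing
        entity = Entity(type_name, dict(values))
        self.by_key[key] = entity
        return entity

    def get_by_primary_key(self, type_name: str, key_values: Dict[str, Any]) -> Optional[Entity]:
        return self.by_key.get(self._key(type_name, key_values))

    def delete_by_primary_key(self, type_name: str, key_values: Dict[str, Any]) -> bool:
        return self.by_key.pop(self._key(type_name, key_values), None) is not None

    def display_name(self, entity: Entity) -> str:
        # The display string is computed at read time, never stored.
        fmt = self.types[entity.type_name].display_format
        if fmt is None:
            return entity.guid
        # Assumption: a referenced entity is rendered by its own "name" attribute.
        flat = {k: (v.values.get("name") if isinstance(v, Entity) else v)
                for k, v in entity.values.items()}
        return fmt.format(**flat)
```

For example, registering a `table` type with primary key `("name", "db")` and display format `"{db}.{name}"` means that creating `orders` in the same `db` twice returns the same entity with the second call's attributes merged in, and `display_name` yields `sales.orders` for a db named `sales`. Because the display string is derived on read, renaming the db would change it without touching the table.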

Pending changes:

1. Debug and resolve the performance issue with the Gremlin (dedup) query during writes. This is
currently worked around by running the query through TitanGraphQuery, which returns a list of
vertex ids that are then passed as the set of start vertices to further Gremlin queries for class
reference searches. However, there could be issues if the first filters return a large number of
vertex ids; this still needs testing to confirm it holds up and to find the limit at which it
breaks. Otherwise, the slow-running queries need to be debugged and fixed.
2. Hive hook ITs are failing. First, column entity asserts fail in many cases (needs further
debugging). Second, deduplication needs a change to handle class vertex reference searches
(the code for this is in place but not yet tested).
3. Falcon and Sqoop hooks: merge changes after review and update the ITs.
4. Storm hook: model changes not yet tested.
5. Add the pending UTs for PrimaryKeyDedupHandler.
6. Add ITs for the new REST APIs to get, update and delete by primary key.
7. Review, and fixes from code review.
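The workaround in pending item 1 can be sketched as a two-phase lookup: an indexed filter on primitive primary key attributes yields candidate vertex ids, and only those candidates are checked against class references and then array references. This is a sketch over plain dictionaries standing in for the graph, not TitanGraphQuery or Gremlin; `find_duplicate`, `TooManyCandidates` and `max_candidates` are hypothetical, and comparing arrays as ordered lists is an assumption. The cap only illustrates the open question of how many ids the first phase can safely hand to the second.

```python
from typing import Any, Dict, List, Optional, Sequence

Vertex = Dict[str, Any]


class TooManyCandidates(Exception):
    """Raised when the first-phase filter returns more ids than we want to pass on."""


def find_duplicate(
    vertices: Dict[str, Vertex],
    index: Dict[tuple, List[str]],
    primitive_keys: Sequence[str],
    reference_keys: Sequence[str],
    array_keys: Sequence[str],
    probe: Vertex,
    max_candidates: int = 1000,
) -> Optional[str]:
    # Phase 1: indexed lookup on primitive primary key attributes
    # (stands in for the query step that returns a list of vertex ids).
    candidate_ids = index.get(tuple(probe[k] for k in primitive_keys), [])
    if len(candidate_ids) > max_candidates:
        raise TooManyCandidates(f"{len(candidate_ids)} candidates")

    for vid in candidate_ids:
        v = vertices[vid]
        # Phase 2: starting only from the candidates, compare class references
        # (stored here as referenced vertex ids).
        if any(v.get(k) != probe[k] for k in reference_keys):
            continue
        # Post-processing: array references compared as ordered lists.
        if any(list(v.get(k, [])) != list(probe[k]) for k in array_keys):
            continue
        return vid
    return None
```

Raising on an oversized candidate set, rather than silently truncating it, avoids a wrong dedup decision; whether the real code should instead fall back to a slower query is the part item 1 leaves open.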

> Add support for primary key constraint on class types
> -----------------------------------------------------
>                 Key: ATLAS-751
>                 URL:
>             Project: Atlas
>          Issue Type: Improvement
>    Affects Versions: 0.7-incubating
>            Reporter: Suma Shivaprasad
>            Assignee: Suma Shivaprasad
>             Fix For: 0.7-incubating
>         Attachments: ATLAS-751.1.patch, ATLAS-751.1.patch
> Persisting the qualifiedName for an entity has multiple issues:
> 1. In case of soft deletes, consider the following scenario:
> a. Table A -> insert overwrite -> Table B
> If tables A and B are dropped and recreated and the insert overwrite query
is rerun, it should create another lineage process, since the tables are now different.
The same applies to CTAS etc. However, with the current way of storing qualifiedName,
the existing process gets updated instead.
> 2. Storing qualified names inherently leads to a lot of updates during operations such as renames.
For example, if a table is renamed, all of its columns' qualifiedName, sd.qualifiedName etc. get updated.
If the table has partitions, these updates will take a lot of time.
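The two problems can be made concrete with a toy comparison. It assumes a simplified qualifiedName of the form `db.table.column` (real qualifiedName formats may differ) and, for the alternative, a key built from entity guids; every function here is hypothetical and not an Atlas type or API. A rename rewrites every stored column string in the first model but only the table's own name in the second, and a dropped-and-recreated table gets a new guid, so a process keyed on table guids no longer collides with the old process.

```python
import uuid
from typing import Dict, List, Tuple


def rename_with_qualified_names(columns: List[Dict[str, str]], old: str, new: str) -> int:
    """Stored-qualifiedName model: every column string embeds the table name."""
    prefix = old + "."
    updates = 0
    for col in columns:
        if col["qualifiedName"].startswith(prefix):
            col["qualifiedName"] = new + "." + col["qualifiedName"][len(prefix):]
            updates += 1
    return updates


def rename_with_primary_keys(table: Dict[str, str], new: str) -> int:
    """Primary-key model: columns reference the table guid, so only the table changes."""
    table["name"] = new
    return 1


def process_key_by_name(inputs: List[str], outputs: List[str]) -> Tuple[str, ...]:
    # Identity derived from names: a recreated table with the same name collides.
    return tuple(inputs) + ("->",) + tuple(outputs)


def process_key_by_guid(inputs: List[Dict[str, str]],
                        outputs: List[Dict[str, str]]) -> Tuple[str, ...]:
    # Identity derived from entity guids: a recreated table is a new entity.
    return tuple(t["guid"] for t in inputs) + ("->",) + tuple(t["guid"] for t in outputs)


def new_table(name: str) -> Dict[str, str]:
    return {"guid": str(uuid.uuid4()), "name": name}
```

With 1000 columns, `rename_with_qualified_names` performs 1000 updates where `rename_with_primary_keys` performs one. For scenario 1, tables A and B recreated under the same names produce the same name-based process key but different guid-based keys, so only the guid-based model creates a new lineage process.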

This message was sent by Atlassian JIRA
