atlas-dev mailing list archives

From "Suma Shivaprasad (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ATLAS-535) Support delete cascade efficently
Date Fri, 26 Feb 2016 00:08:18 GMT

    [ https://issues.apache.org/jira/browse/ATLAS-535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168154#comment-15168154 ]

Suma Shivaprasad edited comment on ATLAS-535 at 2/26/16 12:07 AM:
------------------------------------------------------------------

*Modelling DELETE cascades across entities*


*Background*

Currently, the Typesystem allows relationship behaviour between types to be modelled through
attribute flags. The isComposite flag on an attribute declares a “composition” relationship
between the current type and the attribute's type: the referred instance must be loaded or
deleted whenever the current instance is loaded or deleted. For example, hive_table.columns is
an isComposite attribute, so whenever a table is loaded or deleted, its columns are loaded or
deleted too.
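
To make the isComposite semantics concrete, here is a minimal in-memory sketch in Java. It is not Atlas code: CompositeDeleteSketch, addEntity and delete are hypothetical names, and the map of owned ids simply stands in for the isComposite attributes of each entity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model for illustration; not the Atlas repository API.
class CompositeDeleteSketch {

    // Entity id -> ids of the entities it owns through isComposite attributes.
    static final Map<String, List<String>> compositeChildren = new HashMap<>();

    static void addEntity(String id, String... ownedIds) {
        compositeChildren.put(id, new ArrayList<>(Arrays.asList(ownedIds)));
    }

    // Deletes the entity and, recursively, everything it owns.
    // Returns the ids in the order they were deleted (children first).
    static List<String> delete(String id) {
        List<String> deleted = new ArrayList<>();
        List<String> owned = compositeChildren.remove(id);
        if (owned == null) {
            return deleted; // unknown or already deleted
        }
        for (String child : owned) {
            deleted.addAll(delete(child));
        }
        deleted.add(id);
        return deleted;
    }

    public static void main(String[] args) {
        // A table that owns two columns, as in hive_table.columns.
        addEntity("orders", "orders.id", "orders.amount");
        addEntity("orders.id");
        addEntity("orders.amount");
        System.out.println(delete("orders"));
    }
}
```

Deleting the table removes its columns as well, which is the behaviour the isComposite flag gives hive_table.columns today.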

*API changes*

The deleteEntity API should take an additional flag to indicate cascading deletes.


*Modelling/Repository changes*

*Option 1:*

Add an attribute array<hive_table> in hive_db 

*Pros:*

Works out of the box (OOB) and does not need any code changes.
Since the entity being deleted is also the source from which the delete cascade begins, i.e.
it is the parent entity, we know exactly which edges (the ones with label __typeName.attributeName)
and vertices are to be deleted.

*Cons:*

The current support for such an attribute is limited in some cases. For example,
Database -> Table and Table -> Partitions could have issues, since every added partition
requires updating the Table entity to append it to its array<partition>, which may not scale.
Taking hourly and daily partitions over five years as the worst case, a table could have
around 50,000 partition entries. It is not clear what average number of tables per Database
we should support.
We will have to implement another attribute flag, isVisible/lazyFetch, so that the tables are
not loaded or displayed, or are fetched lazily, when a database is loaded, since this attribute
is added for internal reasons and should not be displayed when a database is viewed. If we add
lazyFetch, should we load all the entries in the array?
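
As a rough check on the partition estimate above, the count can be worked out directly. The class and method names below are made up for illustration, and leap days are ignored.

```java
// Back-of-the-envelope estimate of how many entries array<partition> would hold.
class PartitionCountSketch {

    // Number of partitions created over the given number of years (365-day years).
    static long partitions(int perDay, int years) {
        return (long) perDay * 365 * years;
    }

    public static void main(String[] args) {
        long hourly = partitions(24, 5); // one partition per hour
        long daily = partitions(1, 5);   // one partition per day
        System.out.println("hourly: " + hourly + ", daily: " + daily);
    }
}
```

Hourly partitioning alone gives 24 x 365 x 5 = 43,800 entries, which is in line with the ~50,000 figure.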

*Option 2:*

Add an attribute flag called isInverseComposite on hive_table.db.

In this case, whenever an instance of hive_db needs to be deleted, the delete looks at all the
incoming vertices with edge label starting with __hive_db, looks at their type definition and
checks whether the isInverseComposite flag is set on them for the current type attribute. If it
is set, the corresponding vertices and edges are removed.

Get and update behaviour is not affected by this flag.
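
A minimal sketch of how such a delete could walk the incoming edges is given below. It is only an illustration on a hypothetical in-memory graph: InverseCompositeSketch and its members are not Atlas classes, and it assumes edge labels follow the __typeName.attributeName convention mentioned under Option 1, looking up the flag by the full label rather than by prefix.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory graph for illustration; not Atlas or graph-database classes.
class InverseCompositeSketch {

    static final class Edge {
        final String label; // assumed format: "__typeName.attributeName"
        final String from;  // id of the referring vertex, e.g. a hive_table
        final String to;    // id of the referred vertex, e.g. a hive_db

        Edge(String label, String from, String to) {
            this.label = label;
            this.from = from;
            this.to = to;
        }
    }

    final List<Edge> edges = new ArrayList<>();
    final Set<String> vertices = new HashSet<>();
    // Attributes flagged isInverseComposite, stored as "typeName.attributeName".
    final Set<String> inverseComposite = new HashSet<>();

    void link(String label, String from, String to) {
        vertices.add(from);
        vertices.add(to);
        edges.add(new Edge(label, from, to));
    }

    // Deletes the vertex plus every referrer that points to it through a flagged attribute.
    List<String> deleteWithCascade(String id) {
        List<String> deleted = new ArrayList<>();
        if (!vertices.remove(id)) {
            return deleted; // unknown or already deleted
        }
        List<Edge> incoming = new ArrayList<>();
        for (Edge e : edges) {
            if (e.to.equals(id)) {
                incoming.add(e);
            }
        }
        for (Edge e : incoming) {
            String attribute = e.label.startsWith("__") ? e.label.substring(2) : e.label;
            if (inverseComposite.contains(attribute)) {
                deleted.addAll(deleteWithCascade(e.from));
            }
        }
        // Drop every edge that touches the deleted vertex.
        edges.removeIf(edge -> edge.from.equals(id) || edge.to.equals(id));
        deleted.add(id);
        return deleted;
    }

    public static void main(String[] args) {
        InverseCompositeSketch g = new InverseCompositeSketch();
        g.inverseComposite.add("hive_table.db"); // isInverseComposite set on hive_table.db
        g.link("__hive_table.db", "orders", "sales");
        g.link("__hive_table.db", "customers", "sales");
        g.link("__hive_process.inputs", "etl_job", "orders"); // not flagged, so not cascaded
        System.out.println(g.deleteWithCascade("sales"));
    }
}
```

In this sketch the process vertex survives the delete because its attribute is not flagged, while both tables go away with the database.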

*Pros:*

Simple approach that doesn't need intrusive code changes.

*Cons:*

An additional flag that users need to define in the type definition.
Need to iterate over all the edges (which could be a large number) and check which ones
have labels starting with that typeName prefix. However, on average there would be only
one or at most two such attributes with a potentially large number of edges, so the scan
would mostly go through vertices that need to be deleted anyway.

*Option 3:*

There is currently no way to model associations between any two types/classes. The proposal
is to model this generically, so that various association rules between types can be
represented without being tied to a specific attribute. For example, Database to Table is a
composition relationship.

Define a generic new internal type 

*AssociationRule*

attributes:
 String targetType   // the type with which the association rule is being defined
 String name   // the name of this rule

Note: Typesystem will enforce a typecheck on the targetType using existing types. 


A type definition will have a Collection<AssociationRule> along with the existing attribute
definitions, traits etc


*CascadeRule extends AssociationRule*

*DeleteCascadeRule extends CascadeRule*

Currently the only cascade type supported is DELETE.

Going forward, it could be extended to various other types like the JPA cascade
types (for updates, gets etc.): https://docs.oracle.com/javaee/6/api/javax/persistence/CascadeType.html

Also going forward, AssociationRule(s) could be attached at the attribute level, i.e. isComposite
on an attribute could be changed to a DeleteCascadeRule instead. The same set of association
rules can then apply at both the type and attribute levels.
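
The rule hierarchy described above could look roughly like this in Java. This is only a sketch of the proposal: the wrapper class AssociationRuleSketch, the TypeDefinition holder, the knownTypes set and the attach method are assumptions added for illustration, and the rule name "DELETE" is a placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Sketch of the proposed rule types; not existing Atlas classes.
class AssociationRuleSketch {

    static class AssociationRule {
        final String targetType; // the type with which the association rule is defined
        final String name;       // the name of this rule

        AssociationRule(String targetType, String name) {
            this.targetType = targetType;
            this.name = name;
        }
    }

    static class CascadeRule extends AssociationRule {
        CascadeRule(String targetType, String name) {
            super(targetType, name);
        }
    }

    static class DeleteCascadeRule extends CascadeRule {
        DeleteCascadeRule(String targetType) {
            super(targetType, "DELETE"); // placeholder rule name
        }
    }

    // Stand-in for a type definition carrying a Collection<AssociationRule>.
    static class TypeDefinition {
        final String typeName;
        final Collection<AssociationRule> rules = new ArrayList<>();

        TypeDefinition(String typeName) {
            this.typeName = typeName;
        }
    }

    // Stand-in for the set of types already registered with the type system.
    static final Set<String> knownTypes = new HashSet<>(Arrays.asList("hive_db", "hive_table"));

    // Attaches a rule after checking that its targetType is an existing type.
    static void attach(TypeDefinition type, AssociationRule rule) {
        if (!knownTypes.contains(rule.targetType)) {
            throw new IllegalArgumentException("Unknown target type: " + rule.targetType);
        }
        type.rules.add(rule);
    }

    public static void main(String[] args) {
        TypeDefinition hiveDb = new TypeDefinition("hive_db");
        attach(hiveDb, new DeleteCascadeRule("hive_table"));
        System.out.println(hiveDb.typeName + " has " + hiveDb.rules.size() + " rule(s)");
    }
}
```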

When a delete with cascade is issued on an entity whose type contains a DeleteCascadeRule,
any references from this entity that are of the targetType are deleted. For example, when an
entity of hive_db is deleted, all the hive_table entities associated with it are deleted. To
find the vertices to delete, it follows all edges whose label starts with the typeName
__hive_table (the targetType) and deletes the referred vertices. This should work for all the
complex and collection types: array, map, struct and class references.
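
The rule-driven delete could be sketched as follows. This is a hypothetical in-memory model, not Atlas code. It assumes the edges in question are the existing hive_table.db references pointing at the database, labelled __hive_table.db, and it matches on the prefix "__" + targetType + "." (with the trailing dot, a choice made here so that longer type names sharing the prefix are not picked up).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory graph for illustration; not Atlas classes.
class RuleDrivenDeleteSketch {

    static final class Edge {
        final String label; // assumed format: "__typeName.attributeName"
        final String from;
        final String to;

        Edge(String label, String from, String to) {
            this.label = label;
            this.from = from;
            this.to = to;
        }
    }

    final List<Edge> edges = new ArrayList<>();
    final Map<String, String> typeOf = new HashMap<>(); // vertex id -> type name
    // Type name -> targetTypes of the DeleteCascadeRules attached to that type.
    final Map<String, List<String>> deleteCascadeTargets = new HashMap<>();

    void addVertex(String id, String typeName) {
        typeOf.put(id, typeName);
    }

    void addEdge(String label, String from, String to) {
        edges.add(new Edge(label, from, to));
    }

    void addDeleteCascadeRule(String typeName, String targetType) {
        deleteCascadeTargets.computeIfAbsent(typeName, k -> new ArrayList<>()).add(targetType);
    }

    // Deletes the vertex and, for each rule on its type, the vertices of the
    // targetType that reference it.
    List<String> deleteWithCascade(String id) {
        List<String> deleted = new ArrayList<>();
        String typeName = typeOf.remove(id);
        if (typeName == null) {
            return deleted; // unknown or already deleted
        }
        List<String> targets = deleteCascadeTargets.getOrDefault(typeName, Collections.emptyList());
        for (String targetType : targets) {
            String prefix = "__" + targetType + ".";
            List<String> referrers = new ArrayList<>();
            for (Edge e : edges) {
                if (e.to.equals(id) && e.label.startsWith(prefix)) {
                    referrers.add(e.from);
                }
            }
            for (String referrer : referrers) {
                deleted.addAll(deleteWithCascade(referrer));
            }
        }
        edges.removeIf(edge -> edge.from.equals(id) || edge.to.equals(id));
        deleted.add(id);
        return deleted;
    }

    public static void main(String[] args) {
        RuleDrivenDeleteSketch g = new RuleDrivenDeleteSketch();
        g.addDeleteCascadeRule("hive_db", "hive_table");
        g.addVertex("sales", "hive_db");
        g.addVertex("orders", "hive_table");
        g.addVertex("customers", "hive_table");
        g.addEdge("__hive_table.db", "orders", "sales");
        g.addEdge("__hive_table.db", "customers", "sales");
        System.out.println(g.deleteWithCascade("sales"));
    }
}
```

Unlike Option 2, nothing is declared on hive_table.db here: the rule lives on hive_db and names hive_table as its targetType.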

*Pros:*

Generic: can be used to define any association between two types and use it in any aspect
of Atlas, e.g. during entity mutation (update, get and delete behaviour).
The current hive model of the Table -> Database reference will not need to change, which means
there are no extra updates whenever a table is added, as was the case in Option 1.

*Cons:*

More intrusive: needs changes in the type system in addition to entity mutation.
Need to iterate over all the edges (which could be a large number) and check which ones
have labels starting with that typeName prefix. However, on average there would be only
one or at most two such attributes with a potentially large number of edges, so the scan
would mostly go through vertices that need to be deleted anyway. Also, deletes in general
are likely to be less frequent than creates/updates.


Due to its simplicity and non-intrusive code changes, I am leaning towards Option 2. Thoughts?




> Support delete cascade efficently
> ---------------------------------
>
>                 Key: ATLAS-535
>                 URL: https://issues.apache.org/jira/browse/ATLAS-535
>             Project: Atlas
>          Issue Type: Sub-task
>            Reporter: Suma Shivaprasad
>             Fix For: 0.7-incubating
>
>
> Currently there are some limitations in the typesystem and modelling to support delete
> cascades at scale through the isComposite flag



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
