atlas-dev mailing list archives

From David Radley <david_rad...@uk.ibm.com>
Subject Tag propagation and classification entityTypes
Date Fri, 15 Sep 2017 08:25:43 GMT
Hi Madhan and Sarath,
It occurs to me that we are introducing two new definitions around 
classifications that require the code to traverse the graph.
- classificationDefs now have entityTypes to restrict the entities that 
they can be applied to. This requires us to check entity and 
classification hierarchies to ensure that inherited entities and 
classifications abide by these restrictions.
This is currently done in code in AtlasClassificationType: there is one 
set of checks at classification add/update time and another when we try 
to add a classification to an entity.
- The tag propagation implementation, currently in review, works out where 
tags should be propagated using Gremlin TP2 (TinkerPop 2) queries. The 
proposed query is neat, around 10 lines long, but does not account for 
inheritance or entityType restrictions.
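To make the hierarchy checks in the first point concrete, here is a minimal Java (16+) sketch. It does not use the real AtlasClassificationType API: EntityTypesCheck, EntityDef, ClassificationDef and canApply are hypothetical names, the type names in main are only examples, and it assumes one possible rule, namely that every classification in the hierarchy that declares entityTypes must list the entity's type or one of its supertypes.

```java
import java.util.*;

// Hypothetical, simplified stand-in for the type-level checks; not the Atlas API.
class EntityTypesCheck {
    record EntityDef(String name, Set<String> superTypes) {}
    record ClassificationDef(String name, Set<String> superTypes, Set<String> entityTypes) {}

    private final Map<String, EntityDef> entityDefs = new HashMap<>();
    private final Map<String, ClassificationDef> classificationDefs = new HashMap<>();

    void add(EntityDef d) { entityDefs.put(d.name(), d); }
    void add(ClassificationDef d) { classificationDefs.put(d.name(), d); }

    // The entity type itself plus all of its transitive supertypes.
    Set<String> entityLineage(String entityType) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(List.of(entityType));
        while (!todo.isEmpty()) {
            String t = todo.pop();
            EntityDef d = entityDefs.get(t);
            if (seen.add(t) && d != null) {
                todo.addAll(d.superTypes());
            }
        }
        return seen;
    }

    // Assumed rule: every classification in the hierarchy that declares entityTypes
    // must list the entity's type or one of its supertypes.
    boolean canApply(String classification, String entityType) {
        Set<String> lineage = entityLineage(entityType);
        Set<String> visited = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(List.of(classification));
        while (!todo.isEmpty()) {
            ClassificationDef c = classificationDefs.get(todo.pop());
            if (c == null || !visited.add(c.name())) {
                continue; // unknown type or already checked
            }
            if (!c.entityTypes().isEmpty() && Collections.disjoint(c.entityTypes(), lineage)) {
                return false;
            }
            todo.addAll(c.superTypes());
        }
        return true;
    }

    public static void main(String[] args) {
        EntityTypesCheck types = new EntityTypesCheck();
        types.add(new EntityDef("DataSet", Set.of()));
        types.add(new EntityDef("hive_table", Set.of("DataSet")));
        types.add(new EntityDef("Process", Set.of()));
        types.add(new ClassificationDef("PII", Set.of(), Set.of("DataSet")));
        types.add(new ClassificationDef("SSN", Set.of("PII"), Set.of())); // inherits PII's restriction
        System.out.println(types.canApply("SSN", "hive_table")); // true
        System.out.println(types.canApply("SSN", "Process"));    // false
    }
}
```

Both hierarchies have to be walked for every check, which is exactly the logic a propagation query would also need to reproduce.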

If we carry on with the current approach, we potentially need to 
implement checking down the graph both in the type code and in the Gremlin 
query. I wonder if we can have a consistent approach, using Gremlin 
queries in both scenarios or code in both scenarios. I see a few 
options:

1) Carry on as is: code for classification entityTypes, a TP2 query for 
tag propagation. The TP2 query may become much more complex, as it will 
need to recurse around the classification types and entity types in the 
graph as well as the instance graph. The entityTypes Gremlin logic will 
need to match the entityTypes checking code logic.
2) Move all the logic to code. This should mean we work at TP3, and may 
give us more flexibility to handle the tag propagation overrides we will 
need at a later date.
3) Move all navigation logic to Gremlin queries. This is appealing, as the 
graph engine can then optimize the queries.
4) Extend 3) to store (cache) some of the inherited state in the instance 
graph so that a simpler query can be made. We could also extend this 
approach to store when a user overrides the default propagation. I know we 
have concerns about duplicating metadata. I wonder if we could split the 
properties in the vertices into a defined section and a derived/cached 
section, so it is obvious which properties might need recalculating.
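As a rough illustration of options 2) and 4) together, here is a Java sketch of propagation done in code that writes only into a derived section of each vertex. Everything in it is hypothetical (PropagationSketch, EntityVertex, propagate, recompute, and the blocked set standing in for user overrides); it assumes propagation follows downstream edges, skips tagging entities whose type the restriction (passed in as a predicate) forbids but keeps walking past them, and stops at a vertex where a user has blocked that classification.

```java
import java.util.*;
import java.util.function.BiPredicate;

// Hypothetical sketch of options 2) and 4); these are not Atlas classes.
class PropagationSketch {

    static final class EntityVertex {
        final String guid;
        final String typeName;
        final Set<String> defined = new TreeSet<>();  // classifications a user added directly
        final Set<String> derived = new TreeSet<>();  // cached: propagated, safe to recompute
        final Set<String> blocked = new TreeSet<>();  // user overrides: stop propagation here
        final List<EntityVertex> downstream = new ArrayList<>(); // e.g. lineage edges

        EntityVertex(String guid, String typeName) {
            this.guid = guid;
            this.typeName = typeName;
        }
    }

    // Walk downstream from source and record the classification in the derived
    // section of every reachable vertex whose type canApply allows.
    static void propagate(EntityVertex source, String classification,
                          BiPredicate<String, String> canApply) {
        Set<EntityVertex> visited = new HashSet<>();
        Deque<EntityVertex> todo = new ArrayDeque<>(source.downstream);
        while (!todo.isEmpty()) {
            EntityVertex v = todo.pop();
            if (v == source || !visited.add(v) || v.blocked.contains(classification)) {
                continue; // skip cycles back to the source, repeats, and overridden vertices
            }
            if (canApply.test(classification, v.typeName)) {
                v.derived.add(classification);
            }
            todo.addAll(v.downstream); // keep walking past types the restriction excludes
        }
    }

    // Discard every derived value and rebuild it from the defined sections.
    static void recompute(Collection<EntityVertex> vertices,
                          BiPredicate<String, String> canApply) {
        vertices.forEach(v -> v.derived.clear());
        for (EntityVertex v : vertices) {
            for (String classification : v.defined) {
                propagate(v, classification, canApply);
            }
        }
    }

    public static void main(String[] args) {
        EntityVertex table = new EntityVertex("t1", "hive_table");
        EntityVertex process = new EntityVertex("p1", "hive_process");
        EntityVertex copy = new EntityVertex("t2", "hive_table");
        table.downstream.add(process);
        process.downstream.add(copy);
        table.defined.add("PII");

        BiPredicate<String, String> tablesOnly = (cls, type) -> type.equals("hive_table");
        recompute(List.of(table, process, copy), tablesOnly);
        System.out.println(process.derived + " " + copy.derived); // [] [PII]

        process.blocked.add("PII"); // user override on the process
        recompute(List.of(table, process, copy), tablesOnly);
        System.out.println(copy.derived); // []
    }
}
```

Because only propagation ever writes the derived section, it can be cleared and rebuilt at any time without touching user-defined metadata, which is the point of keeping the two sections apart.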

Thoughts?
   all the best, David. 
 
 
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
