atlas-dev mailing list archives

From David Radley <david_rad...@uk.ibm.com>
Subject Re: Tag propagation
Date Mon, 15 Jan 2018 10:05:31 GMT
Hi Mandy,
From what I recall, we discussed some scenarios where we felt tag 
propagation would be useful. I think the use cases we have in mind are 
now indicated by the model files that have "propagateTags" set. The 
examples include the semanticClassification and the 
"hbase_table_column_families" relationships. We had not identified any 
important use cases where BOTH would be useful for a relationship, so we 
were thinking of removing that option. Do you have some relationships in 
the open types that require BOTH? It would be useful for me to understand 
why those relationships need BOTH.
         many thanks, David. 


From:   Mandy Chessell/UK/IBM
To:     dev@atlas.apache.org
Cc:     David Radley <david_radley@uk.ibm.com>, atlas 
<dev@atlas.incubator.apache.org>, Sarath Subramanian <sarath@apache.org>
Date:   14/01/2018 13:25
Subject:        Re: Tag propagation


Hello Madhan, David,
I would not wish to remove the option to have tag propagation flow in both 
directions.  Most metadata relationships are not hierarchical.  They are 
two-way, and different situations will cause different classifications 
to flow in each direction.  I do not remember the discussion on removing 
the BOTH option - but if I missed it I apologise.  What is the 
justification?
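
To make the direction question concrete, here is a minimal sketch of how a per-relationship "propagateTags" setting could decide which end receives a tag. Only "propagateTags" and BOTH are named in this thread; the other enum values, the function name and the entity names are assumptions made for illustration, not the Atlas implementation.

```python
from enum import Enum


class PropagateTags(Enum):
    # BOTH is the option discussed in this thread; the remaining
    # names are assumed for the purpose of the sketch.
    NONE = "NONE"
    ONE_TO_TWO = "ONE_TO_TWO"
    TWO_TO_ONE = "TWO_TO_ONE"
    BOTH = "BOTH"


def propagation_target(end1, end2, tagged_end, mode):
    """Return the entity that receives a tag placed on tagged_end, or None."""
    if tagged_end == end1 and mode in (PropagateTags.ONE_TO_TWO, PropagateTags.BOTH):
        return end2
    if tagged_end == end2 and mode in (PropagateTags.TWO_TO_ONE, PropagateTags.BOTH):
        return end1
    return None
```

With BOTH, a tag placed on either end reaches the other end; with a one-way setting, a tag placed on the receiving end stays where it is.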

The enforcement of the classification's entity types should not prevent 
the propagation of the tag through an entity because it does not support a 
tag.  Downstream entities may support the tag and need it to be 
propagated to them.  We need to work through more scenarios because we 
also need a way to bound tag propagation :)
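
The two behaviours under discussion (letting a tag pass through an entity that does not support it, versus stopping propagation at that entity) can be compared with a small traversal sketch. The graph layout, the function name and the `pass_through` flag are hypothetical and only illustrate the idea; they are not taken from the Atlas code.

```python
from collections import deque


def propagate(graph, entity_types, start, applicable_types, pass_through=True):
    """Return the set of entities that end up carrying the classification.

    graph: dict mapping an entity to its downstream entities
    entity_types: dict mapping an entity to its type name
    applicable_types: type names the classification may be attached to
    pass_through: if True, keep walking through entities whose type does
        not support the classification; if False, stop at such entities
    """
    tagged = set()
    seen = {start}
    queue = deque([start])
    while queue:
        entity = queue.popleft()
        if entity_types[entity] in applicable_types:
            tagged.add(entity)
        elif not pass_through:
            # Unsupported entity: do not tag it and do not go further.
            continue
        for downstream in graph.get(entity, []):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return tagged


graph = {"term": ["process"], "process": ["column"]}
entity_types = {"term": "GlossaryTerm", "process": "Process", "column": "hive_column"}
applicable = {"GlossaryTerm", "hive_column"}
```

In this example the intermediate "process" entity does not support the tag. With `pass_through=True` the column behind it is still tagged; with `pass_through=False` propagation ends at the process and the column is never reached.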

As an FYI, the OMRS API for classifications includes an origin attribute 
that lets us return classifications with an entity that are explicitly 
assigned or propagated to the entity.  Most callers will not care but some 
might.

All the best
Mandy
___________________________________________
Mandy Chessell CBE FREng CEng FBCS
IBM Distinguished Engineer

Master Inventor
Member of the IBM Academy of Technology
Visiting Professor, Department of Computer Science, University of 
Sheffield

Email: mandy_chessell@uk.ibm.com
LinkedIn: http://www.linkedin.com/pub/mandy-chessell/22/897/a49

Assistant: Janet Brooks - jsbrooks12@uk.ibm.com




From:   Madhan Neethiraj <madhan@apache.org>
To:     David Radley <david_radley@uk.ibm.com>, Sarath Subramanian 
<sarath@apache.org>
Cc:     atlas <dev@atlas.incubator.apache.org>
Date:   13/01/2018 02:14
Subject:        Re: Tag propagation



David,

 

Sarath was working on tag-propagation, but had to take up tasks related to 
JanusGraph and others. He will be resuming tag-propagation work next week; 
this feature would be part of Atlas-1.0.0 release.

 

- lose BOTH - this is still in the code - I think we agreed we wanted to 
get rid of this. 
Agree.

 

- should honour the classification entitytypes - so that we do not get 
classifications applied to inappropriate entityTypes 
Perhaps we should stop the propagation at the entity where the 
classification is not applicable? I think it wouldn’t be correct to block 
a classification association to an entity if the classification is not 
applicable for a down-stream entity.

 

- There is the question about how the propagated classifications would 
look in the get entity rest API  - I suggest that they appear in the 
entities classification with a field indicating that they are derived (and 
hence not able to be removed by an entity update). 
I was thinking about a separate attribute, 
AtlasEntity.propagatedClassifications, for this. However, I think your 
suggestion of adding a field to AtlasClassification is a better one; with 
this approach no changes would be needed in applications that process 
classifications on an entity. How about we capture the guid of the source 
entity on which the classification is associated, 
AtlasClassification.sourceEntityGuid? If this value is null, then the 
classification is associated with the current entity directly.
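
As a rough sketch of the field proposed above: a classification carries the guid of the entity it was originally attached to, and a null value means it is attached directly. The Python class below only mirrors that idea. The names `Classification`, `is_propagated` and `removable` are hypothetical, and the rule that propagated classifications cannot be removed by an entity update follows the suggestion quoted in this thread rather than any existing behaviour.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Classification:
    type_name: str
    # None means the classification is attached directly to this entity;
    # otherwise it holds the guid of the entity it was propagated from.
    source_entity_guid: Optional[str] = None

    @property
    def is_propagated(self) -> bool:
        return self.source_entity_guid is not None


def removable(classifications: List[Classification]) -> List[Classification]:
    """Classifications an entity update may remove: only the direct ones."""
    return [c for c in classifications if not c.is_propagated]


direct = Classification("PII")
propagated = Classification("PII", source_entity_guid="guid-of-term")
```

A caller that ignores the extra field sees one flat list of classifications, which is why no changes would be needed in applications that already process classifications on an entity.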

 

- I would hope that Ranger would pick up these new propagated tags using 
the existing tag sync. 
Yes. With the approach detailed above, no changes would be needed in 
Ranger.

 

- I think you wanted the derived classifications to be picked up at query 
time. I also remember suggesting that we store the derived classifications 
in a derivedClassifiation property in the entity which would contain the 
list of derived classifications. Or we could store them as a new type of 
edge "propagated classification" edges to the real classification. I like 
the edge idea. 
To enable queries like ‘get list of entities that are classified as PII’, 
it will be performant if each entity vertex has data about the propagated 
classifications as well, similar to entities having data on 
classifications directly associated with the entity currently. However, 
all the entities should directly reference a single instance of a 
classification, so that it will be easier to manage changes to 
classification attribute values. Sarath will send an update on the design 
choices later next week.
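
One way to picture the trade-off described above is an index in which every entity records the classifications that apply to it, direct or propagated, while the classification attributes live in a single shared instance. This is an illustrative sketch only; the class and method names are hypothetical and it says nothing about the design Sarath will propose.

```python
from collections import defaultdict


class ClassificationIndex:
    def __init__(self):
        # classification id -> the single shared instance
        self.instances = {}
        # entity guid -> ids of classifications that apply to it
        self.by_entity = defaultdict(set)

    def add(self, cid, type_name, attributes, owner, propagated_to=()):
        """Register one instance and reference it from every affected entity."""
        self.instances[cid] = {"type": type_name, "attributes": dict(attributes)}
        for guid in (owner, *propagated_to):
            self.by_entity[guid].add(cid)

    def entities_classified_as(self, type_name):
        """Answer the query from per-entity data, without walking the graph."""
        return sorted(
            guid
            for guid, cids in self.by_entity.items()
            if any(self.instances[c]["type"] == type_name for c in cids)
        )


index = ClassificationIndex()
index.add("c1", "PII", {"level": "high"}, owner="term-1",
          propagated_to=["col-1", "col-2"])
```

Because each entity holds a reference rather than a copy, changing an attribute value on the shared instance is visible from every entity that carries the classification.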

 

If we had the above, we could classify a Term as PSI, and use the semantic 
mapping to propagate the classifications to the hive column. The hive 
column would not pick up classifications defined in the area 3 model like 
"SpineObject", which is defined as only applying to "GlossaryTerm". 
Yes. This use case should be covered by the design discussed above.

 

Thanks,

Madhan

 

From: David Radley <david_radley@uk.ibm.com>
Date: Thursday, January 11, 2018 at 8:52 AM
To: Madhan Neethiraj <mneethiraj@hortonworks.com>
Cc: atlas <dev@atlas.incubator.apache.org>
Subject: Tag propagation

 

Hi Madhan, 
I had a look in the code - I was surprised that the tag propagation was 
not in. Is this something you are looking at in the near future? If not I 
may need to look into it. I suggest that phase 1 of the tag propagation 
implementation should: 
- lose BOTH - this is still in the code - I think we agreed we wanted to 
get rid of this. 
- should honour the classification entitytypes - so that we do not get 
classifications applied to inappropriate entityTypes 
- There is the question about how the propagated classifications would 
look in the get entity rest API  - I suggest that they appear in the 
entities classification with a field indicating that they are derived (and 
hence not able to be removed by an entity update). 
- I would hope that Ranger would pick up these new propagated tags using 
the existing tag sync. 
- I think you wanted the derived classifications to be picked up at query 
time. I also remember suggesting that we store the derived classifications 
in a derivedClassifiation property in the entity which would contain the 
list of derived classifications. Or we could store them as a new type of 
edge "propagated classification" edges to the real classification. I like 
the edge idea. 

If we had the above, we could classify a Term as PSI, and use the semantic 
mapping to propagate the classifications to the hive column. The hive 
column would not pick up classifications defined in the area 3 model like 
"SpineObject", which is defined as only applying to "GlossaryTerm". 

What do you think?   all the best, David. 

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU






