atlas-dev mailing list archives

From "Nigel Jones (JIRA)" <>
Subject [jira] [Commented] (ATLAS-1821) Classification propagation from entity to a derivative or child entity
Date Tue, 06 Jun 2017 09:41:18 GMT


Nigel Jones commented on ATLAS-1821:

 first part -- makes sense. In addition we need a cardinality, so that, for example, you can't assign
your example of document-classification to the same entity more than once (at least directly).
In fact I don't think any classification should be assignable more than once? Take region,
for example, though there are probably cases where it would make sense - what about "applicable"
regions, where we have a list of areas where something (like a regulation, or usage) applies?
Perhaps that should be a case of allowing single or multi-valued/list values. It makes no sense for
a continuous dimension, but could for a category?
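
A rough sketch of the cardinality idea, in Python. All names here (ClassificationDef, multi_valued, Entity, assign) are hypothetical and are not part of the Atlas API; the sketch only assumes that a classification may be assigned directly to an entity at most once, and that a definition declares whether that one assignment carries a single value or a list.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ClassificationDef:
    """Hypothetical classification definition; not an Atlas type."""
    name: str
    multi_valued: bool = False  # True: one assignment may carry a list of values


@dataclass
class Entity:
    name: str
    # classification name -> values directly assigned to this entity
    direct: dict = field(default_factory=dict)


def assign(entity, cdef, values):
    """Assign a classification directly, at most once per entity."""
    if cdef.name in entity.direct:
        raise ValueError(f"{cdef.name} is already assigned to {entity.name}")
    if not values:
        raise ValueError(f"{cdef.name} needs at least one value")
    if len(values) > 1 and not cdef.multi_valued:
        raise ValueError(f"{cdef.name} takes exactly one value")
    entity.direct[cdef.name] = list(values)
```

With this shape, "applicable regions" would be one multi-valued assignment holding a list, rather than the same classification assigned several times.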

 second part - are you referring to data processing here? I think we do need to model that
so that we can understand the classification of derived data. In some cases we may be able
to understand how to promote/demote the classification automatically; in others we won't,
though we may be able to provide hints of a valid range which could then be confirmed through
a dev or stewardship process? We also have derivation that applies between terms and
entities, which I think was the original discussion around propagation (such as salary (asset:column)
-> salary [term] -> spi [classification] as a simple case) ... but it also applies to structural
relationships like salary (asset:column) -> salaryinfo (asset:database) -> spi [classification]
... where knowing that there's a containment relationship between the column and the database
is what we use to pick the right classification for the column. In that example, if there was ALSO
a classification on the column itself, that could take precedence, even if weaker (some tools
could determine all possible derived classifications and offer a stewardship process to help
a customer verify/check anomalies).
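
To make the salary example concrete, here is a small Python sketch. The function names and the plain dicts are hypothetical, not Atlas structures. It assumes a direct classification on the asset wins; ranking term-derived ahead of container-derived classifications is purely my assumption for the sake of the example. The full candidate list is what a stewardship tool could show for review.

```python
def candidate_classifications(asset, direct, term_of, term_class, parent_of):
    """Return (source, classification) pairs, nearest source first."""
    candidates = []
    if asset in direct:
        candidates.append(("direct", direct[asset]))
    term = term_of.get(asset)
    if term in term_class:
        candidates.append(("term:" + term, term_class[term]))
    # Walk up the containment chain (column -> database -> ...).
    seen = {asset}
    parent = parent_of.get(asset)
    while parent is not None and parent not in seen:
        seen.add(parent)
        if parent in direct:
            candidates.append(("container:" + parent, direct[parent]))
        parent = parent_of.get(parent)
    return candidates


def effective_classification(asset, direct, term_of, term_class, parent_of):
    """Pick the first candidate, or None when nothing applies."""
    found = candidate_classifications(asset, direct, term_of, term_class, parent_of)
    return found[0][1] if found else None


# Toy data mirroring the example: a salary column inside a salaryinfo database.
direct = {"salaryinfo": "spi"}
term_of = {"salary_column": "salary"}
term_class = {"salary": "spi"}
parent_of = {"salary_column": "salaryinfo"}
```

Adding a direct entry for salary_column to `direct`, even a weaker one, would then change the effective classification while the derived candidates remain visible for checking.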

Going back to your second statement again, was your intent that the description applies to
this case too? Actually I think it was ;-) I was thinking of process as "data movement" or similar,
but I think your general concept still applies.

> Classification propagation from entity to a derivative or child entity
> ----------------------------------------------------------------------
>                 Key: ATLAS-1821
>                 URL:
>             Project: Atlas
>          Issue Type: Improvement
>          Components:  atlas-core, atlas-webui
>            Reporter: Srikanth Venkat
>             Fix For: 0.9-incubating
> User Story:
> As a data steward, I need a scalable way to quickly and efficiently propagate classification
> across the information supply chain to support efficient searches and classification based
> security for compliance and audit purposes.
> This requires:
> 1. Classifications for derivative entities should be inherited from the originator and
> to child entities from parent.
> For example, if a Hive column is classified "Confidential" then resulting column created
> from a CTAS operation should also be tagged "Confidential" to maintain the classification
> of the original entity. In the case where 2 or more entities are composed, the derivative
> entity should have the union of all classifications of each source entity.
> 2. Business Terms:
> a. Child business terms should inherit the classifications associated with the parent
> b. The option to propagate classification to child business terms in a hierarchy should
> be provided
> c. Ability to update the propagated tags manually via UI or through the API
> d. Tagging a term should propagate to data assets that are already attached to that business
> term as well
> 3. Data assets
> a. For all supported data asset types in Atlas, if a derivative asset is created it should
> inherit the tags and attributes from the original asset.
> b. the option to propagate tags to child entities should be provided (e.g. if you tag
> a folder in HDFS optionally tag all the files within it)
> c. Ability to update the propagated tags manually via UI or through the API
> d. Tagging a parent object should be inherited after child creation dynamically (unless
> a flag is set not to do this)
> e. Derived data assets should have the tags of the original data asset.
> Conflict resolution - if there are different values for attributes on tags (classifications)
> on upstream or parent entities used to derive a data asset then user needs to be prompted
> for action to resolve the conflict. Once resolved, the resolved value should be carried forth
> to derived assets.
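
For reference, the union and conflict rules in the quoted description could be sketched roughly as follows in Python. The propagate function and the dict layout are hypothetical, not how Atlas implements propagation. The sketch assumes each source entity is represented as a mapping from classification name to an attribute dict with hashable values, and it only reports conflicts; it does not resolve them.

```python
def propagate(sources):
    """Union the classifications of all source entities.

    sources: list of dicts, classification name -> {attribute: value}.
    Returns (merged, conflicts), where conflicts maps
    (classification, attribute) to the set of differing values seen.
    """
    merged = {}
    conflicts = {}
    for tags in sources:
        for name, attrs in tags.items():
            target = merged.setdefault(name, {})
            for key, value in attrs.items():
                if key in target and target[key] != value:
                    # Keep the first value; record every competing value
                    # so a user can be prompted to resolve it.
                    conflicts.setdefault((name, key), {target[key]}).add(value)
                else:
                    target[key] = value
    return merged, conflicts
```

A derived asset would receive `merged` only once `conflicts` is empty or a steward has chosen a value for each conflicting attribute.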

This message was sent by Atlassian JIRA
