atlas-dev mailing list archives

From "Graham Wallis (JIRA)" <>
Subject [jira] [Commented] (ATLAS-1821) Classification propagation from entity to a derivative or child entity
Date Mon, 05 Jun 2017 14:34:05 GMT


Graham Wallis commented on ATLAS-1821:

Can I suggest that there are two parts to what's needed: a flexible classification model and
a system for controlled derivation. 
The first part - the flexible model - should be a multi-dimensional classification scheme,
with an arbitrary number of dimensions, although most users would probably not need more than
about half a dozen dimensions. Dimensions are orthogonal. Each dimension contains values,
which can be either categorical or continuous (ordered). An example of a categorical dimension
might be: 'region' with values 'north-america', 'south-america', etc. The values are mutually
exclusive and there is no implied order or precedence of values. A categorical dimension can
be used to support access control decisions based on rules or policy. An example of a continuous
dimension might be a 'document-classification' dimension with values 'public', 'internal-use',
'confidential', 'secret', etc. On a continuous dimension there is an implied order, which
would be used to support precedence rules. Each data item can be thought of as occupying some
point in the multi-dimensional space.
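To make the model above concrete, here is a minimal Python sketch. It assumes a data item's classification can be represented as a mapping from dimension name to value. The class names (`CategoricalDimension`, `OrderedDimension`) and the helper `make_point` are hypothetical and are not part of Atlas. The "continuous" dimension is modelled as an ordered list of levels, because the example values are discrete.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Sequence, Tuple, Union


@dataclass(frozen=True)
class CategoricalDimension:
    """Mutually exclusive values with no implied order (e.g. 'region')."""
    name: str
    values: FrozenSet[str]

    def validate(self, value: str) -> None:
        if value not in self.values:
            raise ValueError(f"{value!r} is not a value of {self.name!r}")


@dataclass(frozen=True)
class OrderedDimension:
    """Values listed from least to most restrictive (e.g. 'document-classification')."""
    name: str
    levels: Tuple[str, ...]

    def validate(self, value: str) -> None:
        if value not in self.levels:
            raise ValueError(f"{value!r} is not a level of {self.name!r}")

    def rank(self, value: str) -> int:
        # Position in the ordering; a higher rank takes precedence.
        self.validate(value)
        return self.levels.index(value)


Dimension = Union[CategoricalDimension, OrderedDimension]


def make_point(dimensions: Sequence[Dimension], coords: Dict[str, str]) -> Dict[str, str]:
    """Validate a data item's coordinates: one value per known dimension."""
    by_name = {d.name: d for d in dimensions}
    for name, value in coords.items():
        if name not in by_name:
            raise KeyError(f"unknown dimension {name!r}")
        by_name[name].validate(value)
    return dict(coords)


REGION = CategoricalDimension("region", frozenset({"north-america", "south-america"}))
DOC_CLASS = OrderedDimension(
    "document-classification", ("public", "internal-use", "confidential", "secret")
)

item = make_point([REGION, DOC_CLASS],
                  {"region": "north-america", "document-classification": "confidential"})
```

Because dimensions are orthogonal, a point is just one value per dimension. Precedence rules only need `rank` on ordered dimensions, while categorical dimensions are compared for equality, for example by an access policy.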
The second part - the derivation system - needs to be tied to the relationships between data
items and provide a process model for how the dimensions of a source item are transformed
into, or affect, the dimensions of a target item. A pair of data items could be related in
one of a number of ways: one item may be derived from the other as a literal copy (duplicate),
or it could be that one item is a transformed (e.g. redacted) copy of the other. It is also
possible that one item is an aggregation of other items, or a filtered selection from another
collection of items. Depending on the relationship, it would be possible to either propagate
the same classification dimensions and values from source to target, or to 'downgrade' or
'escalate' a continuous dimension due to the redaction or aggregation of information between
source and target. A dimension could be omitted or a dimension and value could be introduced,
as a result of a transformation relationship. The modification of classification dimensions
and values would need to be tightly controlled as part of the definition of the relationship.
For example, a data owner would need to inspect a redacting transformation and specify/certify
the process by which classification settings are derived for the redacted item.  
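The following is a minimal sketch of a controlled derivation. It assumes each relationship type carries an owner-certified rule that maps the source's coordinates to the target's. `Relationship`, `DerivationRule` and the `LEVELS` table are hypothetical names. Clamping at the ends of the ordering is one design choice among several.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, FrozenSet, Optional

# Hypothetical ordering for the ordered dimension(s), least to most restrictive.
LEVELS = {"document-classification": ("public", "internal-use", "confidential", "secret")}


class Relationship(Enum):
    DUPLICATE = "duplicate"
    REDACTED = "redacted"
    AGGREGATED = "aggregated"
    FILTERED = "filtered"


@dataclass
class DerivationRule:
    relationship: Relationship
    shift: Dict[str, int] = field(default_factory=dict)      # +n escalates, -n downgrades
    omit: FrozenSet[str] = frozenset()                       # dimensions dropped on the target
    introduce: Dict[str, str] = field(default_factory=dict)  # dimensions added on the target
    certified_by: Optional[str] = None                       # data owner who approved the rule

    def apply(self, source: Dict[str, str]) -> Dict[str, str]:
        if self.certified_by is None:
            raise PermissionError(f"{self.relationship.value} rule is not certified")
        # Start from a straight propagation of the source's coordinates.
        target = {k: v for k, v in source.items() if k not in self.omit}
        for dim, steps in self.shift.items():
            if dim in target:
                levels = LEVELS[dim]
                idx = levels.index(target[dim]) + steps
                # Clamp so a shift never leaves the defined range of levels.
                target[dim] = levels[max(0, min(idx, len(levels) - 1))]
        target.update(self.introduce)
        return target


source = {"region": "north-america", "document-classification": "confidential"}
copy_rule = DerivationRule(Relationship.DUPLICATE, certified_by="data-owner")
redact_rule = DerivationRule(Relationship.REDACTED,
                             shift={"document-classification": -1},
                             certified_by="data-owner")
aggregate_rule = DerivationRule(Relationship.AGGREGATED,
                                shift={"document-classification": 1},
                                certified_by="data-owner")
```

In this sketch, a redaction downgrades 'confidential' to 'internal-use', an aggregation escalates it to 'secret', and a duplicate propagates the source unchanged. A rule with no `certified_by` refuses to run, which is one way to enforce that the data owner has signed off on the transformation.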
I think it's essential that the second part (propagations and transformations) is closely
(auditably) tied to the first part.
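One way to make that tie auditable is an append-only log in which each entry includes the hash of the previous one, so any later edit to a recorded classification breaks the chain. This sketch assumes every derivation is recorded as it happens. `PropagationAuditLog` and its fields are hypothetical, and hash chaining is a swapped-in technique, not something the comment or Atlas prescribes.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON so the same entry always hashes to the same value.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class PropagationAuditLog:
    """Append-only, hash-chained record of classification derivations (a sketch)."""

    def __init__(self):
        self.entries = []

    def record(self, source_id, target_id, relationship, certified_by,
               source_cls, target_cls):
        body = {
            "source": source_id,
            "target": target_id,
            "relationship": relationship,
            "certified_by": certified_by,
            "source_classification": dict(source_cls),  # copy, so callers can't mutate it
            "target_classification": dict(target_cls),
            "prev_hash": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry["hash"]

    def verify(self) -> bool:
        """Recompute every hash and check each entry links to its predecessor."""
        prev_hash = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


log = PropagationAuditLog()
log.record("sales.customers.email", "sales_redacted.customers.email", "redacted",
           "data-owner",
           {"document-classification": "confidential"},
           {"document-classification": "internal-use"})
```

Each entry names the certifying owner alongside the before and after classifications. An auditor can therefore replay which rule produced which target classification and detect tampering with `verify()`.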

> Classification propagation from entity to a derivative or child entity
> ----------------------------------------------------------------------
>                 Key: ATLAS-1821
>                 URL:
>             Project: Atlas
>          Issue Type: Improvement
>          Components:  atlas-core, atlas-webui
>            Reporter: Srikanth Venkat
>             Fix For: 0.9-incubating
> User Story:
> As a data steward, I need a scalable way to quickly and efficiently propagate classifications
> across the information supply chain to support efficient searches and classification-based
> security for compliance and audit purposes.
> This requires:
> 1. Classifications should be inherited by derivative entities from the originating entity,
> and by child entities from their parent.
> For example, if a Hive column is classified "Confidential", then the resulting column created
> by a CTAS (CREATE TABLE AS SELECT) operation should also be tagged "Confidential" to maintain
> the classification of the original entity. Where two or more entities are combined, the
> derivative entity should carry the union of the classifications of all source entities.
> 2. Business terms:
> a. Child business terms should inherit the classifications associated with their parent.
> b. An option to propagate classifications to child business terms in a hierarchy should
> be provided.
> c. The ability to update propagated tags manually, via the UI or through the API.
> d. Tagging a term should also propagate the tag to data assets that are already attached
> to that business term.
> 3. Data assets:
> a. For all data asset types supported in Atlas, a derivative asset should inherit the tags
> and attributes of the original asset when it is created.
> b. An option to propagate tags to child entities should be provided (e.g. if you tag a
> folder in HDFS, optionally tag all the files within it).
> c. The ability to update propagated tags manually, via the UI or through the API.
> d. Tags added to a parent object after its children have been created should still be
> inherited by those children dynamically (unless a flag is set to disable this).
> e. Derived data assets should have the tags of the original data asset.
> Conflict resolution - if the tags (classifications) on the upstream or parent entities used
> to derive a data asset have different attribute values, the user needs to be prompted to
> resolve the conflict. Once resolved, the resolved value should be carried forward to
> derived assets.
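The user story's core behaviours can be sketched as follows: a derived entity gets the union of its sources' classifications, tags propagate from parent to child with an opt-out flag, and attribute conflicts are surfaced for the user to resolve. This is a hypothetical, in-memory Python model. `derive_classifications`, `Conflict`, `Node` and the `propagate` flag are illustrative names, not Atlas APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# A tag (classification) name mapped to its attribute values,
# e.g. {"Confidential": {}, "PII": {"retention_days": 30}}.
Classifications = Dict[str, Dict[str, object]]


@dataclass
class Conflict:
    tag: str
    attribute: str
    values: List[object]


def derive_classifications(
    sources: List[Classifications],
    resolutions: Optional[Dict[Tuple[str, str], object]] = None,
) -> Tuple[Classifications, List[Conflict]]:
    """Union of all source tags. An attribute with differing values becomes a
    Conflict (and is left unset) unless the user supplied a resolution for it."""
    resolutions = resolutions or {}
    merged: Classifications = {}
    seen: Dict[Tuple[str, str], List[object]] = {}
    for source in sources:
        for tag, attrs in source.items():
            merged.setdefault(tag, {})
            for attr, value in attrs.items():
                values = seen.setdefault((tag, attr), [])
                if value not in values:
                    values.append(value)
    conflicts: List[Conflict] = []
    for (tag, attr), values in seen.items():
        if (tag, attr) in resolutions:
            merged[tag][attr] = resolutions[(tag, attr)]
        elif len(values) == 1:
            merged[tag][attr] = values[0]
        else:
            conflicts.append(Conflict(tag, attr, values))
    return merged, conflicts


@dataclass
class Node:
    """A parent/child entity such as an HDFS folder and its files."""
    name: str
    tags: Classifications = field(default_factory=dict)
    propagate: bool = True  # hypothetical opt-out flag
    children: List["Node"] = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        # A child created under a propagating parent starts with the parent's tags.
        if self.propagate:
            for tag, attrs in self.tags.items():
                child.tags.setdefault(tag, dict(attrs))
        self.children.append(child)
        return child

    def tag(self, name: str, attrs: Optional[Dict[str, object]] = None) -> None:
        # Tagging a parent after its children exist pushes the tag down to them too.
        self.tags[name] = dict(attrs or {})
        if self.propagate:
            for child in self.children:
                child.tag(name, attrs)
```

A conflicting attribute is left unset and reported rather than silently resolved, matching the requirement that the user be prompted. Once a resolution is supplied for that (tag, attribute) pair, it is carried into the derived asset.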

This message was sent by Atlassian JIRA
