atlas-dev mailing list archives

From "David Radley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ATLAS-1161) Tags should be bound to an object's name and remain bound to all incarnations of that name
Date Tue, 13 Sep 2016 09:19:20 GMT

    [ https://issues.apache.org/jira/browse/ATLAS-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15486749#comment-15486749 ]

David Radley commented on ATLAS-1161:
-------------------------------------

Interesting - thanks [~dmarkwat] 
- It seems that the asset in question is like a transient file. I assume that the sort of
use case you are thinking of is a delta ETL / map reduce job, where you accumulate daily
updates in a table. You may be deleting and recreating files as part of this process. One way
of handling this would be to put the time stamp in the file name or the folder name / namespace.
It seems to me that a more complete solution would be to use rules-based classification /
tagging, e.g. everything in this folder, or matching this namespace regex, could be
automatically tagged / classified; this could catch renames as well. It might be that where
a file lives determines its classification, so one way of changing the classification of an
asset would be to move it to a new location.
- I guess your suggested solution is a useful improvement, though I think the company setting
this up needs to opt into this sort of behaviour with a config option, a new API or a new Atlas
type, so that there are no unintended consequences for other use cases. Usual practice would
be to not change defaults, but I guess in an incubator, if we feel this default is more useful,
then this could be the default behaviour.
- Another way of doing this classification would be for the ETL / map reduce job to classify
the target table itself. It could then also do more granular column-based tagging. The
responsibility for the classification then lies with the job that creates the file.
- I suspect we would often not tag a table as PII - more likely a column. Though we might
tag a table as "customer data" or "for testing"  or the like. 
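
To make the rules-based idea in the first point a bit more concrete, here is a minimal sketch in Python. It is not Atlas code and does not use any Atlas API; the TagRule type, the classify function, the paths and the patterns are all made up for illustration. The tags are derived from where the asset lives each time it is looked up, so a dropped and recreated file, or a renamed one, picks up whatever the rules say for its current path.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TagRule:
    """Hypothetical rule: any asset whose path matches `pattern` gets `tag`."""
    pattern: str
    tag: str


# Illustrative rules only; the paths and tag names are assumptions.
RULES = [
    TagRule(r"^/data/customers/", "customer data"),
    TagRule(r"^/data/test/", "for testing"),
    TagRule(r"^/data/ingest/daily_\d{8}/", "daily delta"),
]


def classify(path, rules=RULES):
    """Return the sorted tags of every rule whose pattern matches the path."""
    return sorted({rule.tag for rule in rules if re.search(rule.pattern, path)})


# Moving an asset to a new location changes its classification.
print(classify("/data/customers/orders.csv"))
print(classify("/data/test/orders.csv"))
```

Because nothing is stored against the asset itself, there is no binding to lose when the asset is deleted; the trade-off is that one-off manual tags would still need a separate mechanism.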

> Tags should be bound to an object's name and remain bound to all incarnations of that
name
> ------------------------------------------------------------------------------------------
>
>                 Key: ATLAS-1161
>                 URL: https://issues.apache.org/jira/browse/ATLAS-1161
>             Project: Atlas
>          Issue Type: Improvement
>    Affects Versions: trunk, 0.7-incubating
>            Reporter: Dan Markwat
>
> As a user I would like tags I ascribe to an object in Atlas to carry over to the next incarnation
of that object.  In effect, tags would be ascribed to a fully-qualified object name and all
incarnations of that name would have the tags apply to it.  (Not unlike Ranger and the way
it applies policies to objects).
> Example: I create a Hive table, TableA.  I tag TableA with tags, Tag1 and Tag2.  I drop
TableA.
> In the current Atlas world, if I create TableA again, Tag1 and Tag2 need to be re-applied
to TableA.  In the ideal governance/security world, if I re-create TableA I should not have
to re-tag it with Tag1 and Tag2; instead, I should be required to *untag* TableA if I desire
TableA to be clean and untagged.  This effectively functions like a light switch: user turns
on light, just because the bulb is swapped out doesn't mean the switch turned off - the user
must explicitly turn the switch off, just as they did to turn it on.  Think also about Ranger:
just because I deleted an object doesn't mean that policy goes away.
> By effectively deleting the binding of Tag1 and Tag2 to the name TableA whenever TableA
is deleted, Atlas ceases to be a book of record for tags associated with TableA, as those
tags would need to be applied again.  This is bad in a world where creating/dropping objects
and tagging objects are part of 2 independent and asynchronous processes - one carried out
by an engineer, the other carried out by a governance/security administrator.  The issue is
compounded by the fact that tags can have security policies associated with them in Ranger;
and any object missing its tag at re-creation of that object now is missing security policies
previously attached to it.
> This is an especially annoying issue for organizations that have large ingestion pipelines
where tables are sometimes deleted or modified in ways not easily accomplished through updating
table metadata.  Not to mention, (probably a new feature: ) easily-accessible records of what
was tagged with what - even if the object has been dropped or deleted - is especially important
for organizations that require auditing or have security controls based on tag-based policies.
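
The light switch behaviour requested in the description above can be sketched roughly as follows. This is only an illustration of the requested semantics, not how Atlas stores tags; the NameBoundTags class and its methods are hypothetical. Tags are keyed by the fully-qualified name rather than by the object instance, so dropping the object leaves the binding in place until someone explicitly untags the name.

```python
class NameBoundTags:
    """Hypothetical registry that binds tags to a qualified name, not an instance."""

    def __init__(self):
        self._tags = {}      # qualified name -> set of tag names
        self._live = set()   # names that currently have a live object

    def create(self, name):
        self._live.add(name)

    def drop(self, name):
        # Only the object goes away; the tag binding for the name is kept.
        self._live.discard(name)

    def tag(self, name, tag):
        self._tags.setdefault(name, set()).add(tag)

    def untag(self, name, tag):
        # The only way a binding is removed: an explicit untag.
        self._tags.get(name, set()).discard(tag)

    def tags(self, name):
        return sorted(self._tags.get(name, set()))


registry = NameBoundTags()
registry.create("TableA")
registry.tag("TableA", "Tag1")
registry.tag("TableA", "Tag2")
registry.drop("TableA")
registry.create("TableA")
print(registry.tags("TableA"))  # the recreated TableA still carries both tags
```

Keeping the bindings after a drop also gives a simple record of what was tagged with what, which is the auditing point raised in the description.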



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
