atlas-dev mailing list archives

From "Mandy Chessell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ATLAS-1690) Introduce top level relationships
Date Fri, 21 Apr 2017 10:31:04 GMT

    [ https://issues.apache.org/jira/browse/ATLAS-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15978467#comment-15978467
] 

Mandy Chessell commented on ATLAS-1690:
---------------------------------------

Hello David,
Comments follow based on V1.4

Pg2 "Metadata repositories store metadata. The context of a metadata object is dictated by
its relationships."  I am not sure these sentences tell the complete story.  Maybe something
like "The Apache Atlas metadata repository stores metadata objects and their relationships.
 The relationships between the metadata objects are as important as the metadata objects themselves.
 They explain how the data landscape is structured and how the components within it relate
to the business and the governance requirements, ownership and other interested parties. 
 The relationships in Apache Atlas today provide support for containment (or part-of) relationships.
 This is necessary to describe sub-components of a component - for example, a Hive Column
is a sub-component of a Hive Table.  With these types of relationships, the lifetime of the
sub-components is tied to their parent component.  So for example, if a hive table is deleted,
then all of its columns should also be deleted.  This design is looking to add support for
a new type of relationship between metadata objects that have independent lifetimes.  In fact,
the creation of these relationships is an auditable action that can impact how data
is discovered, understood, secured, managed and removed.  Such relationships include when
Glossary ..."
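
The distinction above (contained sub-components versus related objects with independent lifetimes) can be sketched as follows. This is only an illustration under stated assumptions: `Relationship`, `Repository`, the `containment` flag and the audit record format are hypothetical names invented here, not the Atlas API or anything taken from the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    """Hypothetical relationship instance between two entity ids."""
    name: str
    end1: str          # id of the parent / first entity
    end2: str          # id of the child / second entity
    containment: bool  # True: the lifetime of end2 is tied to end1


@dataclass
class Repository:
    """Toy in-memory store used for illustration; not the Atlas repository."""
    entities: set = field(default_factory=set)
    relationships: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def relate(self, rel: Relationship) -> None:
        # Creating a relationship is recorded as an auditable action.
        self.relationships.append(rel)
        self.audit_log.append(("RELATIONSHIP_CREATED", rel.name, rel.end1, rel.end2))

    def delete(self, entity_id: str) -> None:
        if entity_id not in self.entities:
            return
        self.entities.remove(entity_id)
        attached = [r for r in self.relationships if entity_id in (r.end1, r.end2)]
        # Every relationship touching the deleted entity goes away.
        self.relationships = [r for r in self.relationships if r not in attached]
        for rel in attached:
            # Only contained children are deleted along with their parent.
            if rel.containment and rel.end1 == entity_id:
                self.delete(rel.end2)


repo = Repository(entities={"hive_table", "hive_column", "glossary_term"})
repo.relate(Relationship("table_columns", "hive_table", "hive_column", containment=True))
repo.relate(Relationship("semantic_assignment", "glossary_term", "hive_column", containment=False))

# Deleting the table removes its column; the glossary term has its own lifetime.
repo.delete("hive_table")
print(sorted(repo.entities))
```

Deleting the glossary term instead would remove only the link to the column and leave the column in place, which is the behaviour the new relationship type is meant to allow.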

Pg2  "If these links are made incorrectly (purposely or otherwise) data can be inappropriately
exposed." This comment is out of place - it is only true if the relationship is involved in
access control.  A more general comment could be "If these links are made incorrectly (purposely
or otherwise) data may be inappropriately used or governed."

Font of JSON example on page 4 is inconsistent - harder to read than necessary.

pg5 - "Relationship constraints" - this is the first time this term is mentioned - it should be
introduced in the examples above.

pg5 - "This name will help us name an association and its associated
classification."  Not sure what classification means in this sentence.  Also need a description
of why an association needs a name (I am thinking of this as a Type name - is that right?).
The name is important because the creation of these types of relationships is a deliberate
act of governance and we need to be able to describe their use - and govern their lifecycle.

pg 5 "“Address” and “Person”; a person has addresses, and addresses have people living
in them. In this case, there is no obvious direction, so a bidirectional relationship is natural
way of associating these concepts; the alternative would be 2 directional relationships that
would not be kept in sync."  Please use a metadata description - it is confusing to talk
about data relationships here.

pg 5 "There are 2 main styles of relationships, tight and loose relationships."  Why have
new names been introduced for these when at the top the doc states it is using UML names?  Also the names
are misleading.  There is nothing loose about the association between a glossary term and
a database column.  

pg 6 "In the case of tight relationships, the top entity and its children are governed as
one, as the lifecycles of the children are tied to the parent. "  It is true that the lifecycles
are linked but it does not mean the governance is tied - for example, the confidentiality
classification of a table may differ from that of the columns it is made up of.  Governance
rules may be defined to act on specific columns and not on a table as a whole.

pg 6/7 - RelationshipDef example - please use metadata examples not data examples - it is
confusing because you would never define types for customer and account in Atlas.

pg7 "The entity instances use Atlas object ids pointing to the relationship instance (which
has a guid)."  This needs further explanation and an example.

pg 8 "Read" - what are the parameters on read - is this a single relationship operation?

pg 8 "Aggregation implies that here is containment "  I know what you mean but aggregation
and containment are different things in UML and so this statement is not logically correct.

p8 "A natural way to specify aggregation would be to have an isContainer Boolean flag, defaulting
to false and specified on one of the endpoints in the relationship."  Should say this flag
can only be set on one end.  
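
A minimal sketch of the "only one end" rule, assuming a hypothetical `EndDef` structure and `validate_ends` helper (neither is taken from the proposal or from the Atlas code base), and using category and term as illustrative metadata types:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndDef:
    """Hypothetical endpoint definition of a relationship type."""
    entity_type: str
    name: str
    is_container: bool = False  # defaults to false, as the proposal suggests


def validate_ends(end1: EndDef, end2: EndDef) -> None:
    """Reject a definition in which both ends claim to be the container."""
    if end1.is_container and end2.is_container:
        raise ValueError("isContainer can only be set on one end of a relationship")


# A category aggregates terms: only the category end is flagged as the container.
category_end = EndDef("GlossaryCategory", "category", is_container=True)
term_end = EndDef("GlossaryTerm", "terms")
validate_ends(category_end, term_end)  # passes silently
```

Checking this when the relationship type is defined, rather than when instances are created, keeps an ambiguous definition from ever reaching the type system.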

pg8 - aggregations example - please use metadata example such as category to term

pg8 - observations - a relationship described by a relationshipDef cannot be mandatory.  The
isOptional flag is obsolete.  Can we remove it?  Where a governance action needs two entities
to be linked in order to function, this needs to be handled by state attributes that the action
can test.


> Introduce top level relationships
> ---------------------------------
>
>                 Key: ATLAS-1690
>                 URL: https://issues.apache.org/jira/browse/ATLAS-1690
>             Project: Atlas
>          Issue Type: Improvement
>            Reporter: David Radley
>            Assignee: David Radley
>              Labels: VirtualDataConnector
>         Attachments: Atlas Relationships proposal v1.0.pdf, Atlas Relationships proposal v1.1.pdf,
>                      Atlas Relationships proposal v1.2.pdf, Atlas Relationships proposal v1.3.pdf,
>                      Atlas Relationships proposal v1.4.pdf
>
>
> Introduce top level relationships including support for:
> - many to many relationships
> - relationship names, including the name for both ends and the relationship.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
