atlas-dev mailing list archives

From Mandy Chessell <mandy_chess...@uk.ibm.com>
Subject Re: Rename trait to classification
Date Mon, 26 Sep 2016 12:43:17 GMT
Hello Hemanth, David,
This is a great discussion.  These concepts are all related, in that they
are linked to data descriptions (such as schemas) to characterise data.
However, I think your probing is right: the governance classifications are
slightly different from traits and glossary terms.

Glossary terms are focused on the meaning of the data.  They follow the
structure of the subject area, and link related terms together to show
potential objects, attributes and relationships that are typically found
together.  Traits seem to offer a more informal means to characterise
data.  These seem useful for characterising data specific to particular
projects, or areas of special interest to the data lake team.

The governance classifications are a formal definition.  They are often 
defined as company-wide values that most employees are trained on.  So a 
deployment of Atlas in a new organization could well involve adding their 
existing classification schemes to the Atlas repository.   The values I 
shared in the earlier email are those we suggest for organizations that do 
not currently have any information governance. 

The values in each classification scheme are kept small (to keep them 
memorable) and then the governance program is built around them.  So, for 
example, each system has a set of rules for how it manages data for each 
of the classification values.   When new systems are brought in, new rules 
may be defined, but the employees still only have to know the standard 
classification schemes. 
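
As a rough illustration of the per-system rules described above, here is a minimal Python sketch. The system names, rule actions and the `rule_for` helper are all hypothetical and are not part of Atlas; the classification values and the idea that a rule on a parent value also covers its sub-classifications come from the confidentiality scheme quoted later in this thread.

```python
# Hypothetical sketch: none of these names come from the Atlas code base.
from typing import Optional

# Sub-classification -> parent value (only part of the quoted
# confidentiality scheme is shown here).
PARENT = {
    "Business Confidential": "Confidential",
    "Partner Confidential": "Confidential",
    "Sensitive Personal": "Sensitive",
}

# Each system keeps its own rules, keyed by classification value.
# The system names and actions are invented for illustration.
SYSTEM_RULES = {
    "reporting-db": {"Unclassified": "allow", "Confidential": "mask"},
    "data-lake": {
        "Unclassified": "allow",
        "Confidential": "encrypt",
        "Sensitive": "deny",
    },
}


def rule_for(system: str, classification: str) -> Optional[str]:
    """Return the system's rule for a value, falling back to its parent."""
    rules = SYSTEM_RULES[system]
    value: Optional[str] = classification
    while value is not None:
        if value in rules:
            return rules[value]
        value = PARENT.get(value)  # walk up to the parent value, if any
    return None  # this system defines no rule for the value
```

Adding a new system here only means adding another entry to `SYSTEM_RULES`; the set of classification values that employees need to know stays the same.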

As we continue to enhance the work of the governance enforcement, these 
classifications will be the key values encoded in the rules.  For a 
sophisticated organization with a company-wide data strategy, the 
classifications are often linked to the glossary terms and the glossary 
terms are linked to the data schemas.  This means the same classifications 
(and hence rules) are applied to the same type of data irrespective of the 
system it came from.  Alternatively, where system owners want to control
how the data from their systems is classified, the governance
classifications are linked directly to the schemas and so there may be
variation in the way a certain type of data (e.g. credit card numbers) is
governed.
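
The two linking styles can be contrasted with a small Python sketch. The column names, the glossary term and the classification assignments below are invented for illustration, and the two helper functions are hypothetical rather than Atlas APIs.

```python
# Hypothetical sketch: column names, terms and assignments are invented.
from typing import Optional

# Route 1: schema column -> glossary term -> classification.
COLUMN_TO_TERM = {
    "crm.customer.card_no": "Credit Card Number",
    "billing.payment.cc_number": "Credit Card Number",
}
TERM_TO_CLASSIFICATION = {"Credit Card Number": "Sensitive Financial"}

# Route 2: system owners attach classifications to their own columns.
COLUMN_TO_CLASSIFICATION = {
    "crm.customer.card_no": "Sensitive Financial",
    "billing.payment.cc_number": "Confidential",
}


def classify_via_glossary(column: str) -> Optional[str]:
    """Resolve a column's classification through its glossary term."""
    term = COLUMN_TO_TERM.get(column)
    if term is None:
        return None
    return TERM_TO_CLASSIFICATION.get(term)


def classify_directly(column: str) -> Optional[str]:
    """Resolve a column's classification from the owner's direct link."""
    return COLUMN_TO_CLASSIFICATION.get(column)
```

With the glossary route both columns resolve to the same value because they share a term; with the direct route the two systems can diverge for the same type of data.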

In either case, the classifications need to be determined at the point
where data is accessed, and so we need a fast look-up mechanism for these
values.
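
One possible shape for such a look-up, sketched in Python under the assumption that the links can be flattened ahead of time: resolve every column to its classification once, then answer access-time queries from a plain dictionary. The `build_index` and `lookup` names and the sample data are hypothetical; the "Unclassified" default follows the scheme quoted later in this thread.

```python
# Hypothetical sketch of a precomputed look-up table; not an Atlas API.
from typing import Dict

DEFAULT = "Unclassified"  # default value for data with no classification


def build_index(
    column_to_term: Dict[str, str],
    term_to_classification: Dict[str, str],
) -> Dict[str, str]:
    """Flatten column -> term -> classification into column -> classification."""
    return {
        column: term_to_classification.get(term, DEFAULT)
        for column, term in column_to_term.items()
    }


# Built once, for example whenever the glossary links change.
INDEX = build_index(
    {
        "crm.customer.card_no": "Credit Card Number",
        "crm.customer.nickname": "Nickname",
    },
    {"Credit Card Number": "Sensitive Financial"},
)


def lookup(column: str) -> str:
    """Constant-time look-up at data access time."""
    return INDEX.get(column, DEFAULT)
```

The trade-off in this sketch is that the index has to be rebuilt or updated whenever a link changes, in exchange for a single dictionary access on the read path.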

All the best
Mandy
___________________________________________
Mandy Chessell CBE FREng CEng FBCS
IBM Distinguished Engineer
IBM Analytics Group CTO Office

Master Inventor
Member of the IBM Academy of Technology
Visiting Professor, Department of Computer Science, University of 
Sheffield

Email: mandy_chessell@uk.ibm.com
LinkedIn: http://www.linkedin.com/pub/mandy-chessell/22/897/a49

Assistant: Janet Brooks - jsbrooks12@uk.ibm.com



From: 
To:     dev@atlas.incubator.apache.org
Date:   26/09/2016 08:09
Subject:        Re: Rename trait to classification



Hi David,

Regarding the point I made about sharing traits - I don't want to give the
impression that this is an agreed-upon point. Apologies if I conveyed
that sense.

It is a fact that Atlas today has two concepts that are slightly
related: Traits (aka Tags) and Business Terms. The latter was new in
0.7. IMO, it is important that the Atlas community tries to converge
on an unambiguous definition of these concepts as the product would be
driven around these.

With respect to this thread, I am trying to work out whether
"classification" is a new concept, or whether it overlaps with one of the
two existing ones (which we are trying to rename).

I am certainly not a domain expert on this in any sense :-) - so
hoping that others who are would provide guidance (@aahn - ping?).

Thanks
hemanth

On Mon, Sep 26, 2016 at 2:59 PM, David Radley <david_radley@uk.ibm.com> wrote:
> Hi Hemanth and Mandy,
> Thanks for your feedback.
>
> It does seem like these are de-facto industry terms in the governance
> industry; the reason I say this is that looking around the web I see quite
> a few uses of the words governance classification in different domains
> (including in the Atlas documentation!).
>
> I was not aware of the idea that traits and terms would be authored by
> different roles - thanks for your explanation. What is coming up for me is:
>
> I think business users should be able to add new business terms (maybe
> going through a workflow and a governance curator then sorting out
> inconsistencies), as they are the most expert in the language they use.
> Classifications could be authored by different teams; for example, levels
> of confidentiality (in Mandy's example) would be dictated by the
> governance team. Governance rules would run on these classifications.
>
> You say "So, it is hard to use traits in a shared sense or expect to have
> conventional usage". I notice the Atlas tutorial did not give me this
> impression, as the example of a trait/tag is PII.
> Your description of traits implies they are more like free-form labels.
> If this is the intent for traits, then it does not make sense to rename
> them to classification. Maybe traits should be called labels, so their
> name is more in line with their expected usage. Though we should change
> the tutorial!
>
> A business term is a type of classification - a semantic classification. We
> could add in the concept of classification which Business term and
> Business category (Jira 1186) inherit from. This would allow us to add
> in confidential classifications and classification schemes to organize
> them.
>
> I look forward to your thoughts,
>       all the best, David.
>
>
>
>
> From:   Hemanth Yamijala <hyamijala@hortonworks.com>
> To:     David Radley <dev@atlas.incubator.apache.org>
> Date:   26/09/2016 05:33
> Subject:        Re: Rename trait to classification
>
>
>
> Hi,
>
> Are these de-facto industry terms in the governance industry? If yes,
> would they make more sense to explore as part of the Business Taxonomy
> feature that's currently in alpha in 0.7, rather than the trait system?
>
> One differentiation we've been trying to express is that traits (also
> referred to as tags in some places in Atlas) are free form and left to the
> user using them. So, it is hard to use traits in a shared sense or expect
> to have conventional usage. So, traits would probably be a tool for a data
> scientist to quickly annotate something for their own discovery usage
> later.
>
> Business taxonomy, on the other hand, is something we are thinking of as a
> way to express standard classification, even if only within an organization,
> but maybe even across industry domains etc. They would likely be created
> by data stewards with knowledge of the domain and their usage would follow
> established practices (authorization controlling who can do what).
>
> Not sure if what we're referring to as "classification" here fits the
> "traits" or "business taxonomy" side more - trying to understand...
>
> Thanks
> hemanth
> ________________________________________
> From: Mandy Chessell <mandy_chessell@uk.ibm.com>
> Sent: Sunday, September 25, 2016 9:56 PM
> To: David Radley
> Cc: dev@atlas.incubator.apache.org
> Subject: Re: Rename trait to classification
>
> Hello David,
> I also like the idea of using the term classification.
> Typically classifications in governance are ordered sets of values grouped
> into a classification scheme.  Is the notion of the classification scheme
> also part of the change you are thinking of?
>
> For example, the classification scheme below has an "unclassified" value,
> which is the default classification for any data element that has no
> classification from this scheme associated with it.  The other values are
> defined in increasing levels of sensitivity.  There are also
> sub-classifications.  So, for example, confidential has sub-classifications
> of Business Confidential, Partner Confidential and Personal Confidential.
> If a rule is defined for "confidential", it applies to all three of the
> sub-classifications.
>
> Confidentiality Classification Scheme
> Confidentiality is used to classify the impact of disclosing information
> to unauthorized individuals.
>   • Unclassified
>   • Internal Use
>   • Confidential
>       • Business Confidential
>       • Partner Confidential
>       • Personal Information
>   • Sensitive
>       • Sensitive Personal
>       • Sensitive Financial
>       • Sensitive Operational
>   • Restricted
>       • Restricted Financial
>       • Restricted Operational
>       • Trade Secret
>
>
> The classification schemes create a graduated view of how sensitive data
> is.  We would also expect to see classification schemes for other aspects
> of governance such as retention, confidence (quality) and criticality.
>
>
> All the best
> Mandy
>
>
>
> From:   David Radley/UK/IBM@IBMGB
> To:     dev@atlas.incubator.apache.org
> Date:   23/09/2016 17:05
> Subject:        Re: Rename trait to classification
>
>
>
> Hi Madhan,
> That would be great :-)  thanks, David.
>
>
>
> From:   Madhan Neethiraj <madhan@apache.org>
> To:     "dev@atlas.incubator.apache.org" <dev@atlas.incubator.apache.org>
> Date:   23/09/2016 16:48
> Subject:        Re: Rename trait to classification
> Sent by:        Madhan Neethiraj <mneethiraj@hortonworks.com>
>
>
>
> David,
>
> I agree on replacing ‘trait’ with ‘Classification’. I guess the name
> ‘trait’ might have been influenced by Scala (and not from Ranger, which
> doesn’t have ‘trait’ in its vocab).
>
> Instead of renaming in the existing APIs, how about we go with the new
> name in the API introduced in ATLAS-1171?
>
> Thanks,
> Madhan
>
>
>
> On 9/23/16, 1:35 AM, "David Radley" <david_radley@uk.ibm.com> wrote:
>
>     Hi,
>     I have raised Jira ATLAS-1187. This is to rename trait to Classification.
>     I know that this would affect the API, so I am keen to understand how we
>     agree to version the API, maybe including other changes. I feel trait is
>     not very descriptive and I assume comes from Ranger terminology. I think
>     using classification instead brings us into using terminology better
>     representing the Atlas capability and its role in governance use cases. I
>     am keen to get your feedback. I do not feel that I should just submit a
>     fix like this - I think we need more agreement to account for the impact
>     on current users. At the same time, as we are still in incubation, we
>     should be able to make changes like this to polish the API.
>
>     I am looking forward to your thoughts,       David Radley
>     Unless stated otherwise above:
>     IBM United Kingdom Limited - Registered in England and Wales with number
>     741598.
>     Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU



