atlas-commits mailing list archives

From mad...@apache.org
Subject [07/10] incubator-atlas-website git commit: updated site for 0.8 release
Date Fri, 17 Mar 2017 05:32:52 GMT
http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/blob/ff453c82/0.8.0-incubating/StormAtlasHook.html
----------------------------------------------------------------------
diff --git a/0.8.0-incubating/StormAtlasHook.html b/0.8.0-incubating/StormAtlasHook.html
new file mode 100644
index 0000000..d48fa8f
--- /dev/null
+++ b/0.8.0-incubating/StormAtlasHook.html
@@ -0,0 +1,316 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2017-03-16
+ | Rendered using Apache Maven Fluido Skin 1.3.0
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+  <head>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <meta name="Date-Revision-yyyymmdd" content="20170316" />
+    <meta http-equiv="Content-Language" content="en" />
+    <title>Apache Atlas &#x2013; Storm Atlas Bridge</title>
+    <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
+    <link rel="stylesheet" href="./css/site.css" />
+    <link rel="stylesheet" href="./css/print.css" media="print" />
+
+      
+    <script type="text/javascript" src="./js/apache-maven-fluido-1.3.0.min.js"></script>
+
+                          
+        
+<script type="text/javascript">$( document ).ready( function() { $( '.carousel' ).carousel( { interval: 3500 } ) } );</script>
+          
+            </head>
+        <body class="topBarEnabled">
+          
+                        
+                    
+                
+
+    <div id="topbar" class="navbar navbar-fixed-top ">
+      <div class="navbar-inner">
+                                  <div class="container" style="width: 68%;"><div class="nav-collapse">
+            
+                
+                                <ul class="nav">
+                          <li class="dropdown">
+        <a href="#" class="dropdown-toggle" data-toggle="dropdown">Atlas <b class="caret"></b></a>
+        <ul class="dropdown-menu">
+        
+                      <li>      <a href="index.html"  title="About">About</a>
+</li>
+                  
+                      <li>      <a href="https://cwiki.apache.org/confluence/display/ATLAS"  title="Wiki">Wiki</a>
+</li>
+                  
+                      <li>      <a href="https://cwiki.apache.org/confluence/display/ATLAS"  title="News">News</a>
+</li>
+                  
+                      <li>      <a href="https://git-wip-us.apache.org/repos/asf/incubator-atlas.git"  title="Git">Git</a>
+</li>
+                  
+                      <li>      <a href="https://issues.apache.org/jira/browse/ATLAS"  title="Jira">Jira</a>
+</li>
+                  
+                      <li>      <a href="https://cwiki.apache.org/confluence/display/ATLAS/PoweredBy"  title="Powered by">Powered by</a>
+</li>
+                  
+                      <li>      <a href="http://blogs.apache.org/atlas/"  title="Blog">Blog</a>
+</li>
+                          </ul>
+      </li>
+                <li class="dropdown">
+        <a href="#" class="dropdown-toggle" data-toggle="dropdown">Project Information <b class="caret"></b></a>
+        <ul class="dropdown-menu">
+        
+                      <li>      <a href="project-info.html"  title="Summary">Summary</a>
+</li>
+                  
+                      <li>      <a href="mail-lists.html"  title="Mailing Lists">Mailing Lists</a>
+</li>
+                  
+                      <li>      <a href="http://webchat.freenode.net?channels=apacheatlas&uio=d4"  title="IRC">IRC</a>
+</li>
+                  
+                      <li>      <a href="team-list.html"  title="Team">Team</a>
+</li>
+                  
+                      <li>      <a href="issue-tracking.html"  title="Issue Tracking">Issue Tracking</a>
+</li>
+                  
+                      <li>      <a href="source-repository.html"  title="Source Repository">Source Repository</a>
+</li>
+                  
+                      <li>      <a href="license.html"  title="License">License</a>
+</li>
+                          </ul>
+      </li>
+                <li class="dropdown">
+        <a href="#" class="dropdown-toggle" data-toggle="dropdown">Releases <b class="caret"></b></a>
+        <ul class="dropdown-menu">
+        
+                      <li>      <a href="http://www.apache.org/dyn/closer.cgi/incubator/atlas/0.8.0-incubating/"  title="0.8-incubating">0.8-incubating</a>
+</li>
+                  
+                      <li>      <a href="http://archive.apache.org/dist/incubator/atlas/0.7.1-incubating/"  title="0.7.1-incubating">0.7.1-incubating</a>
+</li>
+                  
+                      <li>      <a href="http://archive.apache.org/dist/incubator/atlas/0.7.0-incubating/"  title="0.7-incubating">0.7-incubating</a>
+</li>
+                  
+                      <li>      <a href="http://archive.apache.org/dist/incubator/atlas/0.6.0-incubating/"  title="0.6-incubating">0.6-incubating</a>
+</li>
+                  
+                      <li>      <a href="http://archive.apache.org/dist/incubator/atlas/0.5.0-incubating/"  title="0.5-incubating">0.5-incubating</a>
+</li>
+                          </ul>
+      </li>
+                <li class="dropdown">
+        <a href="#" class="dropdown-toggle" data-toggle="dropdown">Documentation <b class="caret"></b></a>
+        <ul class="dropdown-menu">
+        
+                      <li>      <a href="0.8.0-incubating/index.html"  title="0.8-incubating">0.8-incubating</a>
+</li>
+                  
+                      <li>      <a href="0.7.1-incubating/index.html"  title="0.7.1-incubating">0.7.1-incubating</a>
+</li>
+                  
+                      <li>      <a href="0.7.0-incubating/index.html"  title="0.7-incubating">0.7-incubating</a>
+</li>
+                  
+                      <li>      <a href="0.6.0-incubating/index.html"  title="0.6-incubating">0.6-incubating</a>
+</li>
+                  
+                      <li>      <a href="0.5.0-incubating/index.html"  title="0.5-incubating">0.5-incubating</a>
+</li>
+                          </ul>
+      </li>
+                <li class="dropdown">
+        <a href="#" class="dropdown-toggle" data-toggle="dropdown">ASF <b class="caret"></b></a>
+        <ul class="dropdown-menu">
+        
+                      <li>      <a href="http://www.apache.org/foundation/how-it-works.html"  title="How Apache Works">How Apache Works</a>
+</li>
+                  
+                      <li>      <a href="http://www.apache.org/foundation/"  title="Foundation">Foundation</a>
+</li>
+                  
+                      <li>      <a href="http://www.apache.org/foundation/sponsorship.html"  title="Sponsoring Apache">Sponsoring Apache</a>
+</li>
+                  
+                      <li>      <a href="http://www.apache.org/foundation/thanks.html"  title="Thanks">Thanks</a>
+</li>
+                          </ul>
+      </li>
+                  </ul>
+          
+                      <form id="search-form" action="http://www.google.com/search" method="get"  class="navbar-search pull-right" >
+    
+  <input value="http://atlas.incubator.apache.org" name="sitesearch" type="hidden"/>
+  <input class="search-query" name="q" id="query" type="text" />
+</form>
+<script type="text/javascript" src="http://www.google.com/coop/cse/brand?form=search-form"></script>
+          
+                            
+            
+            
+            
+                              
+                   
+                      </div>
+          
+        </div>
+      </div>
+    </div>
+    
+        <div class="container">
+          <div id="banner">
+        <div class="pull-left">
+                                                  <a href=".." id="bannerLeft">
+        <img src="images/atlas-logo.png"  alt="Apache Atlas" width="200px" height="45px"/>
+                </a>
+                      </div>
+        <div class="pull-right">                  <a href="http://incubator.apache.org" id="bannerRight">
+        <img src="images/apache-incubator-logo.png"  alt="Apache Incubator"/>
+                </a>
+      </div>
+        <div class="clear"><hr/></div>
+      </div>
+
+      <div id="breadcrumbs">
+        <ul class="breadcrumb">
+                
+                    
+                              <li class="">
+                    <a href="http://www.apache.org" class="externalLink" title="Apache">
+        Apache</a>
+        </li>
+      <li class="divider ">/</li>
+            <li class="">
+                    <a href="index.html" title="Atlas">
+        Atlas</a>
+        </li>
+      <li class="divider ">/</li>
+        <li class="">Storm Atlas Bridge</li>
+        
+                
+                    
+                  <li id="publishDate" class="pull-right">Last Published: 2017-03-16</li>
+                  <li class="divider pull-right">|</li>
+              <li id="projectVersion" class="pull-right">Version: 0.8-incubating</li>
+            
+                            </ul>
+      </div>
+
+      
+                        
+        <div id="bodyColumn" >
+                                  
+            <div class="section">
+<h2><a name="Storm_Atlas_Bridge"></a>Storm Atlas Bridge</h2></div>
+<div class="section">
+<h3><a name="Introduction"></a>Introduction</h3>
+<p>Apache Storm is a distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. A Storm application is essentially a DAG of nodes, called a <b>topology</b>.</p>
+<p>Apache Atlas is a metadata repository that enables end-to-end data lineage, search, and association of business classifications.</p>
+<p>The goal of this integration is to push the operational topology metadata, along with the underlying data source(s), target(s), derivation processes and any available business context, so that Atlas can capture the lineage for this topology.</p>
+<p>There are two parts to this process, detailed below:</p>
+<ul>
+<li>Data model to represent the concepts in Storm</li>
+<li>Storm Atlas Hook to update metadata in Atlas</li></ul></div>
+<div class="section">
+<h3><a name="Storm_Data_Model"></a>Storm Data Model</h3>
+<p>The data model is represented as types in Atlas. It contains descriptions of the various nodes in the topology graph, such as spouts and bolts, and the corresponding producer and consumer types.</p>
+<p>The following types are added to Atlas:</p>
+<ul>
+<li>storm_topology - represents the coarse-grained topology. A storm_topology derives from the Atlas Process type and hence can be used to inform Atlas about lineage.</li>
+<li>The following data sets are added: kafka_topic, jms_topic, hbase_table, hdfs_data_set. These all derive from the Atlas DataSet type and hence form the end points of a lineage graph.</li>
+<li>storm_spout - a data producer with outputs, typically Kafka or JMS.</li>
+<li>storm_bolt - a data consumer with inputs and outputs, typically Hive, HBase, HDFS, etc.</li></ul>
+<p>The Storm Atlas hook automatically registers dependent models, such as the Hive data model, if it finds that they are not yet known to the Atlas server.</p>
+<p>The data model for each of the types is described in the class definition at org.apache.atlas.storm.model.StormDataModel.</p></div>
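+<p>As an illustration only, the following minimal Java sketch (Java 16+ records) shows the shape of the lineage this model produces: a storm_topology acts as a process whose inputs and outputs are data sets such as kafka_topic or hdfs_data_set. It is not the StormDataModel code; the names StormLineageSketch, DataSet, Topology and lineageEdges are hypothetical.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified sketch of the Storm lineage shape described above.
// The real type definitions live in org.apache.atlas.storm.model.StormDataModel;
// the class, record and method names here are illustrative only.
public class StormLineageSketch {

    // A data set end point of the lineage graph, e.g. a kafka_topic or hdfs_data_set.
    record DataSet(String typeName, String name) {}

    // A storm_topology plays the role of an Atlas Process: data sets in, data sets out.
    record Topology(String name, List<DataSet> inputs, List<DataSet> outputs) {}

    // Renders every input -> topology -> output path as a readable string.
    static List<String> lineageEdges(Topology topology) {
        List<String> edges = new ArrayList<>();
        for (DataSet in : topology.inputs()) {
            for (DataSet out : topology.outputs()) {
                edges.add(in.typeName() + ":" + in.name()
                        + " -> storm_topology:" + topology.name()
                        + " -> " + out.typeName() + ":" + out.name());
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        Topology wordCount = new Topology("word-count",
                List.of(new DataSet("kafka_topic", "sentences")),
                List.of(new DataSet("hdfs_data_set", "/data/word-counts")));
        // Prints: kafka_topic:sentences -> storm_topology:word-count -> hdfs_data_set:/data/word-counts
        lineageEdges(wordCount).forEach(System.out::println);
    }
}
```

+<p>Each input is paired with each output because, at this coarse-grained level, the topology as a whole is the process that connects them; the individual spout and bolt nodes are not modelled in this sketch.</p>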
+<div class="section">
+<h3><a name="Storm_Atlas_Hook"></a>Storm Atlas Hook</h3>
+<p>Atlas is notified when a new topology is registered successfully in Storm. Storm provides a hook, backtype.storm.ISubmitterHook, at the Storm client used to submit a Storm topology.</p>
+<p>The Storm Atlas hook intercepts the hook post-execution, extracts the metadata from the topology, and updates Atlas using the types defined above. Atlas implements the Storm client hook interface in org.apache.atlas.storm.hook.StormAtlasHook.</p></div>
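+<p>The hook flow described above can be sketched as follows. This is a hypothetical, self-contained stand-in: the SubmitterHook interface below only mimics the idea of Storm's ISubmitterHook (the real interface receives Storm's own topology objects), and RecordingAtlasHook collects entity labels in a list instead of calling the Atlas server.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SubmitterHookSketch {

    // Hypothetical stand-in for Storm's submitter hook; NOT the real Storm API.
    interface SubmitterHook {
        void notify(String topologyName, Map<String, Object> stormConf,
                    List<String> spouts, List<String> bolts);
    }

    // Hypothetical hook that records the entities it would send to Atlas.
    static class RecordingAtlasHook implements SubmitterHook {
        final List<String> sentToAtlas = new ArrayList<>();

        @Override
        public void notify(String topologyName, Map<String, Object> stormConf,
                           List<String> spouts, List<String> bolts) {
            // One entity for the topology itself, then one per spout and bolt.
            sentToAtlas.add("storm_topology:" + topologyName);
            for (String spout : spouts) sentToAtlas.add("storm_spout:" + spout);
            for (String bolt : bolts) sentToAtlas.add("storm_bolt:" + bolt);
        }
    }

    // Simulated client: submit the topology first, then invoke the hook.
    static void submit(String name, Map<String, Object> conf, List<String> spouts,
                       List<String> bolts, SubmitterHook hook) {
        // ... the actual submission to the Storm cluster would happen here ...
        hook.notify(name, conf, spouts, bolts);
    }

    public static void main(String[] args) {
        RecordingAtlasHook hook = new RecordingAtlasHook();
        submit("word-count", Map.of(), List.of("kafka-spout"), List.of("split", "count"), hook);
        System.out.println(hook.sentToAtlas);
    }
}
```

+<p>The important ordering is that the client calls the hook only after the submission step, matching the statement that Atlas is notified once a topology has been registered in Storm.</p>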
+<div class="section">
+<h3><a name="Limitations"></a>Limitations</h3>
+<p>The following limitations apply to the first version of the integration:</p>
+<ul>
+<li>Only new topology submissions are registered with Atlas; subsequent lifecycle changes are not reflected in Atlas.</li>
+<li>The Atlas server needs to be online when a Storm topology is submitted for the metadata to be captured.</li>
+<li>The hook currently does not support capturing lineage for custom spouts and bolts.</li></ul></div>
+<div class="section">
+<h3><a name="Installation"></a>Installation</h3>
+<p>The Storm Atlas Hook needs to be manually installed in Storm on the client side. The hook artifacts are available in <b>$ATLAS_PACKAGE/hook/storm</b>.</p>
+<p>Copy the Storm Atlas hook jars to <b>$STORM_HOME/extlib</b>, replacing $STORM_HOME with the Storm installation path.</p>
+<p>Restart all daemons after you have installed the Atlas hook into Storm.</p></div>
+<div class="section">
+<h3><a name="Configuration"></a>Configuration</h3></div>
+<div class="section">
+<h4><a name="Storm_Configuration"></a>Storm Configuration</h4>
+<p>The Storm Atlas Hook needs to be configured in the Storm client config, <b>$STORM_HOME/conf/storm.yaml</b>, as follows:</p>
+<div class="source">
+<pre>
+storm.topology.submission.notifier.plugin.class: &quot;org.apache.atlas.storm.hook.StormAtlasHook&quot;
+
+</pre></div>
+<p>Also set a 'cluster name' to be used as a namespace for objects registered in Atlas. This name is used to namespace the Storm topology, spouts and bolts.</p>
+<p>Other objects, such as data sets, should ideally be identified with the cluster name of the components that generate them. For example, Hive tables and databases should be identified using the cluster name set in Hive. The Storm Atlas hook picks this up if the Hive configuration is available in the Storm topology jar that is submitted on the client and the cluster name is defined there. The same applies to HBase data sets. If this configuration is not available, the cluster name set in the Storm configuration is used.</p>
+<div class="source">
+<pre>
+atlas.cluster.name: &quot;cluster_name&quot;
+
+</pre></div>
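+<p>The cluster-name rule above can be sketched in Java as a simple fallback. This is an illustration under stated assumptions, not the hook's code: it assumes the component configuration (for example, the Hive configuration packaged in the topology jar) exposes its cluster name under the same atlas.cluster.name key, and it uses a hypothetical name@cluster form to show what namespacing might look like.</p>

```java
import java.util.Map;
import java.util.Optional;

public class ClusterNameSketch {

    // Key taken from the storm.yaml example above; assumed to be the same in component configs.
    static final String CLUSTER_NAME_KEY = "atlas.cluster.name";

    // Prefers the component's own cluster name (e.g. from Hive or HBase config) and
    // falls back to the cluster name set in the Storm configuration.
    static String resolveClusterName(Map<String, String> componentConf,
                                     Map<String, String> stormConf) {
        return Optional.ofNullable(componentConf.get(CLUSTER_NAME_KEY))
                .orElseGet(() -> stormConf.get(CLUSTER_NAME_KEY));
    }

    // Hypothetical "name@cluster" namespacing for an object registered in Atlas.
    static String qualifiedName(String name, String clusterName) {
        return name + "@" + clusterName;
    }

    public static void main(String[] args) {
        Map<String, String> stormConf = Map.of(CLUSTER_NAME_KEY, "storm_cluster");
        Map<String, String> hiveConf = Map.of(CLUSTER_NAME_KEY, "hive_cluster");
        // A Hive table uses Hive's cluster name; the topology falls back to Storm's.
        System.out.println(qualifiedName("default.customers", resolveClusterName(hiveConf, stormConf)));
        System.out.println(qualifiedName("word-count", resolveClusterName(Map.of(), stormConf)));
    }
}
```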
+<p>In <b>$STORM_HOME/conf/storm_env.ini</b>, set an environment variable as follows:</p>
+<div class="source">
+<pre>
+STORM_JAR_JVM_OPTS:&quot;-Datlas.conf=$ATLAS_HOME/conf/&quot;
+
+</pre></div>
+<p>where ATLAS_HOME points to the Atlas installation directory.</p>
+<p>You could also register the hook programmatically in the Storm Config as:</p>
+<div class="source">
+<pre>
+    Config stormConf = new Config();
+    ...
+    stormConf.put(Config.STORM_TOPOLOGY_SUBMISSION_NOTIFIER_PLUGIN,
+            org.apache.atlas.storm.hook.StormAtlasHook.class.getName());
+
+</pre></div></div>
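+<p>The -Datlas.conf option set in storm_env.ini is an ordinary JVM system property. The minimal sketch below shows how client-side code could turn it into the path of the Atlas configuration file; the file name atlas-application.properties and the class AtlasConfSketch are assumptions for illustration, not taken from this page.</p>

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class AtlasConfSketch {

    // Resolves the Atlas client config file from the atlas.conf system property
    // (set via -Datlas.conf=... in STORM_JAR_JVM_OPTS).
    // The file name below is an assumption for this sketch.
    static Path atlasPropertiesPath() {
        String confDir = System.getProperty("atlas.conf");
        if (confDir == null) {
            throw new IllegalStateException(
                    "atlas.conf is not set; add -Datlas.conf=$ATLAS_HOME/conf/ to the JVM options");
        }
        return Paths.get(confDir).resolve("atlas-application.properties");
    }

    public static void main(String[] args) {
        // e.g. run with: java -Datlas.conf=/opt/atlas/conf/ AtlasConfSketch
        try {
            System.out.println("Atlas client config: " + atlasPropertiesPath());
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```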
+                  </div>
+          </div>
+
+    <hr/>
+
+    <footer>
+            <div class="container">
+              <div class="row span12">Copyright &copy;                    2015-2017
+                        <a href="http://www.apache.org">Apache Software Foundation</a>.
+            All Rights Reserved.      
+                    
+      </div>
+
+                          
+                <p id="poweredBy" class="pull-right">
+                          <a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
+        <img class="builtBy" alt="Built by Maven" src="./images/logos/maven-feather.png" />
+      </a>
+              </p>
+        
+                </div>
+    </footer>
+  </body>
+</html>

http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/blob/ff453c82/0.8.0-incubating/TypeSystem.html
----------------------------------------------------------------------
diff --git a/0.8.0-incubating/TypeSystem.html b/0.8.0-incubating/TypeSystem.html
new file mode 100644
index 0000000..c34833c
--- /dev/null
+++ b/0.8.0-incubating/TypeSystem.html
@@ -0,0 +1,406 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2017-03-16
+ | Rendered using Apache Maven Fluido Skin 1.3.0
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+  <head>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <meta name="Date-Revision-yyyymmdd" content="20170316" />
+    <meta http-equiv="Content-Language" content="en" />
+    <title>Apache Atlas &#x2013; Type System</title>
+    <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
+    <link rel="stylesheet" href="./css/site.css" />
+    <link rel="stylesheet" href="./css/print.css" media="print" />
+
+      
+    <script type="text/javascript" src="./js/apache-maven-fluido-1.3.0.min.js"></script>
+
+                          
+        
+<script type="text/javascript">$( document ).ready( function() { $( '.carousel' ).carousel( { interval: 3500 } ) } );</script>
+          
+            </head>
+        <body class="topBarEnabled">
+          
+                        
+                    
+                
+
+    
+        <div class="container">
+
+      
+                        
+        <div id="bodyColumn" >
+                                  
+            <div class="section">
+<h2><a name="Type_System"></a>Type System</h2></div>
+<div class="section">
+<h3><a name="Overview"></a>Overview</h3>
+<p>Atlas allows users to define a model for the metadata objects they want to manage. The model is composed of definitions called &#x2018;types&#x2019;. Instances of &#x2018;types&#x2019;, called &#x2018;entities&#x2019;, represent the actual metadata objects that are managed. The Type System is a component that allows users to define and manage the types and entities. All metadata objects managed by Atlas out of the box (for example, Hive tables) are modelled using types and represented as entities. To store new types of metadata in Atlas, one needs to understand the concepts of the type system component.</p></div>
+<div class="section">
+<h3><a name="Types"></a>Types</h3>
+<p>A &#x2018;Type&#x2019; in Atlas is a definition of how a particular type of metadata object is stored and accessed. A type represents one or more attributes that define the properties of the metadata object. Users with a development background will recognize the similarity of a type to a &#x2018;Class&#x2019; definition in object-oriented programming languages, or a &#x2018;table schema&#x2019; in relational databases.</p>
+<p>An example of a type that comes natively defined with Atlas is the Hive table. A Hive table is defined with these attributes:</p>
+<div class="source">
+<pre>
+Name: hive_table
+MetaType: Class
+SuperTypes: DataSet
+Attributes:
+    name: String (name of the table)
+    db: Database object of type hive_db
+    owner: String
+    createTime: Date
+    lastAccessTime: Date
+    comment: String
+    retention: int
+    sd: Storage descriptor object of type hive_storagedesc
+    partitionKeys: Array of objects of type hive_column
+    aliases: Array of strings
+    columns: Array of objects of type hive_column
+    parameters: Map of String keys to String values
+    viewOriginalText: String
+    viewExpandedText: String
+    tableType: String
+    temporary: Boolean
+
+</pre></div>
+<p>The following points can be noted from the above example:</p>
+<ul>
+<li>A type in Atlas is identified uniquely by a &#x2018;name&#x2019;</li>
+<li>A type has a metatype. A metatype represents the type of this model in Atlas. Atlas has the following metatypes:
+<ul>
+<li>Basic metatypes: E.g. Int, String, Boolean etc.</li>
+<li>Enum metatypes</li>
+<li>Collection metatypes: E.g. Array, Map</li>
+<li>Composite metatypes: E.g. Class, Struct, Trait</li></ul></li>
+<li>A type can &#x2018;extend&#x2019; a parent type called a &#x2018;supertype&#x2019;; by virtue of this, it also includes the attributes that are defined in the supertype. This allows modellers to define common attributes across a set of related types. This is similar to how object-oriented languages define superclasses for a class. A type in Atlas can also extend multiple supertypes.
+<ul>
+<li>In this example, every hive table extends from a pre-defined supertype called &#x2018;DataSet&#x2019;. More details about these pre-defined types will be provided later.</li></ul></li>
+<li>Types that have a metatype of &#x2018;Class&#x2019;, &#x2018;Struct&#x2019; or &#x2018;Trait&#x2019; can have a collection of attributes. Each attribute has a name (e.g. &#x2018;name&#x2019;) and some other associated properties. A property can be referred to using an expression of the form type_name.attribute_name. It is also worth noting that attributes themselves are defined using Atlas metatypes.
+<ul>
+<li>In this example, hive_table.name is a String, hive_table.aliases is an array of Strings, hive_table.db refers to an instance of a type called hive_db, and so on.</li></ul></li>
+<li>Type references in attributes (like hive_table.db) are particularly interesting. Using such an attribute, we can define arbitrary relationships between two types defined in Atlas and thus build rich models. One can also collect a list of references as an attribute type (e.g. hive_table.columns, which represents a list of references from hive_table to the hive_column type).</li></ul></div>
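+<p>The rules above (a unique type name, supertypes whose attributes are inherited, possibly from several parents, and type_name.attribute_name expressions) can be sketched with a small Java model. This is hypothetical code, not the Atlas type system API; the DataSet attribute used in the example is a placeholder, since the real DataSet definition is not given on this page.</p>

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of Atlas type definitions; not the Atlas API.
public class TypeSystemSketch {

    // A type: its unique name, the names of its supertypes, and its own attributes
    // (attribute name -> attribute type name, e.g. "db" -> "hive_db").
    record TypeDef(String name, List<String> superTypes, Map<String, String> attributes) {}

    // Collects a type's own attributes plus those inherited from all of its supertypes.
    // Supertypes are visited first; this sketch does not check for name clashes or cycles.
    static Map<String, String> allAttributes(String typeName, Map<String, TypeDef> registry) {
        TypeDef def = registry.get(typeName);
        if (def == null) {
            throw new IllegalArgumentException("unknown type: " + typeName);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (String superType : def.superTypes()) {
            result.putAll(allAttributes(superType, registry));
        }
        result.putAll(def.attributes());
        return result;
    }

    // Resolves a "type_name.attribute_name" expression to the attribute's type name.
    static String attributeType(String expression, Map<String, TypeDef> registry) {
        int dot = expression.indexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("expected type_name.attribute_name: " + expression);
        }
        return allAttributes(expression.substring(0, dot), registry)
                .get(expression.substring(dot + 1));
    }

    public static void main(String[] args) {
        Map<String, TypeDef> registry = new HashMap<>();
        // Placeholder attribute for DataSet; its real definition is not shown on this page.
        registry.put("DataSet", new TypeDef("DataSet", List.of(), Map.of("qualifiedName", "String")));
        registry.put("hive_table", new TypeDef("hive_table", List.of("DataSet"),
                Map.of("name", "String", "db", "hive_db", "columns", "array<hive_column>")));

        // prints true: qualifiedName is inherited from the DataSet supertype
        System.out.println(allAttributes("hive_table", registry).containsKey("qualifiedName"));
        // prints hive_db
        System.out.println(attributeType("hive_table.db", registry));
    }
}
```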
+<div class="section">
+<h3><a name="Entities"></a>Entities</h3>
+<p>An &#x2018;entity&#x2019; in Atlas is a specific value or instance of a Class &#x2018;type&#x2019; and thus represents a specific metadata object in the real world. Referring back to our analogy with object-oriented programming languages, an &#x2018;instance&#x2019; is an &#x2018;Object&#x2019; of a certain &#x2018;Class&#x2019;.</p>
+<p>An example of an entity is a specific Hive table. Say Hive has a table called &#x2018;customers&#x2019; in the &#x2018;default&#x2019; database. This table would be an &#x2018;entity&#x2019; in Atlas of type hive_table. By virtue of being an instance of a class type, it has values for every attribute that is part of the Hive table &#x2018;type&#x2019;, such as:</p>
+<div class="source">
+<pre>
+id: &quot;9ba387dd-fa76-429c-b791-ffc338d3c91f&quot;
+typeName: &quot;hive_table&quot;
+values:
+    name: &quot;customers&quot;
+    db: &quot;b42c6cfc-c1e7-42fd-a9e6-890e0adf33bc&quot;
+    owner: &quot;admin&quot;
+    createTime: &quot;2016-06-20T06:13:28.000Z&quot;
+    lastAccessTime: &quot;2016-06-20T06:13:28.000Z&quot;
+    comment: null
+    retention: 0
+    sd: &quot;ff58025f-6854-4195-9f75-3a3058dd8dcf&quot;
+    partitionKeys: null
+    aliases: null
+    columns: [&quot;65e2204f-6a23-4130-934a-9679af6a211f&quot;, &quot;d726de70-faca-46fb-9c99-cf04f6b579a6&quot;, ...]
+    parameters: {&quot;transient_lastDdlTime&quot;: &quot;1466403208&quot;}
+    viewOriginalText: null
+    viewExpandedText: null
+    tableType: &quot;MANAGED_TABLE&quot;
+    temporary: false
+
+</pre></div>
+<p>The following points can be noted from the example above:</p>
+<ul>
+<li>Every entity that is an instance of a Class type is identified by a unique identifier,
a GUID. This GUID is generated by the Atlas server when the object is defined, and remains
constant for the entire lifetime of the entity. At any point in time, this particular entity
can be accessed using its GUID.
+<ul>
+<li>In this example, the &#x2018;customers&#x2019; table in the default database
is uniquely identified by the GUID &quot;9ba387dd-fa76-429c-b791-ffc338d3c91f&quot;</li></ul></li>
+<li>An entity is of a given type, and the name of the type is provided with the entity
definition.
+<ul>
+<li>In this example, the &#x2018;customers&#x2019; table is a &#x2018;hive_table&#x2019;.</li></ul></li>
+<li>The values of this entity are a map from the names of the attributes defined in the hive_table type definition to their values.</li>
+<li>Attribute values conform to the metatype of the attribute.
+<ul>
+<li>Basic metatypes: integer, String and boolean values. E.g. &#x2018;name&#x2019; = &#x2018;customers&#x2019;, &#x2018;temporary&#x2019; = &#x2018;false&#x2019;</li>
+<li>Collection metatypes: an array or map of values of the contained metatype. E.g. parameters = { &quot;transient_lastDdlTime&quot;: &quot;1466403208&quot; }</li>
+<li>Composite metatypes: for classes, the value will be an entity with which this particular entity has a relationship. E.g. the hive table &#x201c;customers&#x201d; is present in a database called &#x201c;default&#x201d;. The relationship between the table and the database is captured via the &#x201c;db&#x201d; attribute. Hence, the value of the &#x201c;db&#x201d; attribute will be a GUID that uniquely identifies the hive_db entity called &#x201c;default&#x201d;.</li></ul></li></ul>
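+<p>The points above about GUID identity and the three kinds of attribute values can be illustrated with a small in-memory model. This is a hypothetical sketch, not the Atlas server or client API. It assumes GUIDs are random UUID strings, as the example GUIDs suggest. The entity is stored under its GUID, its values mix basic, collection and composite attributes, and the composite db attribute is followed by GUID lookup.</p>

```python
import uuid


# Hypothetical in-memory stand-in for the Atlas entity store.
# Assumption: GUIDs are random UUID strings, as the example GUIDs suggest.
class EntityStore:
    def __init__(self):
        self._by_guid = {}

    def create(self, type_name: str, values: dict) -> str:
        guid = str(uuid.uuid4())  # assigned once, when the entity is defined
        self._by_guid[guid] = {"id": guid, "typeName": type_name, "values": dict(values)}
        return guid

    def get(self, guid: str) -> dict:
        return self._by_guid[guid]

    def update(self, guid: str, **changes) -> None:
        # Attribute values may change; the GUID never does.
        self._by_guid[guid]["values"].update(changes)

    def resolve(self, guid: str, attr: str) -> dict:
        """Follow a class-typed (composite) attribute, stored as a GUID, to its entity."""
        return self.get(self.get(guid)["values"][attr])


store = EntityStore()
db_guid = store.create("hive_db", {"name": "default"})
table_guid = store.create("hive_table", {
    "name": "customers",                                    # basic metatype
    "temporary": False,                                     # basic metatype
    "parameters": {"transient_lastDdlTime": "1466403208"},  # collection metatype
    "db": db_guid,                                          # composite: GUID of a hive_db
})
store.update(table_guid, temporary=True)
print(store.get(table_guid)["id"] == table_guid)          # True
print(store.resolve(table_guid, "db")["values"]["name"])  # default
```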
+<p>With this understanding of entities, we can now see the difference between Class and Struct metatypes. Classes and Structs both compose attributes of other types. However, entities of Class types have the Id attribute (with a GUID value) and can be referenced from other entities (like a hive_db entity is referenced from a hive_table entity). Instances of Struct types do not have an identity of their own. The value of a Struct type is a collection of attributes that are &#x2018;embedded&#x2019; inside the entity itself.</p></div>
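+<p>The Class/Struct distinction can be illustrated in plain Python. In this hypothetical sketch, the Owner struct and the ownerInfo attribute are invented for illustration and are not part of the hive_table type. The Class-like database entity is referenced by its GUID, while the Struct-like value is copied into the referring entity and carries no identity.</p>

```python
import uuid
from dataclasses import dataclass, asdict


# Hypothetical types for illustration only; not Atlas type definitions.
@dataclass
class Owner:  # Struct-like: no identity of its own, embedded by value
    name: str
    team: str


@dataclass
class DbEntity:  # Class-like: has its own GUID and can be referenced
    name: str
    guid: str


db = DbEntity(name="default", guid=str(uuid.uuid4()))

table = {
    "name": "customers",
    "db": db.guid,                                     # Class instance: stored as a reference (GUID)
    "ownerInfo": asdict(Owner("admin", "platform")),  # Struct value: embedded copy of its attributes
}
print(table["db"] == db.guid)  # True
print(table["ownerInfo"])      # {'name': 'admin', 'team': 'platform'}
```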
+<div class="section">
+<h3><a name="Attributes"></a>Attributes</h3>
+<p>We already saw that attributes are defined inside composite metatypes like Class and Struct, but so far we have simplistically described an attribute as having only a name and a metatype value. Attributes in Atlas have additional properties that define further concepts of the type system.</p>
+<p>An attribute has the following properties:</p>
+<div class="source">
+<pre>
+    name: string,
+    dataTypeName: string,
+    isComposite: boolean,
+    isIndexable: boolean,
+    isUnique: boolean,
+    multiplicity: enum,
+    reverseAttributeName: string
+
+</pre></div>
+<p>The properties above have the following meanings:</p>
+<ul>
+<li>name - the name of the attribute</li>
+<li>dataTypeName - the metatype name of the attribute (native, collection or composite)</li>
+<li>isComposite -
+<ul>
+<li>This flag indicates an aspect of modelling. If an attribute is defined as composite, the entities it holds cannot have a lifecycle independent of the entity they are contained in. A good example of this concept is the set of columns that are part of a hive table. Since the columns do not have meaning outside of the hive table, they are defined as composite attributes.</li>
+<li>A composite attribute must be created in Atlas along with the entity it is contained in. For example, a hive column must be created along with its hive table.</li></ul></li>
+<li>isIndexable -
+<ul>
+<li>This flag indicates whether this property should be indexed, so that lookups using the attribute value as a predicate can be performed efficiently.</li></ul></li>
+<li>isUnique -
+<ul>
+<li>This flag is also related to indexing. If an attribute is specified to be unique, a special index is created for it in Titan that allows for equality-based lookups.</li>
+<li>Any attribute with a true value for this flag is treated like a primary key that distinguishes this entity from other entities. Hence care should be taken to ensure that this attribute does model a unique property in the real world.
+<ul>
+<li>For example, consider the name attribute of a hive_table. In isolation, a name is not a unique attribute for a hive_table, because tables with the same name can exist in multiple databases. Even a pair of (database name, table name) is not unique if Atlas is storing metadata of hive tables from multiple clusters. Only the combination of cluster location, database name and table name can be deemed unique in the physical world.</li></ul></li></ul></li>
+<li>multiplicity - indicates whether this attribute is required, optional, or multi-valued. If the value an entity supplies for the attribute does not match the multiplicity declared in the type definition, this is a constraint violation and the entity addition fails. This field can therefore be used to define some constraints on the metadata information.</li></ul>
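+<p>The multiplicity rule can be pictured as a check that runs before an entity is stored. The following is a hypothetical Python sketch, not the Atlas implementation. Attribute definitions are reduced to a name-to-multiplicity map that uses the values declared for hive_table&#x2019;s db (required) and columns (optional) attributes, and only the &quot;required&quot; case is checked. An entity without a db reference is rejected and nothing is stored.</p>

```python
# Hypothetical sketch: reject an entity whose values violate the multiplicity
# declared for its attributes. Only the "required" case is checked here.
class ConstraintViolation(Exception):
    """Raised when an entity does not satisfy its type's multiplicity declarations."""


HIVE_TABLE_MULTIPLICITY = {"db": "required", "columns": "optional"}


def check_multiplicity(values: dict, multiplicities: dict) -> None:
    for attr, mult in multiplicities.items():
        if mult == "required" and values.get(attr) is None:
            raise ConstraintViolation(f"attribute '{attr}' is required")


def add_entity(store: list, values: dict, multiplicities: dict) -> None:
    check_multiplicity(values, multiplicities)  # validate before storing anything
    store.append(values)


store = []
add_entity(store, {"name": "customers", "db": "b42c6cfc-c1e7-42fd-a9e6-890e0adf33bc"},
           HIVE_TABLE_MULTIPLICITY)
try:
    add_entity(store, {"name": "orphan_table"}, HIVE_TABLE_MULTIPLICITY)  # no db reference
except ConstraintViolation as err:
    print(err)     # attribute 'db' is required
print(len(store))  # 1
```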
+<p>Using the above, let us expand on the definition of one of the attributes of the hive table: the attribute called &#x2018;db&#x2019;, which represents the database to which the hive table belongs:</p>
+<div class="source">
+<pre>
+db:
+    &quot;dataTypeName&quot;: &quot;hive_db&quot;,
+    &quot;isComposite&quot;: false,
+    &quot;isIndexable&quot;: true,
+    &quot;isUnique&quot;: false,
+    &quot;multiplicity&quot;: &quot;required&quot;,
+    &quot;name&quot;: &quot;db&quot;,
+    &quot;reverseAttributeName&quot;: null
+
+</pre></div>
+<p>Note the &#x201c;required&#x201d; constraint on multiplicity: a table entity cannot be sent without a db reference. Next, consider the definition of the &#x2018;columns&#x2019; attribute:</p>
+<div class="source">
+<pre>
+columns:
+    &quot;dataTypeName&quot;: &quot;array&lt;hive_column&gt;&quot;,
+    &quot;isComposite&quot;: true,
+    &quot;isIndexable&quot;: true,
+    &quot;isUnique&quot;: false,
+    &quot;multiplicity&quot;: &quot;optional&quot;,
+    &quot;name&quot;: &quot;columns&quot;,
+    &quot;reverseAttributeName&quot;: null
+
+</pre></div>
+<p>Note the true value of &#x201c;isComposite&#x201d; for columns. This indicates that the defined column entities are always bound to the table entity they are defined with.</p>
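+<p>One consequence of binding composite entities to their owner is that they share its lifecycle. The following is a hypothetical Python sketch, not Atlas&#x2019;s actual delete handling, and the short GUIDs (db1, tbl1, col1, col2) are placeholders. Deleting a table also deletes the entities held in its composite columns attribute. The non-composite db reference is left alone.</p>

```python
# Hypothetical sketch: deleting an entity also deletes the entities held in
# its composite attributes; plain references are left untouched.
entities = {
    "db1": {"typeName": "hive_db", "values": {"name": "default"}},
    "col1": {"typeName": "hive_column", "values": {"name": "id"}},
    "col2": {"typeName": "hive_column", "values": {"name": "email"}},
    "tbl1": {"typeName": "hive_table",
             "values": {"name": "customers", "db": "db1", "columns": ["col1", "col2"]}},
}
COMPOSITE_ATTRIBUTES = {"hive_table": {"columns"}}  # attributes with isComposite: true


def delete(guid: str) -> None:
    entity = entities.pop(guid)
    for attr in COMPOSITE_ATTRIBUTES.get(entity["typeName"], set()):
        value = entity["values"].get(attr) or []
        for child in (value if isinstance(value, list) else [value]):
            if child in entities:
                delete(child)  # cascade into the owned entity


delete("tbl1")
print(sorted(entities))  # ['db1']
```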
+<p>These descriptions and examples show that attribute definitions can be used to influence specific modelling behavior (constraints, indexing, etc.) that is enforced by the Atlas system.</p></div>
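+<p>The attribute properties in this section map naturally onto a small record type. The following is a hypothetical Python sketch that parses the db and columns definitions shown above. The AttributeDef and Multiplicity classes are illustrative. The enum lists only the two multiplicity values that appear on this page, so any other value raises a ValueError here, although Atlas may support more.</p>

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Multiplicity(Enum):
    # Only the two values that appear on this page; Atlas may define more.
    REQUIRED = "required"
    OPTIONAL = "optional"


@dataclass(frozen=True)
class AttributeDef:
    name: str
    dataTypeName: str
    isComposite: bool
    isIndexable: bool
    isUnique: bool
    multiplicity: Multiplicity
    reverseAttributeName: Optional[str] = None

    @classmethod
    def from_dict(cls, d: dict) -> "AttributeDef":
        return cls(
            name=d["name"],
            dataTypeName=d["dataTypeName"],
            isComposite=d["isComposite"],
            isIndexable=d["isIndexable"],
            isUnique=d["isUnique"],
            multiplicity=Multiplicity(d["multiplicity"]),  # ValueError if unknown
            reverseAttributeName=d.get("reverseAttributeName"),
        )


db = AttributeDef.from_dict({"name": "db", "dataTypeName": "hive_db", "isComposite": False,
                             "isIndexable": True, "isUnique": False,
                             "multiplicity": "required", "reverseAttributeName": None})
columns = AttributeDef.from_dict({"name": "columns", "dataTypeName": "array<hive_column>",
                                  "isComposite": True, "isIndexable": True, "isUnique": False,
                                  "multiplicity": "optional", "reverseAttributeName": None})
print(db.multiplicity, columns.isComposite)  # Multiplicity.REQUIRED True
```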
+<div class="section">
+<h3><a name="System_specific_types_and_their_significance"></a>System specific
types and their significance</h3>
+<p>Atlas comes with a few pre-defined system types. We saw one example (DataSet) in the preceding sections. In this section we look at all of these types and their significance.</p>
+<p><b>Referenceable</b>: This type represents all entities that can be
searched for using a unique attribute called qualifiedName.</p>
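+<p>A unique attribute such as qualifiedName behaves like a primary-key index. The following is a hypothetical Python sketch, not Atlas&#x2019;s Titan-backed index. Equality lookups are a dictionary hit, and a second entity claiming the same value is rejected. The db.table@cluster format is an assumption chosen to combine the three parts (cluster, database name, table name) that the isUnique discussion identifies as unique together, and &quot;guid-of-backup-table&quot; is a placeholder.</p>

```python
# Hypothetical unique index on qualifiedName: equality lookups are a dict hit,
# and a second entity with the same value is rejected.
class UniqueIndex:
    def __init__(self):
        self._index = {}

    def put(self, qualified_name: str, guid: str) -> None:
        existing = self._index.get(qualified_name)
        if existing is not None and existing != guid:
            raise ValueError(f"duplicate qualifiedName: {qualified_name}")
        self._index[qualified_name] = guid

    def lookup(self, qualified_name: str):
        return self._index.get(qualified_name)


def table_qualified_name(cluster: str, db: str, table: str) -> str:
    # Assumed format combining the three parts that are unique together.
    return f"{db}.{table}@{cluster}"


idx = UniqueIndex()
idx.put(table_qualified_name("primary", "default", "customers"),
        "9ba387dd-fa76-429c-b791-ffc338d3c91f")
idx.put(table_qualified_name("backup", "default", "customers"), "guid-of-backup-table")
print(idx.lookup("default.customers@primary"))  # 9ba387dd-fa76-429c-b791-ffc338d3c91f
```

+<p>The same table name in two clusters yields two distinct keys, which is why the cluster has to be part of the unique value.</p>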
+<p><b>Asset</b>: This type contains attributes like name, description and owner. Name is a required attribute (multiplicity = required); the others are optional. The purpose of Referenceable and Asset is to provide modellers with a way to enforce consistency when defining and querying entities of their own types. Having this fixed set of attributes allows applications and user interfaces to make convention-based assumptions about which attributes they can expect of types by default.</p>
+<p><b>Infrastructure</b>: This type extends Referenceable and Asset and can typically be used as a common supertype for infrastructural metadata objects like clusters, hosts, etc.</p>
+<p><b>DataSet</b>: This type extends Referenceable and Asset. Conceptually, it can be used to represent a type that stores data. In Atlas, hive tables, Sqoop RDBMS tables, etc. are all types that extend from DataSet. Types that extend DataSet can be expected to have a schema, in the sense that they have an attribute that defines the attributes of that dataset, for example the columns attribute in a hive_table. Entities of types that extend DataSet also participate in data transformations, and these transformations can be captured by Atlas via lineage (or provenance) graphs.</p>
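+<p>Extending a system type means inheriting its attributes. The following is a hypothetical Python sketch of that resolution, not Atlas&#x2019;s type registry. The supertype links and the Referenceable and Asset attributes follow this section, the Process attributes follow the Process description on this page, and the hive_table attribute list is abbreviated for illustration.</p>

```python
# Hypothetical sketch of attribute inheritance through supertypes.
SUPERTYPES = {
    "Referenceable": [],
    "Asset": [],
    "DataSet": ["Referenceable", "Asset"],
    "Process": ["Referenceable", "Asset"],
    "hive_table": ["DataSet"],
}
OWN_ATTRIBUTES = {
    "Referenceable": ["qualifiedName"],
    "Asset": ["name", "description", "owner"],
    "DataSet": [],
    "Process": ["inputs", "outputs"],
    "hive_table": ["db", "columns"],  # abbreviated
}


def all_attributes(type_name: str) -> list:
    """Inherited attributes (supertypes first) plus the type's own, without duplicates."""
    result = []
    for parent in SUPERTYPES[type_name]:
        for attr in all_attributes(parent):
            if attr not in result:
                result.append(attr)
    for attr in OWN_ATTRIBUTES[type_name]:
        if attr not in result:
            result.append(attr)
    return result


def is_a(type_name: str, ancestor: str) -> bool:
    return type_name == ancestor or any(is_a(p, ancestor) for p in SUPERTYPES[type_name])


print(all_attributes("hive_table"))
# ['qualifiedName', 'name', 'description', 'owner', 'db', 'columns']
print(is_a("hive_table", "DataSet"))  # True
```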
+<p><b>Process</b>: This type extends Referenceable and Asset. Conceptually, it can be used to represent any data transformation operation. For example, an ETL process that transforms a hive table with raw data to another hive table that stores some aggregate can be a specific type that extends the Process type. A Process type has two specific attributes, inputs and outputs, both of which are arrays of DataSet entities. Thus an instance of a Process type can use these inputs and outputs to capture how the lineage of a DataSet evolves.</p></div>
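+<p>Because every Process names its input and output DataSets, upstream lineage can be found by walking from a dataset back through the processes that produced it. The following is a hypothetical Python sketch using a breadth-first search. The process and dataset names are invented, and datasets are identified by name rather than GUID for readability. It is not Atlas&#x2019;s lineage API.</p>

```python
from collections import deque

# Hypothetical Process entities: each lists DataSet inputs and outputs by name.
processes = {
    "load_raw": {"inputs": ["raw_events_file"], "outputs": ["raw_events"]},
    "aggregate_daily": {"inputs": ["raw_events", "customers"], "outputs": ["daily_summary"]},
}


def upstream(dataset: str) -> set:
    """All datasets that `dataset` was (transitively) derived from."""
    produced_by = {}
    for name, proc in processes.items():
        for out in proc["outputs"]:
            produced_by.setdefault(out, []).append(name)
    seen, queue = set(), deque([dataset])
    while queue:  # breadth-first walk from outputs back to inputs
        current = queue.popleft()
        for proc_name in produced_by.get(current, []):
            for inp in processes[proc_name]["inputs"]:
                if inp not in seen:
                    seen.add(inp)
                    queue.append(inp)
    return seen


print(sorted(upstream("daily_summary")))
# ['customers', 'raw_events', 'raw_events_file']
```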
+                  </div>
+          </div>
+
+    <hr/>
+
+    <footer>
+            <div class="container">
+              <div class="row span12">Copyright &copy;                    2015-2017
+                        <a href="http://www.apache.org">Apache Software Foundation</a>.
+            All Rights Reserved.      
+                    
+      </div>
+
+                          
+                <p id="poweredBy" class="pull-right">
+                          <a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
+        <img class="builtBy" alt="Built by Maven" src="./images/logos/maven-feather.png"
/>
+      </a>
+              </p>
+        
+                </div>
+    </footer>
+  </body>
+</html>


