atlas-commits mailing list archives

From shweth...@apache.org
Subject [2/2] incubator-atlas-website git commit: Updated latest site
Date Mon, 25 Apr 2016 03:43:27 GMT
Updated latest site


Project: http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/commit/a876d178
Tree: http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/tree/a876d178
Diff: http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/diff/a876d178

Branch: refs/heads/asf-site
Commit: a876d1782a9a2b5380ed1729dcd3407d12e119fe
Parents: 02b2068
Author: Shwetha GS <sshivalingamurthy@hortonworks.com>
Authored: Mon Apr 25 09:13:18 2016 +0530
Committer: Shwetha GS <sshivalingamurthy@hortonworks.com>
Committed: Mon Apr 25 09:13:18 2016 +0530

----------------------------------------------------------------------
 Architecture.html                |  13 +-
 Bridge-Falcon.html               |  15 +-
 Bridge-Hive.html                 |  14 +-
 Bridge-Sqoop.html                |  10 +-
 Configuration.html               |  63 ++++++-
 HighAvailability.html            | 120 ++++++++++++--
 InstallationSteps.html           |  85 +++++++---
 Notification-Entity.html         |   6 +-
 QuickStart.html                  |   6 +-
 Repository.html                  |   6 +-
 Search.html                      |  14 +-
 Security.html                    |  11 +-
 StormAtlasHook.html              | 298 ++++++++++++++++++++++++++++++++++
 TypeSystem.html                  |  18 +-
 api/application.wadl             |  73 +++++++++
 api/resource_AdminResource.html  |  19 +++
 api/resource_EntityResource.html | 120 ++++++++++++++
 index.html                       |   9 +-
 issue-tracking.html              |   6 +-
 license.html                     |   6 +-
 mail-lists.html                  |   6 +-
 project-info.html                |   6 +-
 source-repository.html           |  12 +-
 team-list.html                   |  55 ++++---
 24 files changed, 851 insertions(+), 140 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/blob/a876d178/Architecture.html
----------------------------------------------------------------------
diff --git a/Architecture.html b/Architecture.html
index 461d55c..480aae0 100644
--- a/Architecture.html
+++ b/Architecture.html
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2016-01-05
+ | Generated by Apache Maven Doxia at 2016-04-25
  | Rendered using Apache Maven Fluido Skin 1.3.0
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20160105" />
+    <meta name="Date-Revision-yyyymmdd" content="20160425" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache Atlas &#x2013; Architecture</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
@@ -189,7 +189,7 @@
         
                 
                     
-                  <li id="publishDate" class="pull-right">Last Published: 2016-01-05</li> <li class="divider pull-right">|</li>
+                  <li id="publishDate" class="pull-right">Last Published: 2016-04-25</li> <li class="divider pull-right">|</li>
               <li id="projectVersion" class="pull-right">Version: 0.7-incubating-SNAPSHOT</li>
             
                             </ul>
@@ -217,17 +217,18 @@
 <li><b>Notification Server</b>: Atlas uses Apache Kafka as a notification server for communication between hooks and downstream consumers of metadata notification events. Events are written by the hooks and Atlas to different Kafka topics. Kafka enables a loosely coupled integration between these disparate systems.</li></ul></div>
 <div class="section">
 <h3><a name="Bridges"></a>Bridges</h3>
-<p>External components like hive/sqoop/storm/falcon should model their taxonomy using typesystem and register the types with Atlas. For every entity created in this external component, the corresponding entity should be registered in Atlas as well. This is typically done in a hook which runs in the external component and is called for every entity operation. Hook generally processes the entity asynchronously using a thread pool to avoid adding latency to the main operation. The hook can then build the entity and register the entity using Atlas REST APIs. Howerver, any failure in APIs because of network issue etc can in result entity not registered in Atlas and hence inconsistent metadata.</p>
+<p>External components like hive/sqoop/storm/falcon should model their taxonomy using typesystem and register the types with Atlas. For every entity created in this external component, the corresponding entity should be registered in Atlas as well. This is typically done in a hook which runs in the external component and is called for every entity operation. The hook generally processes the entity asynchronously using a thread pool to avoid adding latency to the main operation. The hook can then build the entity and register the entity using Atlas REST APIs. However, any failure in APIs because of network issues etc. can result in the entity not being registered in Atlas, and hence inconsistent metadata.</p>
 <p>Atlas exposes notification interface and can be used for reliable entity registration by hook as well. The hook can send notification message containing the list of entities to be registered.  Atlas service contains hook consumer that listens to these messages and registers the entities.</p>
 <p>Available bridges are:</p>
 <ul>
 <li><a href="./Bridge-Hive.html">Hive Bridge</a></li>
 <li><a href="./Bridge-Sqoop.html">Sqoop Bridge</a></li>
-<li><a href="./Bridge-Falcon.html">Falcon Bridge</a></li></ul></div>
+<li><a href="./Bridge-Falcon.html">Falcon Bridge</a></li>
+<li><a href="./StormAtlasHook.html">Storm Bridge</a></li></ul></div>
 <div class="section">
 <h3><a name="Notification"></a>Notification</h3>
 <p>Notification is used for reliable entity registration from hooks and for entity/type change notifications. Atlas, by default, provides Kafka integration, but it is possible to provide other implementations as well. The Atlas service starts an embedded Kafka server by default.</p>
-<p>Atlas also provides <a href="./NotificationHookConsumer.html">NotificationHookConsumer</a> that runs in Atlas Service and listens to messages from hook and registers the entities in Atlas. <img src="images/twiki/notification.png" alt="" /></p></div>
+<p>Atlas also provides NotificationHookConsumer that runs in Atlas Service and listens to messages from hook and registers the entities in Atlas. <img src="images/twiki/notification.png" alt="" /></p></div>
                   </div>
           </div>
 

http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/blob/a876d178/Bridge-Falcon.html
----------------------------------------------------------------------
diff --git a/Bridge-Falcon.html b/Bridge-Falcon.html
index bcf43c9..df7f952 100644
--- a/Bridge-Falcon.html
+++ b/Bridge-Falcon.html
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2016-01-05
+ | Generated by Apache Maven Doxia at 2016-04-25
  | Rendered using Apache Maven Fluido Skin 1.3.0
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20160105" />
+    <meta name="Date-Revision-yyyymmdd" content="20160425" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache Atlas &#x2013; Falcon Atlas Bridge</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
@@ -189,7 +189,7 @@
         
                 
                     
-                  <li id="publishDate" class="pull-right">Last Published: 2016-01-05</li> <li class="divider pull-right">|</li>
+                  <li id="publishDate" class="pull-right">Last Published: 2016-04-25</li> <li class="divider pull-right">|</li>
               <li id="projectVersion" class="pull-right">Version: 0.7-incubating-SNAPSHOT</li>
             
                             </ul>
@@ -219,8 +219,13 @@ falcon_process(ClassType) - super types [Process] - attributes [timestamp, owned
 <ul>
 <li>Add 'org.apache.falcon.atlas.service.AtlasService' to application.services in &lt;falcon-conf&gt;/startup.properties</li>
 <li>Link falcon hook jars in falcon classpath - 'ln -s &lt;atlas-home&gt;/hook/falcon/* &lt;falcon-home&gt;/server/webapp/falcon/WEB-INF/lib/'</li>
-<li>Copy &lt;atlas-conf&gt;/client.properties and &lt;atlas-conf&gt;/atlas-application.properties to the falcon conf directory.</li></ul>
-<p>The following properties in &lt;atlas-conf&gt;/client.properties control the thread pool and notification details:</p>
+<li>In &lt;falcon-conf&gt;/falcon-env.sh, set an environment variable as follows:</li></ul>
+<div class="source">
+<pre>
+     export FALCON_SERVER_OPTS=&quot;$FALCON_SERVER_OPTS -Datlas.conf=&lt;atlas-conf&gt;&quot;
+     
+</pre></div>
+<p>The following properties in &lt;atlas-conf&gt;/atlas-application.properties control the thread pool and notification details:</p>
 <ul>
 <li>atlas.hook.falcon.synchronous - boolean, true to run the hook synchronously. default false</li>
 <li>atlas.hook.falcon.numRetries - number of retries for notification failure. default 3</li>

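The Falcon hook steps in the hunk above can be condensed into a short setup sketch. This is illustrative only: `<atlas-home>`, `<falcon-home>`, `<falcon-conf>` and `<atlas-conf>` are placeholders for real install paths, not values from this commit.

```shell
# Sketch of the Falcon hook setup described above; all bracketed paths are placeholders.

# Link the Atlas Falcon hook jars into the Falcon webapp classpath.
ln -s <atlas-home>/hook/falcon/* <falcon-home>/server/webapp/falcon/WEB-INF/lib/

# Point the Falcon server at the Atlas configuration directory
# (this line is appended to <falcon-conf>/falcon-env.sh).
export FALCON_SERVER_OPTS="$FALCON_SERVER_OPTS -Datlas.conf=<atlas-conf>"
```

Remember to also add 'org.apache.falcon.atlas.service.AtlasService' to application.services in &lt;falcon-conf&gt;/startup.properties, as the bullet list above states.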
http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/blob/a876d178/Bridge-Hive.html
----------------------------------------------------------------------
diff --git a/Bridge-Hive.html b/Bridge-Hive.html
index c725b1a..95d391a 100644
--- a/Bridge-Hive.html
+++ b/Bridge-Hive.html
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2016-01-05
+ | Generated by Apache Maven Doxia at 2016-04-25
  | Rendered using Apache Maven Fluido Skin 1.3.0
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20160105" />
+    <meta name="Date-Revision-yyyymmdd" content="20160425" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache Atlas &#x2013; Hive Atlas Bridge</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
@@ -189,7 +189,7 @@
         
                 
                     
-                  <li id="publishDate" class="pull-right">Last Published: 2016-01-05</li> <li class="divider pull-right">|</li>
+                  <li id="publishDate" class="pull-right">Last Published: 2016-04-25</li> <li class="divider pull-right">|</li>
               <li id="projectVersion" class="pull-right">Version: 0.7-incubating-SNAPSHOT</li>
             
                             </ul>
@@ -231,7 +231,7 @@ hive_process(ClassType) - super types [Process] - attributes [startTime, endTime
 <li>hive_process - attribute name - &lt;queryString&gt; - trimmed query string in lower case</li></ul></div>
 <div class="section">
 <h3><a name="Importing_Hive_Metadata"></a>Importing Hive Metadata</h3>
-<p>org.apache.atlas.hive.bridge.HiveMetaStoreBridge imports the hive metadata into Atlas using the model defined in org.apache.atlas.hive.model.HiveDataModelGenerator. import-hive.sh command can be used to facilitate this. Set the following configuration in &lt;atlas-conf&gt;/client.properties and set environment variable $HIVE_CONF_DIR to the hive conf directory:</p>
+<p>org.apache.atlas.hive.bridge.HiveMetaStoreBridge imports the hive metadata into Atlas using the model defined in org.apache.atlas.hive.model.HiveDataModelGenerator. import-hive.sh command can be used to facilitate this. Set the following configuration in &lt;atlas-conf&gt;/atlas-application.properties and set environment variable $HIVE_CONF_DIR to the hive conf directory:</p>
 <div class="source">
 <pre>
     &lt;property&gt;
@@ -270,8 +270,8 @@ hive_process(ClassType) - super types [Process] - attributes [startTime, endTime
 <p></p>
 <ul>
 <li>Add 'export HIVE_AUX_JARS_PATH=&lt;atlas package&gt;/hook/hive' in hive-env.sh of your hive configuration</li>
-<li>Copy &lt;atlas-conf&gt;/client.properties and &lt;atlas-conf&gt;/atlas-application.properties to the hive conf directory.</li></ul>
-<p>The following properties in &lt;atlas-conf&gt;/client.properties control the thread pool and notification details:</p>
+<li>Copy &lt;atlas-conf&gt;/atlas-application.properties to the hive conf directory.</li></ul>
+<p>The following properties in &lt;atlas-conf&gt;/atlas-application.properties control the thread pool and notification details:</p>
 <ul>
 <li>atlas.hook.hive.synchronous - boolean, true to run the hook synchronously. default false</li>
 <li>atlas.hook.hive.numRetries - number of retries for notification failure. default 3</li>
@@ -285,7 +285,7 @@ hive_process(ClassType) - super types [Process] - attributes [startTime, endTime
 <p></p>
 <ul>
 <li>Since database name, table name and column names are case insensitive in hive, the corresponding names in entities are lowercase. So, any search APIs should use lowercase while querying on the entity names</li>
-<li>Only the following hive operations are captured by hive hook currently - create database, create table, create view, CTAS, load, import, export, query, alter table rename and alter view rename</li></ul></div>
+<li>Only the following hive operations are captured by the hive hook currently - create database, create table, create view, CTAS, load, import, export, query, alter database, alter table (except alter table replace columns and alter table change column position), alter view (except replacing and changing column position)</li></ul></div>
                   </div>
           </div>
 

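The Hive import step in the hunk above can be sketched as a couple of shell commands. Treat this as an assumption-laden illustration: `<atlas-home>` and `<hive-conf>` are placeholders, and the exact location of import-hive.sh inside the Atlas package is assumed, not stated on this page.

```shell
# Illustrative only: bracketed paths are placeholders.

# Make the Atlas hook jars visible to Hive (from hive-env.sh, as described above).
export HIVE_AUX_JARS_PATH=<atlas-home>/hook/hive

# Point the bridge at the Hive configuration, then run the one-time metadata import.
export HIVE_CONF_DIR=<hive-conf>
<atlas-home>/bin/import-hive.sh
```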
http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/blob/a876d178/Bridge-Sqoop.html
----------------------------------------------------------------------
diff --git a/Bridge-Sqoop.html b/Bridge-Sqoop.html
index 9fa6414..7e9764f 100644
--- a/Bridge-Sqoop.html
+++ b/Bridge-Sqoop.html
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2016-01-05
+ | Generated by Apache Maven Doxia at 2016-04-25
  | Rendered using Apache Maven Fluido Skin 1.3.0
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20160105" />
+    <meta name="Date-Revision-yyyymmdd" content="20160425" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache Atlas &#x2013; Sqoop Atlas Bridge</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
@@ -189,7 +189,7 @@
         
                 
                     
-                  <li id="publishDate" class="pull-right">Last Published: 2016-01-05</li> <li class="divider pull-right">|</li>
+                  <li id="publishDate" class="pull-right">Last Published: 2016-04-25</li> <li class="divider pull-right">|</li>
               <li id="projectVersion" class="pull-right">Version: 0.7-incubating-SNAPSHOT</li>
             
                             </ul>
@@ -215,14 +215,14 @@ sqoop_dbdatastore(ClassType) - super types [DataSet] - attributes [name, dbStore
 <p>The entities are created and de-duped using unique qualified name. They provide namespace and can be used for querying as well: sqoop_process - attribute name - sqoop-dbStoreType-storeUri-endTime sqoop_dbdatastore - attribute name - dbStoreType-connectorUrl-source</p></div>
 <div class="section">
 <h3><a name="Sqoop_Hook"></a>Sqoop Hook</h3>
-<p>Sqoop added a <a href="./SqoopJobDataPublisher.html">SqoopJobDataPublisher</a> that publishes data to Atlas after completion of import Job. Today, only hiveImport is supported in sqoopHook. This is used to add entities in Atlas using the model defined in org.apache.atlas.sqoop.model.SqoopDataModelGenerator. Follow these instructions in your sqoop set-up to add sqoop hook for Atlas in &lt;sqoop-conf&gt;/sqoop-site.xml:</p>
+<p>Sqoop added a SqoopJobDataPublisher that publishes data to Atlas after completion of import Job. Today, only hiveImport is supported in sqoopHook. This is used to add entities in Atlas using the model defined in org.apache.atlas.sqoop.model.SqoopDataModelGenerator. Follow these instructions in your sqoop set-up to add sqoop hook for Atlas in &lt;sqoop-conf&gt;/sqoop-site.xml:</p>
 <p></p>
 <ul>
 <li>Sqoop Job publisher class.  Currently only one publishing class is supported</li></ul><property>      <name>sqoop.job.data.publish.class</name>      <value>org.apache.atlas.sqoop.hook.SqoopHook</value>    </property>
 <ul>
 <li>Atlas cluster name</li></ul><property>      <name>atlas.cluster.name</name>      <value><clustername></value>    </property>
 <ul>
-<li>Copy &lt;atlas-conf&gt;/atlas-application.properties and &lt;atlas-conf&gt;/client.properties to to the sqoop conf directory &lt;sqoop-conf&gt;/</li>
+<li>Copy &lt;atlas-conf&gt;/atlas-application.properties to the sqoop conf directory &lt;sqoop-conf&gt;/</li>
 <li>Link &lt;atlas-home&gt;/hook/sqoop/*.jar in sqoop lib</li></ul>
 <p>Refer <a href="./Configuration.html">Configuration</a> for notification related configurations</p></div>
 <div class="section">

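The `<property>` blocks in the Sqoop hunk above are flattened onto single lines by the HTML rendering; re-indented, the &lt;sqoop-conf&gt;/sqoop-site.xml additions read as follows (`<clustername>` is a placeholder):

```xml
<!-- Atlas hook configuration for sqoop-site.xml, re-formatted from the
     flattened property blocks above. <clustername> is a placeholder. -->
<property>
  <name>sqoop.job.data.publish.class</name>
  <value>org.apache.atlas.sqoop.hook.SqoopHook</value>
</property>
<property>
  <name>atlas.cluster.name</name>
  <value><clustername></value>
</property>
```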
http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/blob/a876d178/Configuration.html
----------------------------------------------------------------------
diff --git a/Configuration.html b/Configuration.html
index eecc8e7..b8f74c5 100644
--- a/Configuration.html
+++ b/Configuration.html
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2016-01-05
+ | Generated by Apache Maven Doxia at 2016-04-25
  | Rendered using Apache Maven Fluido Skin 1.3.0
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20160105" />
+    <meta name="Date-Revision-yyyymmdd" content="20160425" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache Atlas &#x2013; Configuring Apache Atlas - Application Properties</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
@@ -189,7 +189,7 @@
         
                 
                     
-                  <li id="publishDate" class="pull-right">Last Published: 2016-01-05</li> <li class="divider pull-right">|</li>
+                  <li id="publishDate" class="pull-right">Last Published: 2016-04-25</li> <li class="divider pull-right">|</li>
               <li id="projectVersion" class="pull-right">Version: 0.7-incubating-SNAPSHOT</li>
             
                             </ul>
@@ -243,7 +243,8 @@ zookeeper.znode.parent=/hbase-unsecure
    kinit -k -t &lt;hbase keytab&gt; &lt;hbase principal&gt;
    echo &quot;grant 'atlas', 'RWXCA', 'titan'&quot; | hbase shell
 
-</pre></div></div>
+</pre></div>
+<p>Note that HBase is included in the distribution so that a standalone instance of HBase can be started as the default storage backend for the graph repository.</p></div>
 <div class="section">
 <h4><a name="Graph_Search_Index"></a>Graph Search Index</h4>
 <p>This section sets up the graph db - titan - to use a search indexing system. The example configuration below sets it up to use an embedded Elasticsearch indexing system.</p>
@@ -274,10 +275,10 @@ atlas.graph.index.search.elasticsearch.create.sleep=2000
 <p>Refer <a class="externalLink" href="http://s3.thinkaurelius.com/docs/titan/0.5.4/bdb.html">http://s3.thinkaurelius.com/docs/titan/0.5.4/bdb.html</a> and <a class="externalLink" href="http://s3.thinkaurelius.com/docs/titan/0.5.4/hbase.html">http://s3.thinkaurelius.com/docs/titan/0.5.4/hbase.html</a> for choosing between the persistence backends. BerkeleyDB is suitable for smaller data sets in the range of up to 10 million vertices with ACID guarantees. HBase on the other hand doesn't provide ACID guarantees but is able to scale for larger graphs. HBase also provides HA inherently.</p></div>
 <div class="section">
 <h4><a name="Choosing_between_Indexing_Backends"></a>Choosing between Indexing Backends</h4>
-<p>Refer <a class="externalLink" href="http://s3.thinkaurelius.com/docs/titan/0.5.4/elasticsearch.html">http://s3.thinkaurelius.com/docs/titan/0.5.4/elasticsearch.html</a> and <a class="externalLink" href="http://s3.thinkaurelius.com/docs/titan/0.5.4/solr.html">http://s3.thinkaurelius.com/docs/titan/0.5.4/solr.html</a> for chossing between <a href="./ElasticSarch.html">ElasticSarch</a> and Solr. Solr in cloud mode is the recommended setup.</p></div>
+<p>Refer <a class="externalLink" href="http://s3.thinkaurelius.com/docs/titan/0.5.4/elasticsearch.html">http://s3.thinkaurelius.com/docs/titan/0.5.4/elasticsearch.html</a> and <a class="externalLink" href="http://s3.thinkaurelius.com/docs/titan/0.5.4/solr.html">http://s3.thinkaurelius.com/docs/titan/0.5.4/solr.html</a> for choosing between ElasticSearch and Solr. Solr in cloud mode is the recommended setup.</p></div>
 <div class="section">
 <h4><a name="Switching_Persistence_Backend"></a>Switching Persistence Backend</h4>
-<p>For switching the storage backend from BerkeleyDB to HBase and vice versa, refer the documentation for &quot;Graph Persistence Engine&quot; described above and restart ATLAS. The data in the indexing backend needs to be cleared else there will be discrepancies between the storage and indexing backend which could result in errors during the search. <a href="./ElasticSearch.html">ElasticSearch</a> runs by default in embedded mode and the data could easily be cleared by deleting the ATLAS_HOME/data/es directory. For Solr, the collections which were created during ATLAS Installation - vertex_index, edge_index, fulltext_index could be deleted which will cleanup the indexes</p></div>
+<p>For switching the storage backend from BerkeleyDB to HBase and vice versa, refer to the documentation for &quot;Graph Persistence Engine&quot; described above and restart ATLAS. The data in the indexing backend needs to be cleared, else there will be discrepancies between the storage and indexing backend which could result in errors during the search. ElasticSearch runs by default in embedded mode and the data can easily be cleared by deleting the ATLAS_HOME/data/es directory. For Solr, the collections which were created during ATLAS installation - vertex_index, edge_index, fulltext_index - can be deleted, which will clean up the indexes.</p></div>
 <div class="section">
 <h4><a name="Switching_Index_Backend"></a>Switching Index Backend</h4>
 <p>Switching the Index backend requires clearing the persistence backend data. Otherwise there will be discrepancies between the persistence and index backends, since switching the indexing backend means index data will be lost. This leads to &quot;Fulltext&quot; queries not working on the existing data. For clearing the data for BerkeleyDB, delete the ATLAS_HOME/data/berkeley directory. For clearing the data for HBase, in the HBase shell, run 'disable titan' and 'drop titan'.</p></div>
@@ -336,6 +337,56 @@ atlas.rest.address=&lt;http/https&gt;://&lt;atlas-fqdn&gt;:&lt;atlas port&gt; -
 atlas.enableTLS=false
 
 </pre></div></div>
+<div class="section">
+<h3><a name="High_Availability_Properties"></a>High Availability Properties</h3>
+<p>The following properties describe High Availability related configuration options:</p>
+<div class="source">
+<pre>
+# Set the following property to true, to enable High Availability. Default = false.
+atlas.server.ha.enabled=true
+
+# Define a unique set of strings to identify each instance that should run an Atlas Web Service instance as a comma separated list.
+atlas.server.ids=id1,id2
+# For each string defined above, define the host and port on which Atlas server binds to.
+atlas.server.address.id1=host1.company.com:21000
+atlas.server.address.id2=host2.company.com:31000
+
+# Specify Zookeeper properties needed for HA.
+# Specify the list of servers in the Zookeeper ensemble as a comma separated list.
+atlas.server.ha.zookeeper.connect=zk1.company.com:2181,zk2.company.com:2181,zk3.company.com:2181
+# Specify the number of times the connection to the Zookeeper cluster should be retried, in case of any connection issues.
+atlas.server.ha.zookeeper.num.retries=3
+# Specify how long the server should wait before re-attempting connections to Zookeeper, in case of any connection issues.
+atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
+# Specify how long a Zookeeper session can remain inactive before it is deemed unreachable.
+atlas.server.ha.zookeeper.session.timeout.ms=20000
+
+# Specify the scheme and the identity to be used for setting up ACLs on nodes created in Zookeeper for HA.
+# The format of these options is &lt;scheme&gt;:&lt;identity&gt;. For more information refer to http://zookeeper.apache.org/doc/r3.2.2/zookeeperProgrammers.html#sc_ZooKeeperAccessControl.
+# The 'acl' option allows specifying a scheme-identity pair to set up an ACL for.
+atlas.server.ha.zookeeper.acl=auth:sasl:client@company.com
+# The 'auth' option specifies the authentication that should be used for connecting to Zookeeper.
+atlas.server.ha.zookeeper.auth=sasl:client@company.com
+
+# Since Zookeeper is a shared service that is typically used by many components,
+# it is preferable for each component to set its znodes under a namespace.
+# Specify the namespace under which the znodes should be written. Default = /apache_atlas
+atlas.server.ha.zookeeper.zkroot=/apache_atlas
+
+# Specify number of times a client should retry with an instance before selecting another active instance, or failing an operation.
+atlas.client.ha.retries=4
+# Specify interval between retries for a client.
+atlas.client.ha.sleep.interval.ms=5000
+
+</pre></div></div>
+<div class="section">
+<h3><a name="Server_Properties"></a>Server Properties</h3>
+<div class="source">
+<pre>
+# Set the following property to true, to enable the setup steps to run on each server start. Default = false.
+atlas.server.run.setup.on.start=false
+
+</pre></div></div>
                   </div>
           </div>
 

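Given the HA properties in the hunk above, it can be handy to verify which of the configured instances is currently active. The HighAvailability page in this same commit notes that all instances (active and passive) respond to admin requests about themselves; the exact endpoint path below is an assumption, not taken from this page.

```shell
# Illustrative only: queries each configured Atlas server for its HA status.
# The /api/atlas/admin/status path is assumed; hosts/ports match the example
# atlas.server.address.* values above.
for host in host1.company.com:21000 host2.company.com:31000; do
  echo -n "$host -> "
  curl -s "http://$host/api/atlas/admin/status"
  echo
done
```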
http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/blob/a876d178/HighAvailability.html
----------------------------------------------------------------------
diff --git a/HighAvailability.html b/HighAvailability.html
index 56cb376..f91e088 100644
--- a/HighAvailability.html
+++ b/HighAvailability.html
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2016-01-05
+ | Generated by Apache Maven Doxia at 2016-04-25
  | Rendered using Apache Maven Fluido Skin 1.3.0
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20160105" />
+    <meta name="Date-Revision-yyyymmdd" content="20160425" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache Atlas &#x2013; Fault Tolerance and High Availability Options</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
@@ -189,7 +189,7 @@
         
                 
                     
-                  <li id="publishDate" class="pull-right">Last Published: 2016-01-05</li> <li class="divider pull-right">|</li>
+                  <li id="publishDate" class="pull-right">Last Published: 2016-04-25</li> <li class="divider pull-right">|</li>
               <li id="projectVersion" class="pull-right">Version: 0.7-incubating-SNAPSHOT</li>
             
                             </ul>
@@ -203,32 +203,121 @@
 <h2><a name="Fault_Tolerance_and_High_Availability_Options"></a>Fault Tolerance and High Availability Options</h2></div>
 <div class="section">
 <h3><a name="Introduction"></a>Introduction</h3>
-<p>Apache Atlas uses and interacts with a variety of systems to provide metadata management and data lineage to data administrators. By choosing and configuring these dependencies appropriately, it is possible to achieve a good degree of service availability with Atlas. This document describes the state of high availability support in Atlas, including its capabilities and current limitations, and also the configuration required for achieving a this level of high availability.</p>
+<p>Apache Atlas uses and interacts with a variety of systems to provide metadata management and data lineage to data administrators. By choosing and configuring these dependencies appropriately, it is possible to achieve a high degree of service availability with Atlas. This document describes the state of high availability support in Atlas, including its capabilities and current limitations, and also the configuration required for achieving this level of high availability.</p>
 <p><a href="./Architecture.html">The architecture page</a> in the wiki gives an overview of the various components that make up Atlas. The options mentioned below for various components derive context from the above page, and would be worthwhile to review before proceeding to read this page.</p></div>
 <div class="section">
 <h3><a name="Atlas_Web_Service"></a>Atlas Web Service</h3>
-<p>Currently, the Atlas Web service has a limitation that it can only have one active instance at a time. Therefore, in case of errors to the host running the service, a new Atlas web service instance should be brought up and pointed to from the clients. In future versions of the system, we plan to provide full High Availability of the service, thereby enabling hot failover. To minimize service loss, we recommend the following:</p>
+<p>Currently, the Atlas Web Service has a limitation that it can only have one active instance at a time. In earlier releases of Atlas, a backup instance could be provisioned and kept available. However, a manual failover was required to make this backup instance active.</p>
+<p>From this release, Atlas will support multiple instances of the Atlas Web service in an active/passive configuration with automated failover. This means that users can deploy and start multiple instances of the Atlas Web Service on different physical hosts at the same time. One of these instances will be automatically selected as an 'active' instance to service user requests. The others will automatically be deemed 'passive'. If the 'active' instance becomes unavailable either because it is deliberately stopped, or due to unexpected failures, one of the other instances will automatically be elected as an 'active' instance and start to service user requests.</p>
+<p>An 'active' instance is the only instance that can respond to user requests correctly. It can create, delete, modify or respond to queries on metadata objects. A 'passive' instance will accept user requests, but will redirect them using HTTP redirect to the currently known 'active' instance. Specifically, a passive instance will not itself respond to any queries on metadata objects. However, all instances (both active and passive), will respond to admin requests that return information about that instance.</p>
+<p>When configured in a High Availability mode, users can get the following operational benefits:</p>
 <p></p>
 <ul>
-<li>An extra physical host with the Atlas system software and configuration is available to be brought up on demand.</li>
-<li>It would be convenient to have the web service fronted by a proxy solution like <a class="externalLink" href="https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#5.2">HAProxy</a> which can be used to provide both the monitoring and transparent switching of the backend instance clients talk to.
+<li><b>Uninterrupted service during maintenance intervals</b>: If an active instance of the Atlas Web Service needs to be brought down for maintenance, another instance would automatically become active and can service requests.</li>
+<li><b>Uninterrupted service in event of unexpected failures</b>: If an active instance of the Atlas Web Service fails due to software or hardware errors, another instance would automatically become active and can service requests.</li></ul>
+<p>In the following sub-sections, we describe the steps required to set up High Availability for the Atlas Web Service. We also describe how the deployment and client can be designed to take advantage of this capability. Finally, we describe a few details of the underlying implementation.</p></div>
+<div class="section">
+<h4><a name="Setting_up_the_High_Availability_feature_in_Atlas"></a>Setting up the High Availability feature in Atlas</h4>
+<p>The following pre-requisites must be met for setting up the High Availability feature.</p>
+<p></p>
+<ul>
+<li>Ensure that you install Apache Zookeeper on a cluster of machines (a minimum of 3 servers is recommended for production).</li>
+<li>Select 2 or more physical machines to run the Atlas Web Service instances on. These machines define what we refer to as a 'server ensemble' for Atlas.</li></ul>
+<p>To set up High Availability in Atlas, a few configuration options must be defined in the <tt>atlas-application.properties</tt> file. While the complete list of configuration items is defined in the <a href="./Configuration.html">Configuration Page</a>, this section lists a few of the main options.</p>
+<p></p>
+<ul>
+<li>High Availability is an optional feature in Atlas. Hence, it must be enabled by setting the configuration option <tt>atlas.server.ha.enabled</tt> to true.</li>
+<li>Next, define a list of identifiers, one for each physical machine you have selected for the Atlas Web Service instance. These identifiers can be simple strings like <tt>id1</tt>, <tt>id2</tt> etc. They should be unique and should not contain a comma.</li>
+<li>Define a comma separated list of these identifiers as the value of the option <tt>atlas.server.ids</tt>.</li>
+<li>For each physical machine, list the IP Address/hostname and port as the value of the configuration <tt>atlas.server.address.id</tt>, where <tt>id</tt> refers to the identifier string for this physical machine.
+<ul>
+<li>For example, if you have selected 2 machines with hostnames <tt>host1.company.com</tt> and <tt>host2.company.com</tt>, you can define the configuration options as below:</li></ul></li></ul>
+<div class="source">
+<pre>
+      atlas.server.ids=id1,id2
+      atlas.server.address.id1=host1.company.com:21000
+      atlas.server.address.id2=host2.company.com:21000
+      
+</pre></div>
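The two options above work together: <tt>atlas.server.ids</tt> enumerates the instances, and each <tt>atlas.server.address.id</tt> entry supplies that instance's address. A small sketch of how such properties resolve into a server list (the helper name is hypothetical, not Atlas's actual parsing code):

```python
# Sketch: resolve the atlas.server.ids / atlas.server.address.<id>
# properties into an ordered list of (id, host:port) pairs.  The helper
# name is invented for illustration; Atlas's real parser may differ.

def resolve_server_addresses(props):
    ids = [s.strip() for s in props["atlas.server.ids"].split(",") if s.strip()]
    return [(i, props["atlas.server.address." + i]) for i in ids]

props = {
    "atlas.server.ids": "id1,id2",
    "atlas.server.address.id1": "host1.company.com:21000",
    "atlas.server.address.id2": "host2.company.com:21000",
}
servers = resolve_server_addresses(props)
```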
+<p></p>
 <ul>
-<li>An example HAProxy configuration of this form will allow a transparent failover to a backup server:</li></ul></li></ul>
+<li>Define the Zookeeper quorum which will be used by the Atlas High Availability feature.</li></ul>
 <div class="source">
 <pre>
-      listen atlas
-        bind &lt;proxy hostname&gt;:&lt;proxy port&gt;
-        balance roundrobin
-        server inst1 &lt;atlas server hostname&gt;:&lt;port&gt; check
-        server inst2 &lt;atlas backup server hostname&gt;:&lt;port&gt; check backup
+      atlas.server.ha.zookeeper.connect=zk1.company.com:2181,zk2.company.com:2181,zk3.company.com:2181
       
 </pre></div>
 <p></p>
 <ul>
-<li>The stores that hold Atlas data can be configured to be highly available as described below.</li></ul></div>
+<li>You can review other configuration options that are defined for the High Availability feature, and set them up as desired in the <tt>atlas-application.properties</tt> file.</li>
+<li>For production environments, the components that Atlas depends on must also be set up in High Availability mode. This is described in detail in the following sections. Follow those instructions to set up and configure them.</li>
+<li>Install the Atlas software on the selected physical machines.</li>
+<li>Copy the <tt>atlas-application.properties</tt> file created using the steps above to the configuration directory of all the machines.</li>
+<li>Start the dependent components.</li>
+<li>Start each instance of the Atlas Web Service.</li></ul>
+<p>To verify that High Availability is working, run the following script on each of the instances where Atlas Web Service is installed.</p>
+<div class="source">
+<pre>
+$ATLAS_HOME/bin/atlas_admin.py -status
+
+</pre></div>
+<p>This script prints one of the following values as its response:</p>
+<p></p>
+<ul>
+<li><b>ACTIVE</b>: This instance is active and can respond to user requests.</li>
+<li><b>PASSIVE</b>: This instance is PASSIVE. It will redirect any user requests it receives to the current active instance.</li>
+<li><b>BECOMING_ACTIVE</b>: This would be printed if the server is transitioning to become an ACTIVE instance. The server cannot service any metadata user requests in this state.</li>
+<li><b>BECOMING_PASSIVE</b>: This would be printed if the server is transitioning to become a PASSIVE instance. The server cannot service any metadata user requests in this state.</li></ul>
+<p>Under normal operating circumstances, only one of these instances should print the value <b>ACTIVE</b> as response to the script, and the others would print <b>PASSIVE</b>.</p></div>
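To check all instances at once, the same admin status endpoint can be polled from a small script. Below is a sketch only: the fetch function is injected so the logic runs without live servers, and the host names are examples. A real version would issue an HTTP GET against /api/atlas/admin/status on each host.

```python
# Sketch: classify each Atlas server in an ensemble by its admin status.
# fetch_status is injected so the logic runs without live servers; a real
# implementation would perform an HTTP GET against
# http://<host>/api/atlas/admin/status and read the returned status.

def classify_ensemble(hosts, fetch_status):
    statuses = {}
    for host in hosts:
        try:
            statuses[host] = fetch_status(host)
        except Exception:
            statuses[host] = "UNREACHABLE"  # down or unreachable instance
    active = [h for h, s in statuses.items() if s == "ACTIVE"]
    return statuses, active

# Fake fetcher standing in for real HTTP calls (host names are examples):
fake = {"host1.company.com:21000": "ACTIVE", "host2.company.com:21000": "PASSIVE"}
statuses, active = classify_ensemble(sorted(fake), fake.__getitem__)
```

Under normal operation, exactly one host should land in the `active` list.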
+<div class="section">
+<h4><a name="Configuring_clients_to_use_the_High_Availability_feature"></a>Configuring clients to use the High Availability feature</h4>
+<p>The Atlas Web Service can be accessed in two ways:</p>
+<p></p>
+<ul>
+<li><b>Using the Atlas Web UI</b>: This is a browser based client that can be used to query the metadata stored in Atlas.</li>
+<li><b>Using the Atlas REST API</b>: As Atlas exposes a RESTful API, one can use any standard REST client, including client libraries in other applications. In fact, Atlas ships with a client called AtlasClient that can be used as an example for building REST client access.</li></ul>
+<p>There are two ways for clients to take advantage of the High Availability feature.</p></div>
+<div class="section">
+<h5><a name="Using_an_intermediate_proxy"></a>Using an intermediate proxy</h5>
+<p>The simplest solution to enable highly available access to Atlas is to install and configure some intermediate proxy that has a capability to transparently switch services based on status. One such proxy solution is <a class="externalLink" href="http://www.haproxy.org/">HAProxy</a>.</p>
+<p>Here is an example HAProxy configuration that can be used. Note this is provided for illustration only, and not as a recommended production configuration. For that, please refer to the HAProxy documentation for appropriate instructions.</p>
+<div class="source">
+<pre>
+frontend atlas_fe
+  bind *:41000
+  default_backend atlas_be
+
+backend atlas_be
+  mode http
+  option httpchk GET /api/atlas/admin/status
+  http-check expect string ACTIVE
+  balance roundrobin
+  server host1_21000 host1:21000 check
+  server host2_21000 host2:21000 check backup
+
+
+</pre></div>
+<p>The above configuration binds HAProxy to listen on port 41000 for incoming client connections. It then routes each connection to either host1 or host2 depending on an HTTP status check. The status check is an HTTP GET on the REST URL <tt>/api/atlas/admin/status</tt>, and is deemed successful only if the HTTP response contains the string ACTIVE.</p></div>
+<div class="section">
+<h5><a name="Using_automatic_detection_of_active_instance"></a>Using automatic detection of active instance</h5>
+<p>If one does not want to set up and manage a separate proxy, the other option for using the High Availability feature is to build a client application that is capable of detecting status and retrying operations. In such a setting, the client application is launched with the URLs of all Atlas Web Service instances that form the ensemble. The client then calls the REST URL <tt>/api/atlas/admin/status</tt> on each of them to determine which is the active instance. The response from the active instance is of the form <tt>{Status:ACTIVE}</tt>. Also, when the client encounters an exception in the course of an operation, it should again determine which of the remaining URLs is active and retry the operation.</p>
+<p>The AtlasClient class that ships with Atlas can be used as an example client library that implements the logic for working with an ensemble and selecting the right Active server instance.</p>
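The detect-and-retry behaviour described above can be sketched as follows. This is an illustration only, not the actual AtlasClient implementation; all names are invented, and get_status stands in for an HTTP GET of /api/atlas/admin/status.

```python
# Sketch: retry an operation against whichever ensemble member is active.
# This is NOT the real AtlasClient logic; names are illustrative, and
# get_status stands in for an HTTP GET of /api/atlas/admin/status.

def call_with_failover(urls, get_status, operation, attempts=3):
    for _ in range(attempts):
        for url in urls:
            try:
                if get_status(url) == "ACTIVE":
                    return operation(url)
            except Exception:
                continue  # instance down or mid-transition; try the next one
    raise RuntimeError("no active Atlas instance found")

# Example with a fake status map instead of live servers:
state = {"http://host1:21000": "PASSIVE", "http://host2:21000": "ACTIVE"}
result = call_with_failover(list(state), state.__getitem__,
                            lambda url: "served-by " + url)
```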
+<p>Utilities in Atlas, like <tt>quick_start.py</tt> and <tt>import-hive.sh</tt>, can be configured to run with multiple server URLs. When launched in this mode, the AtlasClient automatically selects and works with the current active instance. If a proxy is set up in between, its address can be used when running <tt>quick_start.py</tt> or <tt>import-hive.sh</tt>.</p></div>
+<div class="section">
+<h4><a name="Implementation_Details_of_Atlas_High_Availability"></a>Implementation Details of Atlas High Availability</h4>
+<p>The Atlas High Availability work is tracked under the master JIRA <a class="externalLink" href="https://issues.apache.org/jira/browse/ATLAS-510">ATLAS-510</a>. The JIRAs filed under it have detailed information about how the High Availability feature has been implemented. At a high level the following points can be called out:</p>
+<p></p>
+<ul>
+<li>The automatic selection of an Active instance, as well as automatic failover to a new Active instance happen through a leader election algorithm.</li>
+<li>For leader election, we use the <a class="externalLink" href="http://curator.apache.org/curator-recipes/leader-latch.html">Leader Latch Recipe</a> of <a class="externalLink" href="http://curator.apache.org">Apache Curator</a>.</li>
+<li>The Active instance is the only one which initializes, modifies or reads state in the backend stores to keep them consistent.</li>
+<li>Also, when an instance is elected as Active, it refreshes any cached information from the backend stores to get up to date.</li>
+<li>A servlet filter ensures that only the active instance services user requests. If a passive instance receives these requests, it automatically redirects them to the current active instance.</li></ul></div>
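The leader-election behaviour listed above can be illustrated conceptually with an in-process simulation. This is not the Curator API (Curator's LeaderLatch coordinates through ephemeral sequential Zookeeper nodes); the sketch only mirrors the observable semantics: the earliest surviving participant leads, and leadership fails over when it leaves.

```python
# Conceptual, in-process illustration of leader-latch semantics.  NOT the
# Apache Curator API; it only mirrors the observable behaviour: the
# earliest surviving participant is leader, and when it leaves, leadership
# passes automatically to the next participant.

class Latch:
    _participants = []  # ordered like Zookeeper sequential nodes

    def __init__(self, server_id):
        self.server_id = server_id
        Latch._participants.append(self)

    def has_leadership(self):
        return bool(Latch._participants) and Latch._participants[0] is self

    def close(self):  # instance stops, deliberately or by failure
        Latch._participants.remove(self)

a, b = Latch("id1"), Latch("id2")
first_leader = a.has_leadership()  # id1 joined first, so it leads
a.close()                          # id1 goes down...
failover = b.has_leadership()      # ...and id2 takes over automatically
```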
 <div class="section">
 <h3><a name="Metadata_Store"></a>Metadata Store</h3>
-<p>As described above, Atlas uses Titan to store the metadata it manages. By default, Titan uses BerkeleyDB as an embedded backing store. However, this option would result in loss of data if the node running the Atlas server fails. In order to provide HA for the metadata store, we recommend that Atlas be configured to use HBase as the backing store for Titan. Doing this implies that you could benefit from the HA guarantees HBase provides. In order to configure Atlas to use HBase in HA mode, do the following:</p>
+<p>As described above, Atlas uses Titan to store the metadata it manages. By default, Atlas uses a standalone HBase instance as the backing store for Titan. In order to provide HA for the metadata store, we recommend that Atlas be configured to use a distributed HBase cluster as the backing store for Titan, so that it benefits from the HA guarantees HBase provides. To configure Atlas to use HBase in HA mode, do the following:</p>
 <p></p>
 <ul>
 <li>Choose an existing HBase cluster that is set up in HA mode to configure in Atlas (OR) Set up a new HBase cluster in <a class="externalLink" href="http://hbase.apache.org/book.html#quickstart_fully_distributed">HA mode</a>.
@@ -283,7 +372,6 @@
 <h3><a name="Known_Issues"></a>Known Issues</h3>
 <p></p>
 <ul>
-<li><a class="externalLink" href="https://issues.apache.org/jira/browse/ATLAS-338">ATLAS-338</a>: ATLAS-338: Metadata events generated from a Hive CLI (as opposed to Beeline or any client going <a href="./HiveServer.html">HiveServer</a>2) would be lost if Atlas server is down.</li>
 <li>If the HBase region servers hosting the Atlas &#x2018;titan&#x2019; HTable are down, Atlas would not be able to store or retrieve metadata from HBase until they are brought back online.</li></ul></div>
                   </div>
           </div>

http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/blob/a876d178/InstallationSteps.html
----------------------------------------------------------------------
diff --git a/InstallationSteps.html b/InstallationSteps.html
index 687708c..45c05e1 100644
--- a/InstallationSteps.html
+++ b/InstallationSteps.html
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2016-01-05
+ | Generated by Apache Maven Doxia at 2016-04-25
  | Rendered using Apache Maven Fluido Skin 1.3.0
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20160105" />
+    <meta name="Date-Revision-yyyymmdd" content="20160425" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache Atlas &#x2013; Building & Installing Apache Atlas</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
@@ -189,7 +189,7 @@
         
                 
                     
-                  <li id="publishDate" class="pull-right">Last Published: 2016-01-05</li> <li class="divider pull-right">|</li>
+                  <li id="publishDate" class="pull-right">Last Published: 2016-04-25</li> <li class="divider pull-right">|</li>
               <li id="projectVersion" class="pull-right">Version: 0.7-incubating-SNAPSHOT</li>
             
                             </ul>
@@ -209,7 +209,7 @@ git clone https://git-wip-us.apache.org/repos/asf/incubator-atlas.git atlas
 
 cd atlas
 
-export MAVEN_OPTS=&quot;-Xmx1024m -XX:MaxPermSize=256m&quot; &amp;&amp; mvn clean install
+export MAVEN_OPTS=&quot;-Xmx1024m -XX:MaxPermSize=512m&quot; &amp;&amp; mvn clean install
 
 </pre></div>
 <p>Once the build successfully completes, artifacts can be packaged for deployment.</p>
@@ -233,8 +233,9 @@ mvn clean package -Pdist
    |- cputil.py
 |- conf
    |- atlas-application.properties
-   |- client.properties
    |- atlas-env.sh
+   |- hbase
+      |- hbase-site.xml.template
    |- log4j.xml
    |- solr
       |- currency.xml
@@ -246,6 +247,10 @@ mvn clean package -Pdist
       |- stopwords.txt
       |- synonyms.txt
 |- docs
+|- hbase
+   |- bin
+   |- conf
+   ...
 |- server
    |- webapp
       |- atlas.war
@@ -256,18 +261,21 @@ mvn clean package -Pdist
 |- CHANGES.txt
 
 
-</pre></div></div>
+</pre></div>
+<p>Note that HBase is included in the distribution so that a standalone instance of HBase can be started as the default storage backend for the graph repository. During Atlas installation, <tt>conf/hbase/hbase-site.xml.template</tt> gets expanded and moved to <tt>hbase/conf/hbase-site.xml</tt> for the initial standalone HBase configuration. To configure Atlas graph persistence for a different HBase instance, please see &quot;Graph persistence engine - HBase&quot; in the <a href="./Configuration.html">Configuration</a> section.</p></div>
 <div class="section">
-<h4><a name="Installing__Running_Atlas"></a>Installing &amp; Running Atlas</h4>
-<p><b>Installing Atlas</b></p>
+<h4><a name="Installing__Running_Atlas"></a>Installing &amp; Running Atlas</h4></div>
+<div class="section">
+<h5><a name="Installing_Atlas"></a>Installing Atlas</h5>
 <div class="source">
 <pre>
 tar -xzvf apache-atlas-${project.version}-bin.tar.gz
 
 cd atlas-${project.version}
 
-</pre></div>
-<p><b>Configuring Atlas</b></p>
+</pre></div></div>
+<div class="section">
+<h5><a name="Configuring_Atlas"></a>Configuring Atlas</h5>
 <p>By default, the config directory used by Atlas is {package dir}/conf. To override this, set the environment variable ATLAS_CONF to the path of the conf dir.</p>
 <p>atlas-env.sh has been added to the Atlas conf. This file can be used to set various environment variables that you need for your services. In addition, you can set any other environment variables you might need. This file will be sourced by Atlas scripts before any commands are executed. The following environment variables are available to set.</p>
 <div class="source">
@@ -306,6 +314,27 @@ cd atlas-${project.version}
 #export ATLAS_EXPANDED_WEBAPP_DIR=
 
 </pre></div>
+<p><b>Settings to support large number of metadata objects</b></p>
+<p>If you plan to store several tens of thousands of metadata objects, it is recommended that you use values tuned for better GC performance of the JVM.</p>
+<p>The following values are common server side options:</p>
+<div class="source">
+<pre>
+export ATLAS_SERVER_OPTS=&quot;-server -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=dumps/atlas_server.hprof -Xloggc:logs/gc-worker.log -verbose:gc -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1m -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCTimeStamps&quot;
+
+</pre></div>
+<p>The <tt>-XX:SoftRefLRUPolicyMSPerMB</tt> option was found to be particularly helpful to regulate GC performance for query heavy workloads with many concurrent users.</p>
+<p>The following values are recommended for JDK 7:</p>
+<div class="source">
+<pre>
+export ATLAS_SERVER_HEAP=&quot;-Xms15360m -Xmx15360m -XX:MaxNewSize=3072m -XX:PermSize=100M -XX:MaxPermSize=512m&quot;
+
+</pre></div>
+<p>The following values are recommended for JDK 8:</p>
+<div class="source">
+<pre>
+export ATLAS_SERVER_HEAP=&quot;-Xms15360m -Xmx15360m -XX:MaxNewSize=5120m -XX:MetaspaceSize=100M -XX:MaxMetaspaceSize=512m&quot;
+
+</pre></div>
 <p><b>NOTE for Mac OS users</b> If you are using Mac OS, you will need to configure the ATLAS_SERVER_OPTS (explained above).</p>
 <p>In {package dir}/conf/atlas-env.sh, uncomment the following line:</p>
 <div class="source">
@@ -323,8 +352,8 @@ export ATLAS_SERVER_OPTS=&quot;-Djava.awt.headless=true -Djava.security.krb5.rea
 <p>By default, Atlas uses Titan as the graph repository; it is currently the only graph repository implementation available. The HBase versions currently supported are 1.1.x. For configuring Atlas graph persistence on HBase, please see &quot;Graph persistence engine - HBase&quot; in the <a href="./Configuration.html">Configuration</a> section for more details.</p>
 <p>Pre-requisites for running HBase as a distributed cluster</p>
 <ul>
-<li>3 or 5 <a href="./ZooKeeper.html">ZooKeeper</a> nodes</li>
-<li>Atleast 3 <a href="./RegionServer.html">RegionServer</a> nodes. It would be ideal to run the <a href="./DataNodes.html">DataNodes</a> on the same hosts as the Region servers for data locality.</li></ul>
+<li>3 or 5 ZooKeeper nodes</li>
+<li>At least 3 RegionServer nodes. It would be ideal to run the DataNodes on the same hosts as the Region servers for data locality.</li></ul>
 <p><b>Configuring SOLR as the Indexing Backend for the Graph Repository</b></p>
 <p>By default, Atlas uses Titan as the graph repository; it is currently the only graph repository implementation available. For configuring Titan to work with Solr, please follow the instructions below.</p>
 <p></p>
@@ -332,7 +361,7 @@ export ATLAS_SERVER_OPTS=&quot;-Djava.awt.headless=true -Djava.security.krb5.rea
 <li>Install Solr if it is not already running. The supported version of Solr is 5.2.1. It can be installed from <a class="externalLink" href="http://archive.apache.org/dist/lucene/solr/5.2.1/solr-5.2.1.tgz">http://archive.apache.org/dist/lucene/solr/5.2.1/solr-5.2.1.tgz</a></li></ul>
 <p></p>
 <ul>
-<li>Start solr in cloud mode.</li></ul><a href="./SolrCloud.html">SolrCloud</a> mode uses a <a href="./ZooKeeper.html">ZooKeeper</a> Service as a highly available, central location for cluster management.   For a small cluster, running with an existing <a href="./ZooKeeper.html">ZooKeeper</a> quorum should be fine. For larger clusters, you would want to run separate multiple <a href="./ZooKeeper.html">ZooKeeper</a> quorum with atleast 3 servers.   Note: Atlas currently supports solr in &quot;cloud&quot; mode only. &quot;http&quot; mode is not supported. For more information, refer solr documentation - <a class="externalLink" href="https://cwiki.apache.org/confluence/display/solr/SolrCloud">https://cwiki.apache.org/confluence/display/solr/SolrCloud</a>
+<li>Start solr in cloud mode.</li></ul>SolrCloud mode uses a ZooKeeper Service as a highly available, central location for cluster management. For a small cluster, running with an existing ZooKeeper quorum should be fine. For larger clusters, you would want to run a separate ZooKeeper quorum with at least 3 servers. Note: Atlas currently supports Solr in &quot;cloud&quot; mode only; &quot;http&quot; mode is not supported. For more information, refer to the Solr documentation - <a class="externalLink" href="https://cwiki.apache.org/confluence/display/solr/SolrCloud">https://cwiki.apache.org/confluence/display/solr/SolrCloud</a>
 <p></p>
 <ul>
 <li>For e.g., to bring up a Solr node listening on port 8983 on a machine, you can use the command:</li></ul>
@@ -351,7 +380,7 @@ export ATLAS_SERVER_OPTS=&quot;-Djava.awt.headless=true -Djava.security.krb5.rea
   bin/solr create -c fulltext_index -d SOLR_CONF -shards #numShards -replicationFactor #replicationFactor
 
 </pre></div>
-<p>Note: If numShards and replicationFactor are not specified, they default to 1 which suffices if you are trying out solr with ATLAS on a single node instance.   Otherwise specify numShards according to the number of hosts that are in the Solr cluster and the maxShardsPerNode configuration.   The number of shards cannot exceed the total number of Solr nodes in your SolrCloud cluster.</p>
+<p>Note: If numShards and replicationFactor are not specified, they default to 1, which suffices if you are trying out Solr with Atlas on a single node instance. Otherwise, specify numShards according to the number of hosts in the Solr cluster and the maxShardsPerNode configuration. The number of shards cannot exceed the total number of Solr nodes in your SolrCloud cluster.</p>
 <p>The number of replicas (replicationFactor) can be set according to the redundancy required.</p>
 <p></p>
 <ul>
@@ -367,8 +396,15 @@ export ATLAS_SERVER_OPTS=&quot;-Djava.awt.headless=true -Djava.security.krb5.rea
 <ul>
 <li>Restart Atlas</li></ul>
 <p>For more information on Titan Solr configuration, please refer to <a class="externalLink" href="http://s3.thinkaurelius.com/docs/titan/0.5.4/solr.htm">http://s3.thinkaurelius.com/docs/titan/0.5.4/solr.htm</a></p>
-<p>Pre-requisites for running Solr in cloud mode   * Memory - Solr is both memory and CPU intensive. Make sure the server running Solr has adequate memory, CPU and disk.     Solr works well with 32GB RAM. Plan to provide as much memory as possible to Solr process   * Disk - If the number of entities that need to be stored are large, plan to have at least 500 GB free space in the volume where Solr is going to store the index data   * <a href="./SolrCloud.html">SolrCloud</a> has support for replication and sharding. It is highly recommended to use <a href="./SolrCloud.html">SolrCloud</a> with at least two Solr nodes running on different servers with replication enabled.     If using <a href="./SolrCloud.html">SolrCloud</a>, then you also need <a href="./ZooKeeper.html">ZooKeeper</a> installed and configured with 3 or 5 <a href="./ZooKeeper.html">ZooKeeper</a> nodes</p>
-<p><b>Starting Atlas Server</b></p>
+<p>Pre-requisites for running Solr in cloud mode:</p>
+<ul>
+<li><b>Memory</b> - Solr is both memory and CPU intensive. Make sure the server running Solr has adequate memory, CPU and disk. Solr works well with 32GB RAM. Plan to provide as much memory as possible to the Solr process.</li>
+<li><b>Disk</b> - If the number of entities that need to be stored is large, plan to have at least 500 GB of free space in the volume where Solr is going to store the index data.</li>
+<li>SolrCloud has support for replication and sharding. It is highly recommended to use SolrCloud with at least two Solr nodes running on different servers with replication enabled. If using SolrCloud, you also need ZooKeeper installed and configured with 3 or 5 ZooKeeper nodes.</li></ul></div>
+<div class="section">
+<h5><a name="Setting_up_Atlas"></a>Setting up Atlas</h5>
+<p>There are a few steps that set up dependencies of Atlas. One such example is setting up the Titan schema in the storage backend of choice. In a simple single server setup, these are automatically set up with default configuration when the server first accesses these dependencies.</p>
+<p>However, there are scenarios when we may want to run setup steps explicitly as one time operations. For example, in a multiple server scenario using <a href="./HighAvailability.html">High Availability</a>, it is preferable to run setup steps from one of the server instances the first time, and then start the services.</p>
+<p>To run these steps one time, execute the command <tt>bin/atlas_start.py -setup</tt> from a single Atlas server instance.</p>
+<p>Note that the Atlas server guards against parallel executions of the setup steps, and running the setup steps multiple times is idempotent. Therefore, if one chooses to run the setup steps as part of server startup for convenience, enable the configuration option <tt>atlas.server.run.setup.on.start</tt> by defining it with the value <tt>true</tt> in the <tt>atlas-application.properties</tt> file.</p></div>
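The semantics described above (a guard against concurrent runs, plus idempotent re-runs) can be illustrated with an in-memory sketch. This is not Atlas's implementation; the real server coordinates through a Zookeeper node such as /apache_atlas/setup_in_progress, while this sketch only mirrors the observable behaviour.

```python
# Sketch: a setup runner guarded by a lock marker, mirroring in memory how
# Atlas uses a Zookeeper node to detect concurrent or previously failed
# setup runs.  NOT Atlas's implementation.  Behaviour: completed setup is
# skipped (idempotent), and a stale lock left by a failed run blocks later
# runs until it is cleared.

class SetupRunner:
    def __init__(self):
        self.lock_held = False  # stand-in for /apache_atlas/setup_in_progress
        self.completed = False

    def run_setup(self, steps):
        if self.completed:
            return "already-done"  # idempotent re-run is a no-op
        if self.lock_held:
            raise RuntimeError("A previous setup run may not have completed cleanly.")
        self.lock_held = True
        try:
            for step in steps:
                step()
            self.completed = True
            return "done"
        finally:
            if self.completed:
                self.lock_held = False  # stale lock remains after a failure

runner = SetupRunner()
outcome = runner.run_setup([lambda: None])
```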
+<div class="section">
+<h5><a name="Starting_Atlas_Server"></a>Starting Atlas Server</h5>
 <div class="source">
 <pre>
 bin/atlas_start.py [-port &lt;port&gt;]
@@ -377,8 +413,10 @@ bin/atlas_start.py [-port &lt;port&gt;]
 <p>By default,</p>
 <ul>
 <li>To change the port, use the -port option.</li>
-<li>atlas server starts with conf from {package dir}/conf. To override this (to use the same conf with multiple atlas upgrades), set environment variable ATLAS_CONF to the path of conf dir</li></ul>
-<p><b>Using Atlas</b></p>
+<li>The Atlas server starts with conf from {package dir}/conf. To override this (to use the same conf across multiple Atlas upgrades), set the environment variable ATLAS_CONF to the path of the conf dir.</li></ul></div>
+<div class="section">
+<h4><a name="Using_Atlas"></a>Using Atlas</h4>
+<p></p>
 <ul>
 <li>Quick start model - sample model and data</li></ul>
 <div class="source">
@@ -424,13 +462,20 @@ bin/atlas_start.py [-port &lt;port&gt;]
 
 </pre></div>
 <p><b>Dashboard</b></p>
-<p>Once atlas is started, you can view the status of atlas entities using the Web-based dashboard. You can open your browser at the corresponding port to use the web UI.</p>
-<p><b>Stopping Atlas Server</b></p>
+<p>Once Atlas is started, you can view the status of Atlas entities using the web-based dashboard. Open your browser at the corresponding port to use the web UI.</p></div>
+<div class="section">
+<h4><a name="Stopping_Atlas_Server"></a>Stopping Atlas Server</h4>
 <div class="source">
 <pre>
 bin/atlas_stop.py
 
 </pre></div></div>
+<div class="section">
+<h4><a name="Known_Issues"></a>Known Issues</h4></div>
+<div class="section">
+<h5><a name="Setup"></a>Setup</h5>
+<p>If the setup of the Atlas service fails for any reason, the next run of setup (either by an explicit invocation of <tt>atlas_start.py -setup</tt> or by enabling the configuration option <tt>atlas.server.run.setup.on.start</tt>) will fail with a message such as <tt>A previous setup run may not have completed cleanly.</tt> In such cases, you would need to manually ensure that setup can run, and then delete the Zookeeper node at <tt>/apache_atlas/setup_in_progress</tt> before attempting to run setup again.</p>
+<p>If the setup failed due to HBase Titan schema setup errors, it may be necessary to repair the HBase schema. If no data has been stored, one can also disable and drop the 'titan' schema in HBase to let setup run again.</p></div>
                   </div>
           </div>
 

http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/blob/a876d178/Notification-Entity.html
----------------------------------------------------------------------
diff --git a/Notification-Entity.html b/Notification-Entity.html
index 184fd90..0471908 100644
--- a/Notification-Entity.html
+++ b/Notification-Entity.html
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2016-01-05
+ | Generated by Apache Maven Doxia at 2016-04-25
  | Rendered using Apache Maven Fluido Skin 1.3.0
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20160105" />
+    <meta name="Date-Revision-yyyymmdd" content="20160425" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache Atlas &#x2013; Entity Change Notifications</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
@@ -189,7 +189,7 @@
         
                 
                     
-                  <li id="publishDate" class="pull-right">Last Published: 2016-01-05</li> <li class="divider pull-right">|</li>
+                  <li id="publishDate" class="pull-right">Last Published: 2016-04-25</li> <li class="divider pull-right">|</li>
               <li id="projectVersion" class="pull-right">Version: 0.7-incubating-SNAPSHOT</li>
             
                             </ul>

http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/blob/a876d178/QuickStart.html
----------------------------------------------------------------------
diff --git a/QuickStart.html b/QuickStart.html
index 2bb35b0..8f32fe4 100644
--- a/QuickStart.html
+++ b/QuickStart.html
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2016-01-05
+ | Generated by Apache Maven Doxia at 2016-04-25
  | Rendered using Apache Maven Fluido Skin 1.3.0
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20160105" />
+    <meta name="Date-Revision-yyyymmdd" content="20160425" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache Atlas &#x2013; Quick Start Guide</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
@@ -189,7 +189,7 @@
         
                 
                     
-                  <li id="publishDate" class="pull-right">Last Published: 2016-01-05</li> <li class="divider pull-right">|</li>
+                  <li id="publishDate" class="pull-right">Last Published: 2016-04-25</li> <li class="divider pull-right">|</li>
               <li id="projectVersion" class="pull-right">Version: 0.7-incubating-SNAPSHOT</li>
             
                             </ul>

http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/blob/a876d178/Repository.html
----------------------------------------------------------------------
diff --git a/Repository.html b/Repository.html
index ef67868..ed0f43d 100644
--- a/Repository.html
+++ b/Repository.html
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2016-01-05
+ | Generated by Apache Maven Doxia at 2016-04-25
  | Rendered using Apache Maven Fluido Skin 1.3.0
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20160105" />
+    <meta name="Date-Revision-yyyymmdd" content="20160425" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache Atlas &#x2013; Repository</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
@@ -189,7 +189,7 @@
         
                 
                     
-                  <li id="publishDate" class="pull-right">Last Published: 2016-01-05</li> <li class="divider pull-right">|</li>
+                  <li id="publishDate" class="pull-right">Last Published: 2016-04-25</li> <li class="divider pull-right">|</li>
               <li id="projectVersion" class="pull-right">Version: 0.7-incubating-SNAPSHOT</li>
             
                             </ul>

http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/blob/a876d178/Search.html
----------------------------------------------------------------------
diff --git a/Search.html b/Search.html
index 8f89001..ac79c42 100644
--- a/Search.html
+++ b/Search.html
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2016-01-05
+ | Generated by Apache Maven Doxia at 2016-04-25
  | Rendered using Apache Maven Fluido Skin 1.3.0
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20160105" />
+    <meta name="Date-Revision-yyyymmdd" content="20160425" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache Atlas &#x2013; Search</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
@@ -189,7 +189,7 @@
         
                 
                     
-                  <li id="publishDate" class="pull-right">Last Published: 2016-01-05</li> <li class="divider pull-right">|</li>
+                  <li id="publishDate" class="pull-right">Last Published: 2016-04-25</li> <li class="divider pull-right">|</li>
               <li id="projectVersion" class="pull-right">Version: 0.7-incubating-SNAPSHOT</li>
             
                             </ul>
@@ -264,14 +264,14 @@ literal: booleanConstant |
 <p>Grammar language: {noformat} opt(a)     =&gt; a is optional ~            =&gt; a combinator. 'a ~ b' means a followed by b rep         =&gt; zero or more rep1sep =&gt; one or more, separated by second arg. {noformat}</p>
 <p>Language Notes:</p>
 <ul>
-<li>A <b><a href="./SingleQuery.html">SingleQuery</a></b> expression can be used to search for entities of a <i>Trait</i> or <i>Class</i>.</li></ul>Entities can be filtered based on a 'Where Clause' and Entity Attributes can be retrieved based on a 'Select Clause'.
+<li>A <b>SingleQuery</b> expression can be used to search for entities of a <i>Trait</i> or <i>Class</i>.</li></ul>Entities can be filtered based on a 'Where Clause' and Entity Attributes can be retrieved based on a 'Select Clause'.
 <ul>
-<li>An Entity Graph can be traversed/joined by combining one or more <a href="./SingleQueries.html">SingleQueries</a>.</li>
+<li>An Entity Graph can be traversed/joined by combining one or more SingleQueries.</li>
 <li>An attempt is made to make the expressions look SQL like by accepting keywords &quot;SELECT&quot;,</li></ul>&quot;FROM&quot;, and &quot;WHERE&quot;; but these are optional and users can simply think in terms of Entity Graph Traversals.
 <ul>
 <li>The transitive closure of an Entity relationship can be expressed via the <i>Loop</i> expression. A</li></ul><i>Loop</i> expression can be any traversal (recursively a query) that represents a <i>Path</i> that ends in an Entity of the same <i>Type</i> as the starting Entity.
 <ul>
-<li>The <i><a href="./WithPath.html">WithPath</a></i> clause can be used with transitive closure queries to retrieve the Path that</li></ul>connects the two related Entities. (We also provide a higher level interface for Closure Queries   see scaladoc for 'org.apache.atlas.query.ClosureQuery')
+<li>The <i>WithPath</i> clause can be used with transitive closure queries to retrieve the Path that</li></ul>connects the two related Entities. (We also provide a higher-level interface for Closure Queries; see the scaladoc for 'org.apache.atlas.query.ClosureQuery')
 <ul>
 <li>There are a couple of Predicate functions different from SQL:
 <ul>
@@ -285,7 +285,7 @@ literal: booleanConstant |
 <li>from DB</li>
 <li>DB where name=&quot;Reporting&quot; select name, owner</li>
 <li>DB has name</li>
-<li>DB is <a href="./JdbcAccess.html">JdbcAccess</a></li>
+<li>DB is JdbcAccess</li>
 <li>Column where Column isa PII</li>
 <li>Table where name=&quot;sales_fact&quot;, columns</li>
 <li>Table where name=&quot;sales_fact&quot;, columns as column select column.name, column.dataType, column.comment</li>

http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/blob/a876d178/Security.html
----------------------------------------------------------------------
diff --git a/Security.html b/Security.html
index 8af726c..9649d81 100644
--- a/Security.html
+++ b/Security.html
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2016-01-05
+ | Generated by Apache Maven Doxia at 2016-04-25
  | Rendered using Apache Maven Fluido Skin 1.3.0
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20160105" />
+    <meta name="Date-Revision-yyyymmdd" content="20160425" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache Atlas &#x2013; Security Features of Apache Atlas</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
@@ -189,7 +189,7 @@
         
                 
                     
-                  <li id="publishDate" class="pull-right">Last Published: 2016-01-05</li> <li class="divider pull-right">|</li>
+                  <li id="publishDate" class="pull-right">Last Published: 2016-04-25</li> <li class="divider pull-right">|</li>
               <li id="projectVersion" class="pull-right">Version: 0.7-incubating-SNAPSHOT</li>
             
                             </ul>
@@ -217,7 +217,8 @@
 <li><code>keystore.file</code> - the path to the keystore file leveraged by the server.  This file contains the server certificate.</li>
 <li><code>truststore.file</code> - the path to the truststore file. This file contains the certificates of other trusted entities (e.g. the certificates for client processes if two-way SSL is enabled).  In most instances this can be set to the same value as the keystore.file property (especially if one-way SSL is enabled).</li>
 <li><code>client.auth.enabled</code> (false|true) [default: false] - enable/disable client authentication.  If enabled, the client will have to authenticate to the server during the transport session key creation process (i.e. two-way SSL is in effect).</li>
-<li><code>cert.stores.credential.provider.path</code> - the path to the Credential Provider store file.  The passwords for the keystore, truststore, and server certificate are maintained in this secure file.  Utilize the cputil script in the 'bin' directoy (see below) to populate this file with the passwords required.</li></ul></div>
+<li><code>cert.stores.credential.provider.path</code> - the path to the Credential Provider store file.  The passwords for the keystore, truststore, and server certificate are maintained in this secure file.  Utilize the cputil script in the 'bin' directory (see below) to populate this file with the passwords required.</li>
+<li><code>atlas.ssl.exclude.cipher.suites</code> - the list of excluded Cipher Suites.  The weak and unsafe Cipher Suites matching <b>.*NULL.*</b>, .*RC4.*, .*MD5.*, .*DES.*, and .*DSS.* are excluded by default. If additional Cipher Suites need to be excluded, set this property to the default list (atlas.ssl.exclude.cipher.suites=.*NULL.*, .*RC4.*, .*MD5.*, .*DES.*, .*DSS.*) and append the additional Cipher Suites with a comma separator; they can be added by full name or as a regular expression. The Cipher Suites listed in this property take precedence over the defaults, so the safe approach is to keep the default Cipher Suites and add to them.</li></ul></div>
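For illustration, here is a sketch of how additional exclusions could be appended in the Atlas properties file; the appended .*EXPORT.* pattern is an example addition, not part of the defaults:

```properties
# Keep the default weak-cipher exclusions and append an extra pattern (illustrative):
atlas.ssl.exclude.cipher.suites=.*NULL.*, .*RC4.*, .*MD5.*, .*DES.*, .*DSS.*, .*EXPORT.*
```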
 <div class="section">
 <h5><a name="Credential_Provider_Utility_Script"></a>Credential Provider Utility Script</h5>
 <p>In order to prevent the use of clear-text passwords, the Atlas platform makes use of the Credential Provider facility for secure password storage (see <a class="externalLink" href="http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CommandsManual.html#credential">Hadoop Credential Command Reference</a> for more information about this facility).  The cputil script in the 'bin' directory can be leveraged to create the password store required.</p>
@@ -284,7 +285,7 @@
 <p>For a more detailed discussion of the HTTP authentication mechanism refer to <a class="externalLink" href="http://hadoop.apache.org/docs/stable/hadoop-auth/Configuration.html">Hadoop Auth, Java HTTP SPNEGO 2.6.0 - Server Side Configuration</a>.  The prefix that document references is &quot;atlas.http.authentication&quot; in the case of the Atlas authentication implementation.</p></div>
 <div class="section">
 <h4><a name="Client_security_configuration"></a>Client security configuration</h4>
-<p>When leveraging Atlas client code to communicate with an Atlas server configured for SSL transport and/or Kerberos authentication, there is a requirement to provide a client configuration file that provides the security properties that allow for communication with, or authenticating to, the server. Create a client.properties file with the appropriate settings (see below) and place it on the client's classpath or in the directory specified by the &quot;atlas.conf&quot; system property.</p>
+<p>When leveraging Atlas client code to communicate with an Atlas server configured for SSL transport and/or Kerberos authentication, you must provide an Atlas client configuration file containing the security properties that allow for communication with, or authentication to, the server. Update the atlas-application.properties file with the appropriate settings (see below) and copy it to the client's classpath or to the directory specified by the &quot;atlas.conf&quot; system property.</p>
 <p>The client properties for SSL communication are:</p>
 <p></p>
 <ul>

http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/blob/a876d178/StormAtlasHook.html
----------------------------------------------------------------------
diff --git a/StormAtlasHook.html b/StormAtlasHook.html
new file mode 100644
index 0000000..b6c3099
--- /dev/null
+++ b/StormAtlasHook.html
@@ -0,0 +1,298 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2016-04-25
+ | Rendered using Apache Maven Fluido Skin 1.3.0
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+  <head>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <meta name="Date-Revision-yyyymmdd" content="20160425" />
+    <meta http-equiv="Content-Language" content="en" />
+    <title>Apache Atlas &#x2013; Storm Atlas Bridge</title>
+    <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
+    <link rel="stylesheet" href="./css/site.css" />
+    <link rel="stylesheet" href="./css/print.css" media="print" />
+
+      
+    <script type="text/javascript" src="./js/apache-maven-fluido-1.3.0.min.js"></script>
+
+                          
+        
+<script type="text/javascript">$( document ).ready( function() { $( '.carousel' ).carousel( { interval: 3500 } ) } );</script>
+          
+            </head>
+        <body class="topBarEnabled">
+          
+                        
+                    
+                
+
+    <div id="topbar" class="navbar navbar-fixed-top ">
+      <div class="navbar-inner">
+                                  <div class="container" style="width: 68%;"><div class="nav-collapse">
+            
+                
+                                <ul class="nav">
+                          <li class="dropdown">
+        <a href="#" class="dropdown-toggle" data-toggle="dropdown">Atlas <b class="caret"></b></a>
+        <ul class="dropdown-menu">
+        
+                      <li>      <a href="index.html"  title="About">About</a>
+</li>
+                  
+                      <li>      <a href="https://cwiki.apache.org/confluence/display/ATLAS"  title="Wiki">Wiki</a>
+</li>
+                  
+                      <li>      <a href="https://cwiki.apache.org/confluence/display/ATLAS"  title="News">News</a>
+</li>
+                  
+                      <li>      <a href="https://git-wip-us.apache.org/repos/asf/incubator-atlas.git"  title="Git">Git</a>
+</li>
+                  
+                      <li>      <a href="https://issues.apache.org/jira/browse/ATLAS"  title="Jira">Jira</a>
+</li>
+                  
+                      <li>      <a href="https://cwiki.apache.org/confluence/display/ATLAS/PoweredBy"  title="Powered by">Powered by</a>
+</li>
+                  
+                      <li>      <a href="http://blogs.apache.org/atlas/"  title="Blog">Blog</a>
+</li>
+                          </ul>
+      </li>
+                <li class="dropdown">
+        <a href="#" class="dropdown-toggle" data-toggle="dropdown">Project Information <b class="caret"></b></a>
+        <ul class="dropdown-menu">
+        
+                      <li>      <a href="project-info.html"  title="Summary">Summary</a>
+</li>
+                  
+                      <li>      <a href="mail-lists.html"  title="Mailing Lists">Mailing Lists</a>
+</li>
+                  
+                      <li>      <a href="http://webchat.freenode.net?channels=apacheatlas&uio=d4"  title="IRC">IRC</a>
+</li>
+                  
+                      <li>      <a href="team-list.html"  title="Team">Team</a>
+</li>
+                  
+                      <li>      <a href="issue-tracking.html"  title="Issue Tracking">Issue Tracking</a>
+</li>
+                  
+                      <li>      <a href="source-repository.html"  title="Source Repository">Source Repository</a>
+</li>
+                  
+                      <li>      <a href="license.html"  title="License">License</a>
+</li>
+                          </ul>
+      </li>
+                <li class="dropdown">
+        <a href="#" class="dropdown-toggle" data-toggle="dropdown">Releases <b class="caret"></b></a>
+        <ul class="dropdown-menu">
+        
+                      <li>      <a href="http://www.apache.org/dyn/closer.cgi/incubator/atlas/0.6.0-incubating/"  title="0.6-incubating">0.6-incubating</a>
+</li>
+                  
+                      <li>      <a href="http://www.apache.org/dyn/closer.cgi/incubator/atlas/0.5.0-incubating/"  title="0.5-incubating">0.5-incubating</a>
+</li>
+                          </ul>
+      </li>
+                <li class="dropdown">
+        <a href="#" class="dropdown-toggle" data-toggle="dropdown">Documentation <b class="caret"></b></a>
+        <ul class="dropdown-menu">
+        
+                      <li>      <a href="0.6.0-incubating/index.html"  title="0.6-incubating">0.6-incubating</a>
+</li>
+                  
+                      <li>      <a href="0.5.0-incubating/index.html"  title="0.5-incubating">0.5-incubating</a>
+</li>
+                          </ul>
+      </li>
+                <li class="dropdown">
+        <a href="#" class="dropdown-toggle" data-toggle="dropdown">ASF <b class="caret"></b></a>
+        <ul class="dropdown-menu">
+        
+                      <li>      <a href="http://www.apache.org/foundation/how-it-works.html"  title="How Apache Works">How Apache Works</a>
+</li>
+                  
+                      <li>      <a href="http://www.apache.org/foundation/"  title="Foundation">Foundation</a>
+</li>
+                  
+                      <li>      <a href="http://www.apache.org/foundation/sponsorship.html"  title="Sponsoring Apache">Sponsoring Apache</a>
+</li>
+                  
+                      <li>      <a href="http://www.apache.org/foundation/thanks.html"  title="Thanks">Thanks</a>
+</li>
+                          </ul>
+      </li>
+                  </ul>
+          
+                      <form id="search-form" action="http://www.google.com/search" method="get"  class="navbar-search pull-right" >
+    
+  <input value="http://atlas.incubator.apache.org" name="sitesearch" type="hidden"/>
+  <input class="search-query" name="q" id="query" type="text" />
+</form>
+<script type="text/javascript" src="http://www.google.com/coop/cse/brand?form=search-form"></script>
+          
+                            
+            
+            
+            
+    <iframe src="http://www.facebook.com/plugins/like.php?href=http://atlas.incubator.apache.org/atlas-docs&send=false&layout=button_count&show-faces=false&action=like&colorscheme=dark"
+        scrolling="no" frameborder="0"
+        style="border:none; width:80px; height:20px; margin-top: 10px;"  class="pull-right" ></iframe>
+                        
+    <script type="text/javascript" src="https://apis.google.com/js/plusone.js"></script>
+
+        <ul class="nav pull-right"><li style="margin-top: 10px;">
+    
+    <div class="g-plusone" data-href="http://atlas.incubator.apache.org/atlas-docs" data-size="medium"  width="60px" align="right" ></div>
+
+        </li></ul>
+                              
+                   
+                      </div>
+          
+        </div>
+      </div>
+    </div>
+    
+        <div class="container">
+          <div id="banner">
+        <div class="pull-left">
+                                                  <a href=".." id="bannerLeft">
+                                                                                                <img src="images/atlas-logo.png"  alt="Apache Atlas" width="200px" height="45px"/>
+                </a>
+                      </div>
+        <div class="pull-right">                  <a href="http://incubator.apache.org" id="bannerRight">
+                                                                                                <img src="images/apache-incubator-logo.png"  alt="Apache Incubator"/>
+                </a>
+      </div>
+        <div class="clear"><hr/></div>
+      </div>
+
+      <div id="breadcrumbs">
+        <ul class="breadcrumb">
+                
+                    
+                              <li class="">
+                    <a href="http://www.apache.org" class="externalLink" title="Apache">
+        Apache</a>
+        </li>
+      <li class="divider ">/</li>
+            <li class="">
+                    <a href="index.html" title="Atlas">
+        Atlas</a>
+        </li>
+      <li class="divider ">/</li>
+        <li class="">Storm Atlas Bridge</li>
+        
+                
+                    
+                  <li id="publishDate" class="pull-right">Last Published: 2016-04-25</li> <li class="divider pull-right">|</li>
+              <li id="projectVersion" class="pull-right">Version: 0.7-incubating-SNAPSHOT</li>
+            
+                            </ul>
+      </div>
+
+      
+                        
+        <div id="bodyColumn" >
+                                  
+            <div class="section">
+<h2><a name="Storm_Atlas_Bridge"></a>Storm Atlas Bridge</h2></div>
+<div class="section">
+<h3><a name="Introduction"></a>Introduction</h3>
+<p>Apache Storm is a distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. A Storm process is essentially a DAG of nodes, called a <b>topology</b>.</p>
+<p>Apache Atlas is a metadata repository that enables end-to-end data lineage, search, and associated business classification.</p>
+<p>The goal of this integration is to push the operational topology metadata, along with the underlying data source(s), target(s), derivation processes, and any available business context, so Atlas can capture the lineage for this topology.</p>
+<p>There are two parts to this process, detailed below:</p>
+<ul>
+<li>Data model to represent the concepts in Storm</li>
+<li>Storm Atlas Hook to update metadata in Atlas</li></ul></div>
+<div class="section">
+<h3><a name="Storm_Data_Model"></a>Storm Data Model</h3>
+<p>A data model is represented as Types in Atlas. It contains the descriptions of various nodes in the topology graph, such as spouts and bolts and the corresponding producer and consumer types.</p>
+<p>The following types are added in Atlas.</p>
+<p></p>
+<ul>
+<li>storm_topology - represents the coarse-grained topology. A storm_topology derives from an Atlas Process type and hence can be used to inform Atlas about lineage.</li>
+<li>The following data sets are added: kafka_topic, jms_topic, hbase_table, hdfs_data_set. These all derive from the Atlas Dataset type and hence form the endpoints of a lineage graph.</li>
+<li>storm_spout - Data Producer having outputs, typically Kafka, JMS</li>
+<li>storm_bolt - Data Consumer having inputs and outputs, typically Hive, HBase, HDFS, etc.</li></ul>
+<p>The Storm Atlas hook auto-registers dependent models, like the Hive data model, if it finds that these are not known to the Atlas server.</p>
+<p>The data model for each of the types is described in the class definition at org.apache.atlas.storm.model.StormDataModel.</p></div>
+<div class="section">
+<h3><a name="Storm_Atlas_Hook"></a>Storm Atlas Hook</h3>
+<p>Atlas is notified when a new topology is registered successfully in Storm. Storm provides a hook, backtype.storm.ISubmitterHook, in the Storm client that is used to submit a Storm topology.</p>
+<p>The Storm Atlas hook intercepts this hook post-execution, extracts the metadata from the topology, and updates Atlas using the types defined. Atlas implements the Storm client hook interface in org.apache.atlas.storm.hook.StormAtlasHook.</p></div>
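The post-submission callback pattern described above can be sketched as follows. The interface and class names here are simplified, hypothetical stand-ins for backtype.storm.ISubmitterHook and org.apache.atlas.storm.hook.StormAtlasHook, not the real APIs:

```java
import java.util.ArrayList;
import java.util.List;

public class SubmitterHookSketch {

    // Hypothetical stand-in for Storm's submitter-hook interface.
    interface SubmitterHook {
        void notify(String topologyName);
    }

    // Stand-in for the Atlas hook: records each submitted topology's name,
    // where the real hook would extract spouts/bolts and register entities in Atlas.
    static class RecordingAtlasHook implements SubmitterHook {
        final List<String> registeredTopologies = new ArrayList<>();

        @Override
        public void notify(String topologyName) {
            registeredTopologies.add(topologyName);
        }
    }

    // Stand-in for the Storm client: after a (pretend) submission succeeds,
    // it invokes every configured hook, mirroring the post-submission callback.
    static void submitTopology(String name, List<SubmitterHook> hooks) {
        // ... the actual cluster submission would happen here ...
        for (SubmitterHook hook : hooks) {
            hook.notify(name);
        }
    }

    public static void main(String[] args) {
        RecordingAtlasHook atlasHook = new RecordingAtlasHook();
        List<SubmitterHook> hooks = new ArrayList<>();
        hooks.add(atlasHook);
        submitTopology("word-count-topology", hooks);
        System.out.println(atlasHook.registeredTopologies);  // [word-count-topology]
    }
}
```

The real hook is only invoked if submission succeeds, which is why only successful registrations reach Atlas.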
+<div class="section">
+<h3><a name="Limitations"></a>Limitations</h3>
+<p>The following apply for the first version of the integration.</p>
+<p></p>
+<ul>
+<li>Only new topology submissions are registered with Atlas; any lifecycle changes are not reflected in Atlas.</li>
+<li>The Atlas server needs to be online when a Storm topology is submitted for the metadata to be captured.</li>
+<li>The Hook currently does not support capturing lineage for custom spouts and bolts.</li></ul></div>
+<div class="section">
+<h3><a name="Installation"></a>Installation</h3>
+<p>The Storm Atlas Hook needs to be manually installed in Storm on the client side. The hook artifacts are available at: $ATLAS_PACKAGE/hook/storm</p>
+<p>Copy the Storm Atlas hook jars to $STORM_HOME/extlib, where STORM_HOME is the Storm installation path.</p>
+<p>Restart all daemons after you have installed the Atlas hook into Storm.</p></div>
+<div class="section">
+<h3><a name="Configuration"></a>Configuration</h3></div>
+<div class="section">
+<h4><a name="Storm_Configuration"></a>Storm Configuration</h4>
+<p>The Storm Atlas Hook needs to be configured in Storm client config in <b>$STORM_HOME/conf/storm.yaml</b> as:</p>
+<div class="source">
+<pre>
+storm.topology.submission.notifier.plugin.class: &quot;org.apache.atlas.storm.hook.StormAtlasHook&quot;
+
+</pre></div>
+<p>Also set a 'cluster name' to be used as a namespace for objects registered in Atlas; it is used for namespacing the Storm topology, spouts, and bolts.</p>
+<p>The other objects, like data sets, should ideally be identified with the cluster name of the components that generate them. For example, Hive tables and databases should be identified using the cluster name set in Hive. The Storm Atlas hook will pick this up if the Hive configuration is available in the Storm topology jar that is submitted on the client and the cluster name is defined there. This happens similarly for HBase data sets. If this configuration is not available, the cluster name set in the Storm configuration will be used.</p>
+<div class="source">
+<pre>
+atlas.cluster.name: &quot;cluster_name&quot;
+
+</pre></div>
+<p>In <b>$STORM_HOME/conf/storm_env.ini</b>, set an environment variable as follows:</p>
+<div class="source">
+<pre>
+STORM_JAR_JVM_OPTS:&quot;-Datlas.conf=$ATLAS_HOME/conf/&quot;
+
+</pre></div>
+<p>where ATLAS_HOME points to the Atlas installation directory.</p>
+<p>You could also set this up programmatically in the Storm Config as:</p>
+<div class="source">
+<pre>
+    Config stormConf = new Config();
+    ...
+    stormConf.put(Config.STORM_TOPOLOGY_SUBMISSION_NOTIFIER_PLUGIN,
+            org.apache.atlas.storm.hook.StormAtlasHook.class.getName());
+
+</pre></div></div>
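Putting the two storm.yaml settings above together, a minimal client-side fragment might look like the following (the cluster name is a placeholder):

```yaml
# $STORM_HOME/conf/storm.yaml (client side)
storm.topology.submission.notifier.plugin.class: "org.apache.atlas.storm.hook.StormAtlasHook"
atlas.cluster.name: "primary"   # placeholder; use your cluster's name
```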
+                  </div>
+          </div>
+
+    <hr/>
+
+    <footer>
+            <div class="container">
+              <div class="row span12">Copyright &copy;                    2015-2016
+                        <a href="http://www.apache.org">Apache Software Foundation</a>.
+            All Rights Reserved.      
+                    
+      </div>
+
+                          
+                <p id="poweredBy" class="pull-right">
+                          <a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
+        <img class="builtBy" alt="Built by Maven" src="./images/logos/maven-feather.png" />
+      </a>
+              </p>
+        
+                </div>
+    </footer>
+  </body>
+</html>


