From: shwethags@apache.org To: commits@atlas.incubator.apache.org Date: Sat, 09 Jul 2016 18:42:26 -0000 Subject: [14/14] incubator-atlas-website git commit: updating site for 0.7 release

updating site for 0.7 release Project: http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/commit/60041d8d Tree: http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/tree/60041d8d Diff: http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/diff/60041d8d Branch: refs/heads/asf-site Commit: 60041d8d3fa14c4ad6ed19b99dfb0459739d9ae6 Parents: 05ef63f Author: Shwetha GS Authored: Sun Jul 10 00:12:01 2016 +0530 Committer: Shwetha GS Committed: Sun Jul 10 00:12:01 2016 +0530 ---------------------------------------------------------------------- 0.7.0-incubating/Architecture.html | 261 +++ .../Authentication-Authorization.html | 382 ++++++++++ 0.7.0-incubating/Bridge-Falcon.html | 279 +++ 0.7.0-incubating/Bridge-Hive.html | 325 ++++ 0.7.0-incubating/Bridge-Sqoop.html | 262 +++ 0.7.0-incubating/Configuration.html | 459
+++++++++++ 0.7.0-incubating/HighAvailability.html | 405 ++++++++++ 0.7.0-incubating/InstallationSteps.html | 556 ++++++++++++++ 0.7.0-incubating/Notification-Entity.html | 263 +++++++ 0.7.0-incubating/QuickStart.html | 254 +++++++ 0.7.0-incubating/Repository.html | 235 ++++++ 0.7.0-incubating/Search.html | 341 +++++++++ 0.7.0-incubating/StormAtlasHook.html | 304 ++++++++ 0.7.0-incubating/TypeSystem.html | 258 +++++++ 0.7.0-incubating/api/apple-touch-icon.png | Bin 0 -> 640 bytes 0.7.0-incubating/api/application.wadl | 674 ++++++++++++++++ 0.7.0-incubating/api/atlas-webapp-php.zip | Bin 0 -> 2066 bytes 0.7.0-incubating/api/atlas-webapp.rb | 246 ++++++ 0.7.0-incubating/api/crossdomain.xml | 25 + 0.7.0-incubating/api/css/home.gif | Bin 0 -> 128 bytes 0.7.0-incubating/api/css/prettify.css | 1 + 0.7.0-incubating/api/css/style.css | 759 +++++++++++++++++++ 0.7.0-incubating/api/downloads.html | 163 ++++ 0.7.0-incubating/api/el_ns0_errorBean.html | 132 ++++ 0.7.0-incubating/api/el_ns0_results.html | 131 ++++ 0.7.0-incubating/api/favicon.ico | Bin 0 -> 1150 bytes 0.7.0-incubating/api/index.html | 157 ++++ 0.7.0-incubating/api/js/libs/dd_belatedpng.js | 13 + .../api/js/libs/jquery-1.5.1.min.js | 16 + .../api/js/libs/modernizr-1.7.min.js | 2 + .../api/js/libs/prettify/lang-apollo.js | 2 + .../api/js/libs/prettify/lang-clj.js | 18 + .../api/js/libs/prettify/lang-css.js | 2 + .../api/js/libs/prettify/lang-go.js | 1 + .../api/js/libs/prettify/lang-hs.js | 2 + .../api/js/libs/prettify/lang-lisp.js | 3 + .../api/js/libs/prettify/lang-lua.js | 2 + .../api/js/libs/prettify/lang-ml.js | 2 + 0.7.0-incubating/api/js/libs/prettify/lang-n.js | 4 + .../api/js/libs/prettify/lang-proto.js | 1 + .../api/js/libs/prettify/lang-scala.js | 2 + .../api/js/libs/prettify/lang-sql.js | 2 + .../api/js/libs/prettify/lang-tex.js | 1 + .../api/js/libs/prettify/lang-vb.js | 2 + .../api/js/libs/prettify/lang-vhdl.js | 3 + .../api/js/libs/prettify/lang-wiki.js | 2 + .../api/js/libs/prettify/lang-xq.js | 3 
+ .../api/js/libs/prettify/lang-yaml.js | 2 + .../api/js/libs/prettify/prettify.js | 28 + 0.7.0-incubating/api/js/libs/xbreadcrumbs.js | 93 +++ 0.7.0-incubating/api/model.html | 140 ++++ 0.7.0-incubating/api/ns0.html | 131 ++++ 0.7.0-incubating/api/ns0.xsd | 25 + 0.7.0-incubating/api/ns0_errorBean.html | 189 +++++ 0.7.0-incubating/api/ns0_results.html | 173 +++++ .../api/resource_AdminResource.html | 199 +++++ .../api/resource_DataSetLineageResource.html | 234 ++++++ .../api/resource_EntityResource.html | 599 +++++++++++++++ .../api/resource_EntityService.html | 376 +++++++++ .../api/resource_LineageResource.html | 234 ++++++ .../api/resource_MetadataDiscoveryResource.html | 273 +++++++ .../api/resource_TaxonomyService.html | 758 ++++++++++++++++++ .../api/resource_TypesResource.html | 249 ++++++ 0.7.0-incubating/api/rest.html | 120 +++ 0.7.0-incubating/api/robots.txt | 5 + .../css/apache-maven-fluido-1.3.0.min.css | 9 + 0.7.0-incubating/css/print.css | 23 + 0.7.0-incubating/css/site.css | 1 + .../images/accessories-text-editor.png | Bin 0 -> 746 bytes 0.7.0-incubating/images/add.gif | Bin 0 -> 397 bytes .../images/apache-incubator-logo.png | Bin 0 -> 4234 bytes .../images/apache-maven-project-2.png | Bin 0 -> 33442 bytes .../images/application-certificate.png | Bin 0 -> 923 bytes 0.7.0-incubating/images/atlas-logo.png | Bin 0 -> 3115 bytes 0.7.0-incubating/images/contact-new.png | Bin 0 -> 736 bytes 0.7.0-incubating/images/document-properties.png | Bin 0 -> 577 bytes 0.7.0-incubating/images/drive-harddisk.png | Bin 0 -> 700 bytes 0.7.0-incubating/images/fix.gif | Bin 0 -> 366 bytes 0.7.0-incubating/images/icon_error_sml.gif | Bin 0 -> 633 bytes 0.7.0-incubating/images/icon_help_sml.gif | Bin 0 -> 1072 bytes 0.7.0-incubating/images/icon_info_sml.gif | Bin 0 -> 638 bytes 0.7.0-incubating/images/icon_success_sml.gif | Bin 0 -> 604 bytes 0.7.0-incubating/images/icon_warning_sml.gif | Bin 0 -> 625 bytes 0.7.0-incubating/images/image-x-generic.png | Bin 0 -> 662 bytes 
.../images/internet-web-browser.png | Bin 0 -> 1017 bytes .../images/logos/build-by-maven-black.png | Bin 0 -> 2294 bytes .../images/logos/build-by-maven-white.png | Bin 0 -> 2260 bytes 0.7.0-incubating/images/logos/maven-feather.png | Bin 0 -> 3330 bytes 0.7.0-incubating/images/network-server.png | Bin 0 -> 536 bytes 0.7.0-incubating/images/package-x-generic.png | Bin 0 -> 717 bytes .../images/profiles/pre-release.png | Bin 0 -> 32607 bytes 0.7.0-incubating/images/profiles/retired.png | Bin 0 -> 22003 bytes 0.7.0-incubating/images/profiles/sandbox.png | Bin 0 -> 33010 bytes 0.7.0-incubating/images/remove.gif | Bin 0 -> 607 bytes 0.7.0-incubating/images/rss.png | Bin 0 -> 474 bytes 0.7.0-incubating/images/twiki/architecture.png | Bin 0 -> 58775 bytes 0.7.0-incubating/images/twiki/data-types.png | Bin 0 -> 413738 bytes .../images/twiki/guide-class-diagram.png | Bin 0 -> 40375 bytes .../images/twiki/guide-instance-graph.png | Bin 0 -> 179941 bytes 0.7.0-incubating/images/twiki/notification.png | Bin 0 -> 137448 bytes .../images/twiki/types-instance.png | Bin 0 -> 445893 bytes 0.7.0-incubating/images/update.gif | Bin 0 -> 1090 bytes 0.7.0-incubating/images/window-new.png | Bin 0 -> 583 bytes .../img/glyphicons-halflings-white.png | Bin 0 -> 8777 bytes 0.7.0-incubating/img/glyphicons-halflings.png | Bin 0 -> 12799 bytes 0.7.0-incubating/index.html | 304 ++++++++ 0.7.0-incubating/issue-tracking.html | 239 ++++++ .../js/apache-maven-fluido-1.3.0.min.js | 21 + 0.7.0-incubating/license.html | 441 +++++++++++ 0.7.0-incubating/mail-lists.html | 253 +++++++ 0.7.0-incubating/project-info.html | 258 +++++++ 0.7.0-incubating/security.html | 501 ++++++++++++ 0.7.0-incubating/source-repository.html | 252 ++++++ 0.7.0-incubating/team-list.html | 463 +++++++++++ Architecture.html | 16 +- Authentication-Authorization.html | 16 +- Bridge-Falcon.html | 16 +- Bridge-Hive.html | 16 +- Bridge-Sqoop.html | 16 +- Configuration.html | 16 +- HighAvailability.html | 16 +- 
InstallationSteps.html | 16 +- Notification-Entity.html | 16 +- QuickStart.html | 16 +- Repository.html | 16 +- Search.html | 16 +- Security.html | 16 +- StormAtlasHook.html | 16 +- TypeSystem.html | 16 +- api/atlas-webapp-php.zip | Bin 0 -> 2066 bytes api/atlas-webapp.rb | 246 ++++++ api/downloads.html | 163 ++++ api/el_ns0_errorBean.html | 132 ++++ api/el_ns0_results.html | 131 ++++ api/index.html | 32 + api/model.html | 29 + api/ns0.html | 131 ++++ api/ns0.xsd | 25 + api/ns0_errorBean.html | 189 +++++ api/ns0_results.html | 173 +++++ api/rest.html | 6 + index.html | 14 +- issue-tracking.html | 16 +- license.html | 16 +- mail-lists.html | 16 +- project-info.html | 16 +- source-repository.html | 16 +- team-list.html | 16 +- 148 files changed, 15046 insertions(+), 109 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/blob/60041d8d/0.7.0-incubating/Architecture.html ---------------------------------------------------------------------- diff --git a/0.7.0-incubating/Architecture.html b/0.7.0-incubating/Architecture.html new file mode 100644 index 0000000..6c62d81 --- /dev/null +++ b/0.7.0-incubating/Architecture.html @@ -0,0 +1,261 @@ + + + + + + + + + Apache Atlas – Architecture + + + + + + + + + + + + + + + + + + + + +
+ + + + + + +
+ +
+

Architecture

+
+

Introduction

+
+

Atlas High Level Architecture - Overview

+

+

Architecturally, Atlas has the following components:

+

+
    +
  • A Web service: This exposes RESTful APIs and a Web user interface to create, update and query metadata.
  • +
  • Metadata store: Metadata is modeled using a graph model, implemented using the Graph database Titan. Titan has options for a variety of backing stores for persisting the graph, including an embedded Berkeley DB, Apache HBase and Apache Cassandra. The choice of the backing store determines the level of service availability.
  • +
  • Index store: For powering full text searches on metadata, Atlas also indexes the metadata, again via Titan. The backing store for the full text search is a search backend like ElasticSearch or Apache Solr.
  • +
  • Bridges / Hooks: To add metadata to Atlas, libraries called ‘hooks’ are enabled in various systems like Apache Hive, Apache Falcon and Apache Sqoop which capture metadata events in the respective systems and propagate those events to Atlas. The Atlas server consumes these events and updates its stores.
  • +
  • Metadata notification events: Any updates to metadata in Atlas, either via the hooks or the API, are propagated from Atlas to downstream systems via events. Systems like Apache Ranger consume these events and allow administrators to act on them, e.g. to configure access-control policies.
  • +
  • Notification Server: Atlas uses Apache Kafka as a notification server for communication between hooks and downstream consumers of metadata notification events. Events are written by the hooks and Atlas to different Kafka topics. Kafka enables a loosely coupled integration between these disparate systems.
+
+
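As a sketch of this event flow, a hook could publish an entity-create message to a notification topic. The topic name ATLAS_HOOK, the JSON shape and the sample entity values below are illustrative assumptions, not the exact Atlas wire format:

```shell
# Sketch of a hook handing an entity-create event to Atlas via Kafka.
# Topic name and JSON payload are assumptions for illustration only.

TOPIC="ATLAS_HOOK"   # hypothetical notification topic written to by hooks

# A minimal entity-create message for a made-up hive_table entity.
msg='{"type":"ENTITY_CREATE","user":"hive","entities":[{"typeName":"hive_table","qualifiedName":"default.sales@primary"}]}'

echo "$msg"

# A real hook would publish the message to Kafka, e.g.:
# echo "$msg" | kafka-console-producer.sh --broker-list localhost:9092 --topic "$TOPIC"
```

Because both hooks and the Atlas server only touch the topic, neither side needs the other to be up at publish time, which is the loose coupling the text describes.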

Bridges

+

External components like Hive, Sqoop, Storm and Falcon should model their taxonomy using the type system and register the types with Atlas. For every entity created in the external component, the corresponding entity should be registered in Atlas as well. This is typically done in a hook that runs in the external component and is called for every entity operation. The hook generally processes the entity asynchronously using a thread pool to avoid adding latency to the main operation. The hook can then build the entity and register it using the Atlas REST APIs. However, any API failure, e.g. because of a network issue, can result in the entity not being registered in Atlas and hence in inconsistent metadata.

+

Atlas also exposes a notification interface, which hooks can use for reliable entity registration. The hook sends a notification message containing the list of entities to be registered; the Atlas service contains a hook consumer that listens to these messages and registers the entities.

+

Available bridges are:

+
+
+

Notification

+

Notification is used for reliable entity registration from hooks and for entity/type change notifications. Atlas, by default, provides Kafka integration, but it is possible to plug in other implementations as well. The Atlas service starts an embedded Kafka server by default.

+

Atlas also provides a NotificationHookConsumer that runs in the Atlas service, listens to messages from the hooks and registers the entities in Atlas.

+
+
+ +
+ + + + http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/blob/60041d8d/0.7.0-incubating/Authentication-Authorization.html ---------------------------------------------------------------------- diff --git a/0.7.0-incubating/Authentication-Authorization.html b/0.7.0-incubating/Authentication-Authorization.html new file mode 100644 index 0000000..9800a2f --- /dev/null +++ b/0.7.0-incubating/Authentication-Authorization.html @@ -0,0 +1,382 @@ + + + + + + + + + Apache Atlas – Authentication & Authorization in Apache Atlas. + + + + + + + + + + + + + + + + + + + + +
+ + + + + + +
+ +
+

Authentication & Authorization in Apache Atlas.

+
+

Authentication

+

Atlas supports the following authentication methods:

+

+
    +
  • File
  • +
  • Kerberos
  • +
  • LDAP
+

Set the following properties to true in the atlas-application.properties file to enable the corresponding authentication method.

+
+
+atlas.authentication.method.kerberos=true|false
+atlas.authentication.method.ldap=true|false
+atlas.authentication.method.file=true|false
+
+
+

If two or more authentication methods are set to true, authentication falls back to a later method when an earlier one fails. For example, if both Kerberos and LDAP authentication are set to true, then for a request without a Kerberos principal and keytab, LDAP authentication is used as a fallback.

+
+
FILE method.
+

File authentication requires the users' login details in a user credentials file, in the format specified below; the file path should be set in the atlas.authentication.method.file.filename property in atlas-application.properties.

+
+
+atlas.authentication.method.file=true
+atlas.authentication.method.file.filename=${sys:atlas.home}/conf/users-credentials.properties
+
+
+

The user credentials file should have the following format:

+
+
+username=group::sha256-password
+
+
+

For example:

+
+
+admin=ADMIN::e7cf3ef4f17c3999a94f2c6f612e8a888e5b1026878e4e19398b23bd38ec221a
+
+
+

The user's group can be ADMIN, DATA_STEWARD or DATA_SCIENTIST.

+

Note: the password is hashed with SHA-256 and can be generated using a standard Unix tool.

+

For example:

+
+
+echo -n "Password" | sha256sum
+e7cf3ef4f17c3999a94f2c6f612e8a888e5b1026878e4e19398b23bd38ec221a  -
+
+
+
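Putting the pieces above together, a complete credentials entry can be produced with the same Unix tooling. The username, group and password below are made up for illustration:

```shell
# Sketch: produce one users-credentials.properties line for a hypothetical
# user "steward" in group DATA_STEWARD (username and password are made up).
user="steward"
group="DATA_STEWARD"

# sha256sum prints "<hash>  -" for stdin; keep only the 64-hex-digit hash.
hash=$(printf '%s' "StrongPassw0rd" | sha256sum | cut -d' ' -f1)

# Assemble the username=group::sha256-password line described above.
line="${user}=${group}::${hash}"
echo "$line"
```

The resulting line can be appended to the file named by atlas.authentication.method.file.filename.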
+
Kerberos Method.
+

To enable Kerberos authentication in Atlas, set the property atlas.authentication.method.kerberos to true in atlas-application.properties:

+
+
+atlas.authentication.method.kerberos = true
+
+
+

The following properties should also be set:

+
+
+atlas.authentication.method.kerberos.principal=<principal>/<fqdn>@EXAMPLE.COM
+atlas.authentication.method.kerberos.keytab = /<key tab filepath>.keytab
+atlas.authentication.method.kerberos.name.rules = RULE:[2:$1@$0](atlas@EXAMPLE.COM)s/.*/atlas/
+
+
+
+
LDAP Method.
+

To enable LDAP authentication in Atlas, set the property atlas.authentication.method.ldap to true, and set the property atlas.authentication.method.ldap.type to LDAP or AD in atlas-application.properties. Use AD when connecting to Active Directory.

+
+
+atlas.authentication.method.ldap=true
+atlas.authentication.method.ldap.type=ldap|ad
+
+
+

For LDAP or AD, the following configuration needs to be set in atlas-application.properties.

+

Active Directory

+
+
+atlas.authentication.method.ldap.ad.domain=example.com
+atlas.authentication.method.ldap.ad.url=ldap://<AD server ip>:389
+atlas.authentication.method.ldap.ad.base.dn=DC=example,DC=com
+atlas.authentication.method.ldap.ad.bind.dn=CN=Administrator,CN=Users,DC=example,DC=com
+atlas.authentication.method.ldap.ad.bind.password=<password>
+atlas.authentication.method.ldap.ad.referral=ignore
+atlas.authentication.method.ldap.ad.user.searchfilter=(sAMAccountName={0})
+atlas.authentication.method.ldap.ad.default.role=ROLE_USER
+
+
+

LDAP Directory

+
+
+atlas.authentication.method.ldap.url=ldap://<Ldap server ip>:389
+atlas.authentication.method.ldap.userDNpattern=uid={0},ou=users,dc=example,dc=com
+atlas.authentication.method.ldap.groupSearchBase=dc=example,dc=com
+atlas.authentication.method.ldap.groupSearchFilter=(member=cn={0},ou=users,dc=example,dc=com)
+atlas.authentication.method.ldap.groupRoleAttribute=cn
+atlas.authentication.method.ldap.base.dn=dc=example,dc=com
+atlas.authentication.method.ldap.bind.dn=cn=Manager,dc=example,dc=com
+atlas.authentication.method.ldap.bind.password=<password>
+atlas.authentication.method.ldap.referral=ignore
+atlas.authentication.method.ldap.user.searchfilter=(uid={0})
+atlas.authentication.method.ldap.default.role=ROLE_USER
+
+
+
+

Authorization

+
+
Atlas Authorization Methods [Simple/Ranger]
+

To configure authorization in Atlas, set the atlas.authorizer.impl property in atlas-application.properties:

+
    +
  • Simple
  • +
  • Ranger
+
+
+atlas.authorizer.impl=simple | ranger | <Qualified Authorizer Class Name>
+
+
+
+
Simple Authorizer.
+

In the Simple Authorizer, the policy store file is configured locally. The path of the policy store file is set in the atlas.auth.policy.file property of atlas-application.properties:

+
+
+atlas.auth.policy.file={{conf_dir}}/policy-store.txt
+
+
+

The policy store file format is as follows:

+
+
+Policy_Name;;User_Name:Operations_Allowed;;Group_Name:Operations_Allowed;;Resource_Type:Resource_Name
+
+
+

An example admin policy:

+
+
+adminPolicy;;admin:rwud;;ROLE_ADMIN:rwud;;type:*,entity:*,operation:*,taxonomy:*,term:*
+
+
+

Note: User_Name, Group_Name and Operations_Allowed are comma-separated lists.

+

Authorizer Resource Types:

+
    +
  • Operation
  • +
  • Type
  • +
  • Entity
  • +
  • Taxonomy
  • +
  • Term
  • +
  • Unknown
+

Operations_Allowed are r = read, w = write, u = update, d = delete

+
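The ;;-separated policy layout above can be split with plain shell parameter expansion. This is only an illustration of the file format, not code from Atlas:

```shell
# Sketch: split one policy-store line into its four ';;'-separated fields.
# The line itself is the admin-policy example from the text above.
policy='adminPolicy;;admin:rwud;;ROLE_ADMIN:rwud;;type:*,entity:*,operation:*,taxonomy:*,term:*'

name=${policy%%;;*}          # Policy_Name
rest=${policy#*;;}
user_perms=${rest%%;;*}      # User_Name:Operations_Allowed
rest=${rest#*;;}
group_perms=${rest%%;;*}     # Group_Name:Operations_Allowed
resources=${rest#*;;}        # Resource_Type:Resource_Name list

echo "name=$name users=$user_perms groups=$group_perms resources=$resources"
```

Each permissions field pairs a user or group with a subset of the r/w/u/d operation letters defined above.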
+
Ranger Authorizer.
+

The Ranger Authorizer is enabled by activating the Atlas-Ranger plugin from Ambari.

+

For more details, visit the Apache Ranger documentation.

+
+
+ +
+ + + + http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/blob/60041d8d/0.7.0-incubating/Bridge-Falcon.html ---------------------------------------------------------------------- diff --git a/0.7.0-incubating/Bridge-Falcon.html b/0.7.0-incubating/Bridge-Falcon.html new file mode 100644 index 0000000..bc0dbb8 --- /dev/null +++ b/0.7.0-incubating/Bridge-Falcon.html @@ -0,0 +1,279 @@ + + + + + + + + + Apache Atlas – Falcon Atlas Bridge + + + + + + + + + + + + + + + + + + + + +
+ + + + + + +
+ +
+

Falcon Atlas Bridge

+
+

Falcon Model

+

The default Falcon model is available in org.apache.atlas.falcon.model.FalconDataModelGenerator. It defines the following types:

+
+
+falcon_cluster(ClassType) - super types [Infrastructure] - attributes [timestamp, colo, owner, tags]
+falcon_feed(ClassType) - super types [DataSet] - attributes [timestamp, stored-in, owner, groups, tags]
+falcon_feed_creation(ClassType) - super types [Process] - attributes [timestamp, stored-in, owner]
+falcon_feed_replication(ClassType) - super types [Process] - attributes [timestamp, owner]
+falcon_process(ClassType) - super types [Process] - attributes [timestamp, runs-on, owner, tags, pipelines, workflow-properties]
+
+
+

One falcon_process entity is created for every cluster that the falcon process is defined for.

+

The entities are created and de-duplicated using the unique qualifiedName attribute. It provides a namespace and can be used for querying/lineage as well. The unique attributes are:

+
    +
  • falcon_process - <process name>@<cluster name>
  • +
  • falcon_cluster - <cluster name>
  • +
  • falcon_feed - <feed name>@<cluster name>
  • +
  • falcon_feed_creation - <feed name>
  • +
  • falcon_feed_replication - <feed name>
+
+

Falcon Hook

+

Falcon supports listeners on falcon entity submission. This is used to add entities in Atlas using the model defined in org.apache.atlas.falcon.model.FalconDataModelGenerator. The hook submits the request to a thread pool executor to avoid blocking the command execution. The thread submits the entities as messages to the notification server, and the Atlas server reads these messages and registers the entities.

+
    +
  • Add 'org.apache.atlas.falcon.service.AtlasService' to application.services in <falcon-conf>/startup.properties
  • +
  • Link falcon hook jars in falcon classpath - 'ln -s <atlas-home>/hook/falcon/* <falcon-home>/server/webapp/falcon/WEB-INF/lib/'
  • +
  • In <falcon_conf>/falcon-env.sh, set an environment variable as follows:
+
+
+     export FALCON_SERVER_OPTS="$FALCON_SERVER_OPTS -Datlas.conf=<atlas-conf>"
+     
+
+

The following properties in <atlas-conf>/atlas-application.properties control the thread pool and notification details:

+
    +
  • atlas.hook.falcon.synchronous - boolean, true to run the hook synchronously. default false
  • +
  • atlas.hook.falcon.numRetries - number of retries for notification failure. default 3
  • +
  • atlas.hook.falcon.minThreads - core number of threads. default 5
  • +
  • atlas.hook.falcon.maxThreads - maximum number of threads. default 5
  • +
  • atlas.hook.falcon.keepAliveTime - keep alive time in msecs. default 10
  • +
  • atlas.hook.falcon.queueSize - queue size for the threadpool. default 10000
+

Refer to Configuration for notification-related settings.

+
+

Limitations

+

+
    +
  • In the falcon cluster entity, the cluster name used should be uniform across components like Hive, Falcon and Sqoop. If used with Ambari, the Ambari cluster name should be used for the cluster entity.
+
+
+ +
+ + + + http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/blob/60041d8d/0.7.0-incubating/Bridge-Hive.html ---------------------------------------------------------------------- diff --git a/0.7.0-incubating/Bridge-Hive.html b/0.7.0-incubating/Bridge-Hive.html new file mode 100644 index 0000000..0197417 --- /dev/null +++ b/0.7.0-incubating/Bridge-Hive.html @@ -0,0 +1,325 @@ + + + + + + + + + Apache Atlas – Hive Atlas Bridge + + + + + + + + + + + + + + + + + + + + +
+ + + + + + +
+ +
+

Hive Atlas Bridge

+
+

Hive Model

+

The default Hive model is available in org.apache.atlas.hive.model.HiveDataModelGenerator. It defines the following types:

+
+
+hive_db(ClassType) - super types [Referenceable] - attributes [name, clusterName, description, locationUri, parameters, ownerName, ownerType]
+hive_storagedesc(ClassType) - super types [Referenceable] - attributes [cols, location, inputFormat, outputFormat, compressed, numBuckets, serdeInfo, bucketCols, sortCols, parameters, storedAsSubDirectories]
+hive_column(ClassType) - super types [Referenceable] - attributes [name, type, comment, table]
+hive_table(ClassType) - super types [DataSet] - attributes [name, db, owner, createTime, lastAccessTime, comment, retention, sd, partitionKeys, columns, aliases, parameters, viewOriginalText, viewExpandedText, tableType, temporary]
+hive_process(ClassType) - super types [Process] - attributes [name, startTime, endTime, userName, operationType, queryText, queryPlan, queryId]
+hive_principal_type(EnumType) - values [USER, ROLE, GROUP]
+hive_order(StructType) - attributes [col, order]
+hive_serde(StructType) - attributes [name, serializationLib, parameters]
+
+
+

The entities are created and de-duplicated using the unique qualified name. It provides a namespace and can be used for querying/lineage as well. Note that dbName, tableName and columnName should be in lower case. clusterName is explained below.

+
    +
  • hive_db - attribute qualifiedName - <dbName>@<clusterName>
  • +
  • hive_table - attribute qualifiedName - <dbName>.<tableName>@<clusterName>
  • +
  • hive_column - attribute qualifiedName - <dbName>.<tableName>.<columnName>@<clusterName>
  • +
  • hive_process - attribute name - <queryString> - trimmed query string in lower case
+
+
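The qualifiedName conventions above can be sketched in shell. The lower() helper and the sample db/table/column/cluster values are assumptions for illustration:

```shell
# Sketch: build hive_db / hive_table / hive_column qualifiedName values
# following the <dbName>[.<tableName>[.<columnName>]]@<clusterName> pattern.
cluster="primary"
db="Default"; table="Sales"; column="Amount"   # made-up sample names

# Hive names are case-insensitive, so Atlas stores them in lowercase.
lower() { printf '%s' "$1" | tr '[:upper:]' '[:lower:]'; }

db_qn="$(lower "$db")@${cluster}"
table_qn="$(lower "$db").$(lower "$table")@${cluster}"
column_qn="$(lower "$db").$(lower "$table").$(lower "$column")@${cluster}"

echo "$db_qn $table_qn $column_qn"
```

Searches against these attributes must use the lowercased form, per the note above.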

Importing Hive Metadata

+

org.apache.atlas.hive.bridge.HiveMetaStoreBridge imports the Hive metadata into Atlas using the model defined in org.apache.atlas.hive.model.HiveDataModelGenerator.

+

The import-hive.sh command can be used to facilitate this. The script needs the Hadoop and Hive classpath jars.

+

* For Hadoop jars, please make sure that the environment variable HADOOP_CLASSPATH is set. Another way is to set HADOOP_HOME to point to the root directory of your Hadoop installation

+

* Similarly, for Hive jars, set HIVE_HOME to the root of the Hive installation

+

* Set environment variable HIVE_CONF_DIR to Hive configuration directory

+
+
+    Usage: <atlas package>/bin/import-hive.sh
+    
+
+

The logs are in <atlas package>/logs/import-hive.log

+

If you are importing metadata in a Kerberized cluster, you need to run the command like this:

+
+
+<atlas package>/bin/import-hive.sh -Dsun.security.jgss.debug=true -Djavax.security.auth.useSubjectCredsOnly=false -Djava.security.krb5.conf=[krb5.conf location] -Djava.security.auth.login.config=[jaas.conf location]
+
+
+

+
+
+

Hive Hook

+

Hive supports listeners on hive command execution using hive hooks. This is used to add/update/remove entities in Atlas using the model defined in org.apache.atlas.hive.model.HiveDataModelGenerator. The hook submits the request to a thread pool executor to avoid blocking the command execution. The thread submits the entities as messages to the notification server, and the Atlas server reads these messages and registers the entities. Follow these instructions in your hive set-up to add the hive hook for Atlas:

+
    +
  • Set-up atlas hook in hive-site.xml of your hive configuration:
+
+
+    <property>
+      <name>hive.exec.post.hooks</name>
+      <value>org.apache.atlas.hive.hook.HiveHook</value>
+    </property>
+  
+
+
+
+    <property>
+      <name>atlas.cluster.name</name>
+      <value>primary</value>
+    </property>
+  
+
+

+
    +
  • Add 'export HIVE_AUX_JARS_PATH=<atlas package>/hook/hive' in hive-env.sh of your hive configuration
  • +
  • Copy <atlas-conf>/atlas-application.properties to the hive conf directory.
+

The following properties in <atlas-conf>/atlas-application.properties control the thread pool and notification details:

+
    +
  • atlas.hook.hive.synchronous - boolean, true to run the hook synchronously. default false. Recommended to be set to false to avoid delays in hive query completion.
  • +
  • atlas.hook.hive.numRetries - number of retries for notification failure. default 3
  • +
  • atlas.hook.hive.minThreads - core number of threads. default 5
  • +
  • atlas.hook.hive.maxThreads - maximum number of threads. default 5
  • +
  • atlas.hook.hive.keepAliveTime - keep alive time in msecs. default 10
  • +
  • atlas.hook.hive.queueSize - queue size for the threadpool. default 10000
+

Refer to Configuration for notification-related settings.

+
+

Limitations

+

+
    +
  • Since database, table and column names are case-insensitive in Hive, the corresponding names in entities are lowercase. So, any search APIs should use lowercase while querying on the entity names
  • +
  • The following Hive operations are currently captured by the hive hook +
      +
    • create database
    • +
    • create table/view, create table as select
    • +
    • load, import, export
    • +
    • DMLs (insert)
    • +
    • alter database
    • +
    • alter table (skewed table information, stored as, protection is not supported)
    • +
    • alter view
+
+
+ +
+ + + + http://git-wip-us.apache.org/repos/asf/incubator-atlas-website/blob/60041d8d/0.7.0-incubating/Bridge-Sqoop.html ---------------------------------------------------------------------- diff --git a/0.7.0-incubating/Bridge-Sqoop.html b/0.7.0-incubating/Bridge-Sqoop.html new file mode 100644 index 0000000..cb2a224 --- /dev/null +++ b/0.7.0-incubating/Bridge-Sqoop.html @@ -0,0 +1,262 @@ + + + + + + + + + Apache Atlas – Sqoop Atlas Bridge + + + + + + + + + + + + + + + + + + + + +
+ + + + + + +
+ +
+

Sqoop Atlas Bridge

+
+

Sqoop Model

+

The default Sqoop model is available in org.apache.atlas.sqoop.model.SqoopDataModelGenerator. It defines the following types:

+
+
+sqoop_operation_type(EnumType) - values [IMPORT, EXPORT, EVAL]
+sqoop_dbstore_usage(EnumType) - values [TABLE, QUERY, PROCEDURE, OTHER]
+sqoop_process(ClassType) - super types [Process] - attributes [name, operation, dbStore, hiveTable, commandlineOpts, startTime, endTime, userName]
+sqoop_dbdatastore(ClassType) - super types [DataSet] - attributes [name, dbStoreType, storeUse, storeUri, source, description, ownerName]
+
+
+

The entities are created and de-duplicated using the unique qualified name. It provides a namespace and can be used for querying as well:
  • sqoop_process - attribute name - sqoop-dbStoreType-storeUri-endTime
  • sqoop_dbdatastore - attribute name - dbStoreType-connectorUrl-source

+
+

Sqoop Hook

+

Sqoop added a SqoopJobDataPublisher that publishes data to Atlas after completion of an import job. Today, only hiveImport is supported in sqoopHook. This is used to add entities in Atlas using the model defined in org.apache.atlas.sqoop.model.SqoopDataModelGenerator. Follow these instructions in your sqoop set-up to add the sqoop hook for Atlas in <sqoop-conf>/sqoop-site.xml:

+

+
    +
  • Sqoop Job publisher class. Currently only one publishing class is supported
sqoop.job.data.publish.class org.apache.atlas.sqoop.hook.SqoopHook +
    +
  • Atlas cluster name
atlas.cluster.name +
    +
  • Copy <atlas-conf>/atlas-application.properties to the sqoop conf directory <sqoop-conf>/
  • +
  • Link <atlas-home>/hook/sqoop/*.jar in sqoop lib
+

Refer to Configuration for notification-related settings.

+
+

Limitations

+

+
    +
  • Currently, only the following sqoop operation is captured by the sqoop hook: hiveImport
+
+
+ +
+ + + +