From: sarath@apache.org
To: commits@atlas.incubator.apache.org
Date: Tue, 29 Aug 2017 21:37:10 -0000
Subject: [41/42] atlas-website git commit: ATLAS-2068: Update atlas website about 0.8.1 release

http://git-wip-us.apache.org/repos/asf/atlas-website/blob/c5b7bdb3/0.8.1/Bridge-Sqoop.html
----------------------------------------------------------------------
diff --git a/0.8.1/Bridge-Sqoop.html b/0.8.1/Bridge-Sqoop.html
new file mode 100644
index 0000000..371ccf5
--- /dev/null
+++ b/0.8.1/Bridge-Sqoop.html
@@ -0,0 +1,277 @@
+ + + + + + +
+ +
+

Sqoop Atlas Bridge

+
+

Sqoop Model

+

The default Sqoop modelling is available in org.apache.atlas.sqoop.model.SqoopDataModelGenerator. It defines the following types:

+
+
+sqoop_operation_type(EnumType) - values [IMPORT, EXPORT, EVAL]
+sqoop_dbstore_usage(EnumType) - values [TABLE, QUERY, PROCEDURE, OTHER]
+sqoop_process(ClassType) - super types [Process] - attributes [name, operation, dbStore, hiveTable, commandlineOpts, startTime, endTime, userName]
+sqoop_dbdatastore(ClassType) - super types [DataSet] - attributes [name, dbStoreType, storeUse, storeUri, source, description, ownerName]
+
+
+

The entities are created and de-duped using a unique qualified name, which provides a namespace and can also be used for querying:
  • sqoop_process - attribute name - sqoop-dbStoreType-storeUri-endTime
  • sqoop_dbdatastore - attribute name - dbStoreType-connectorUrl-source

+
+

Sqoop Hook

+

Sqoop added a SqoopJobDataPublisher that publishes data to Atlas after completion of an import job. Today, only hiveImport is supported by the Sqoop hook. The hook adds entities in Atlas using the model defined in org.apache.atlas.sqoop.model.SqoopDataModelGenerator. Follow these instructions in your Sqoop set-up to add the Sqoop hook for Atlas in <sqoop-conf>/sqoop-site.xml (a sample snippet is shown after the list below):

+

+
    +
  • Sqoop job publisher class. Currently only one publishing class is supported:
    sqoop.job.data.publish.class = org.apache.atlas.sqoop.hook.SqoopHook
  • Atlas cluster name:
    atlas.cluster.name = <cluster name>
  • Copy <atlas-conf>/atlas-application.properties to the Sqoop conf directory <sqoop-conf>/
  • Link <atlas-home>/hook/sqoop/*.jar in the Sqoop lib directory
+
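A minimal sketch of how the two properties above might look in <sqoop-conf>/sqoop-site.xml; the cluster name value is a placeholder, substitute your own:

    <configuration>
      <!-- Sqoop job publisher class: only this publishing class is currently supported -->
      <property>
        <name>sqoop.job.data.publish.class</name>
        <value>org.apache.atlas.sqoop.hook.SqoopHook</value>
      </property>
      <!-- Name of the Atlas cluster to associate published entities with (placeholder value) -->
      <property>
        <name>atlas.cluster.name</name>
        <value>primary</value>
      </property>
    </configuration>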

Refer to Configuration for notification-related configuration.

+
+

Limitations

+

+
    +
  • Only the following sqoop operations are captured by sqoop hook currently - hiveImport
+
+
http://git-wip-us.apache.org/repos/asf/atlas-website/blob/c5b7bdb3/0.8.1/Configuration.html
----------------------------------------------------------------------
diff --git a/0.8.1/Configuration.html b/0.8.1/Configuration.html
new file mode 100644
index 0000000..279da5c
--- /dev/null
+++ b/0.8.1/Configuration.html
@@ -0,0 +1,511 @@
+ + + + + + +
+ +
+

Configuring Apache Atlas - Application Properties

+

All configuration in Atlas uses Java properties style configuration. The main configuration file is atlas-application.properties, which is in the conf directory at the deployed location. It consists of the following sections:

+
+

Graph Configs

+
+

Graph persistence engine

+

This section sets up the graph DB - Titan - to use a persistence engine. Please refer to the Titan documentation link for more details. The example below uses BerkeleyDBJE.

+
+
+atlas.graph.storage.backend=berkeleyje
+atlas.graph.storage.directory=data/berkley
+
+
+
+
Graph persistence engine - Hbase
+

Basic configuration

+
+
+atlas.graph.storage.backend=hbase
+#For standalone mode , specify localhost
+#for distributed mode, specify zookeeper quorum here - For more information refer http://s3.thinkaurelius.com/docs/titan/current/hbase.html#_remote_server_mode_2
+atlas.graph.storage.hostname=<ZooKeeper Quorum>
+
+
+

The HBASE_CONF_DIR environment variable needs to be set to point to the HBase client configuration directory, which is added to the classpath when Atlas starts up. hbase-site.xml needs to have the following properties set according to the cluster setup:

+
+
+#Set below to /hbase-secure if the Hbase server is setup in secure mode
+zookeeper.znode.parent=/hbase-unsecure
+
+
+

Advanced configuration

+

# If you are planning to use any of the configs mentioned below, they need to be prefixed with "atlas.graph." to take effect in Atlas. Refer to http://s3.thinkaurelius.com/docs/titan/0.5.4/titan-config-ref.html#_storage_hbase. For example:
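As an illustration only (the option comes from the Titan configuration reference; the table name below is an assumption), the Titan option storage.hbase.table would be specified as:

    atlas.graph.storage.hbase.table=apache_atlas_titan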

+

Permissions

+

When Atlas is configured with HBase as the storage backend the graph db (titan) needs sufficient user permissions to be able to create and access an HBase table. In a secure cluster it may be necessary to grant permissions to the 'atlas' user for the 'titan' table.

+

With Ranger, a policy can be configured for 'titan'.

+

Without Ranger, HBase shell can be used to set the permissions.

+
+
+   su hbase
+   kinit -k -t <hbase keytab> <hbase principal>
+   echo "grant 'atlas', 'RWXCA', 'titan'" | hbase shell
+
+
+

Note that if the embedded-hbase-solr profile is used then HBase is included in the distribution so that a standalone instance of HBase can be started as the default storage backend for the graph repository. Using the embedded-hbase-solr profile will configure Atlas so that HBase instance will be started and stopped along with the Atlas server by default. To use the embedded-hbase-solr profile please see "Building Atlas" in the Installation Steps section.
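For reference, the profile is activated at build time; a sketch of the Maven invocation, treating the exact profile list as an assumption for your Atlas version (see "Building Atlas" in the Installation Steps section):

    mvn clean package -Pdist,embedded-hbase-solr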

+
+

Graph Search Index

+

This section sets up the graph DB - Titan - to use a search indexing system. The example configuration below sets up an embedded Elasticsearch indexing system.

+
+
+atlas.graph.index.search.backend=elasticsearch
+atlas.graph.index.search.directory=data/es
+atlas.graph.index.search.elasticsearch.client-only=false
+atlas.graph.index.search.elasticsearch.local-mode=true
+atlas.graph.index.search.elasticsearch.create.sleep=2000
+
+
+
+
Graph Search Index - Solr
+

Please note that a Solr installation in Cloud mode is a prerequisite before configuring Solr as the search indexing backend. Refer to the InstallationSteps section for Solr installation/configuration.

+
+
+ atlas.graph.index.search.backend=solr5
+ atlas.graph.index.search.solr.mode=cloud
+ atlas.graph.index.search.solr.zookeeper-url=<the ZK quorum setup for solr as comma separated value> eg: 10.1.6.4:2181,10.1.6.5:2181
+ atlas.graph.index.search.solr.zookeeper-connect-timeout=<SolrCloud Zookeeper Connection Timeout>. Default value is 60000 ms
+ atlas.graph.index.search.solr.zookeeper-session-timeout=<SolrCloud Zookeeper Session Timeout>. Default value is 60000 ms
+
+
+

Also note that if the embedded-hbase-solr profile is used then Solr is included in the distribution so that a standalone instance of Solr can be started as the default search indexing backend. Using the embedded-hbase-solr profile will configure Atlas so that the standalone Solr instance will be started and stopped along with the Atlas server by default. To use the embedded-hbase-solr profile please see "Building Atlas" in the Installation Steps section.

+
+


Choosing between Persistence Backends

+

Refer to http://s3.thinkaurelius.com/docs/titan/0.5.4/bdb.html and http://s3.thinkaurelius.com/docs/titan/0.5.4/hbase.html for choosing between the persistence backends. BerkeleyDB is suitable for smaller data sets, in the range of up to 10 million vertices, with ACID guarantees. HBase, on the other hand, doesn't provide ACID guarantees but is able to scale for larger graphs. HBase also provides HA inherently.

+
+

Choosing between Indexing Backends

+

Refer to http://s3.thinkaurelius.com/docs/titan/0.5.4/elasticsearch.html and http://s3.thinkaurelius.com/docs/titan/0.5.4/solr.html for choosing between Elasticsearch and Solr. Solr in cloud mode is the recommended setup.

+
+

Switching Persistence Backend

+

For switching the storage backend from BerkeleyDB to HBase and vice versa, refer to the "Graph Persistence Engine" documentation above and restart Atlas. The data in the indexing backend needs to be cleared as well; otherwise there will be discrepancies between the storage and indexing backends, which could result in errors during search. Elasticsearch runs by default in embedded mode, and its data can be cleared by deleting the ATLAS_HOME/data/es directory. For Solr, the collections created during Atlas installation - vertex_index, edge_index, fulltext_index - can be deleted, which will clean up the indexes.
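A sketch of the corresponding cleanup, assuming an embedded Elasticsearch data directory and the standard Solr CLI; paths and the Solr location are placeholders:

    # Embedded Elasticsearch: remove the index data directory
    rm -rf $ATLAS_HOME/data/es

    # SolrCloud: delete the collections created during Atlas installation
    $SOLR_HOME/bin/solr delete -c vertex_index
    $SOLR_HOME/bin/solr delete -c edge_index
    $SOLR_HOME/bin/solr delete -c fulltext_index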

+
+

Switching Index Backend

+

Switching the index backend requires clearing the persistence backend data; otherwise there will be discrepancies between the persistence and index backends, since switching the indexing backend means index data will be lost. This leads to "Fulltext" queries not working on the existing data. To clear the data for BerkeleyDB, delete the ATLAS_HOME/data/berkeley directory. To clear the data for HBase, in the HBase shell, run 'disable titan' and 'drop titan'.
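For example, clearing the HBase-backed graph data could look like the following sketch, mirroring the grant example above; the kinit line is only needed on a secure cluster, and the keytab/principal are placeholders:

    su hbase
    kinit -k -t <hbase keytab> <hbase principal>
    echo "disable 'titan'" | hbase shell
    echo "drop 'titan'" | hbase shell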

+
+

Lineage Configs

+

The higher-layer services like lineage, schema, etc. are driven by the type system, and this section encodes the specific types for the Hive data model.

+

# This model reflects the base super types for Data and Process

+
+
+atlas.lineage.hive.table.type.name=DataSet
+atlas.lineage.hive.process.type.name=Process
+atlas.lineage.hive.process.inputs.name=inputs
+atlas.lineage.hive.process.outputs.name=outputs
+
+## Schema
+atlas.lineage.hive.table.schema.query=hive_table where name=?, columns
+
+
+
+

Search Configs

+

Search APIs (DSL and full text search) support pagination and have optional limit and offset arguments. The following configs are related to search pagination:

+
+
+# Default limit used when limit is not specified in API
+atlas.search.defaultlimit=100
+
+# Maximum limit allowed in API. Limits maximum results that can be fetched to make sure the atlas server doesn't run out of memory
+atlas.search.maxlimit=10000
+
+
+
+
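For illustration, a DSL search request with explicit pagination. The endpoint shown is the Atlas v2 DSL search API and should be treated as an assumption for your deployment; credentials and the query are placeholders:

    curl -u adminuser:password \
      "http://localhost:21000/api/atlas/v2/search/dsl?query=hive_table&limit=25&offset=50"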

Notification Configs

+

Refer http://kafka.apache.org/documentation.html#configuration for Kafka configuration. All Kafka configs should be prefixed with 'atlas.kafka.'

+
+
+atlas.notification.embedded=true
+atlas.kafka.data=${sys:atlas.home}/data/kafka
+atlas.kafka.zookeeper.connect=localhost:9026
+atlas.kafka.bootstrap.servers=localhost:9027
+atlas.kafka.zookeeper.session.timeout.ms=400
+atlas.kafka.zookeeper.sync.time.ms=20
+atlas.kafka.auto.commit.interval.ms=1000
+atlas.kafka.hook.group.id=atlas
+
+
+

Note that Kafka group ids are specified for a specific topic. The Kafka group id configuration for entity notifications is 'atlas.kafka.entities.group.id'

+
+
+atlas.kafka.entities.group.id=<consumer id>
+
+
+

These configuration parameters are useful for setting up Kafka topics via Atlas provided scripts, described in the Installation Steps page.

+
+
+# Whether to create the topics automatically, default is true.
+# Comma separated list of topics to be created, default is "ATLAS_HOOK,ATLAS_ENTITIES"
+atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
+# Number of replicas for the Atlas topics, default is 1. Increase for higher resilience to Kafka failures.
+atlas.notification.replicas=1
+# Enable the below two properties if Kafka is running in Kerberized mode.
+# Set this to the service principal representing the Kafka service
+atlas.notification.kafka.service.principal=kafka/_HOST@EXAMPLE.COM
+# Set this to the location of the keytab file for Kafka
+#atlas.notification.kafka.keytab.location=/etc/security/keytabs/kafka.service.keytab
+
+
+

These configuration parameters are useful for saving messages in case there are issues in reaching Kafka for sending messages.

+
+
+# Whether to save messages that failed to be sent to Kafka, default is true
+atlas.notification.log.failed.messages=true
+# If saving messages is enabled, the file name to save them to. This file will be created under the log directory of the hook's host component - like HiveServer2
+atlas.notification.failed.messages.filename=atlas_hook_failed_messages.log
+
+
+
+

Client Configs

+
+
+atlas.client.readTimeoutMSecs=60000
+atlas.client.connectTimeoutMSecs=60000
+atlas.rest.address=<http/https>://<atlas-fqdn>:<atlas port> - default http://localhost:21000
+
+
+
+

Security Properties

+
+

SSL config

+

The following property is used to toggle the SSL feature.

+
+
+atlas.enableTLS=false
+
+
+
+
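When atlas.enableTLS is set to true, the keystore/truststore locations are typically configured alongside it. A sketch follows; the property names are taken from the Atlas security documentation and the paths are placeholders:

    atlas.enableTLS=true
    keystore.file=/path/to/keystore.jks
    truststore.file=/path/to/truststore.jks
    # Passwords are resolved from a Hadoop credential provider store (placeholder path)
    cert.stores.credential.provider.path=jceks://file//path/to/credentialstore.jceks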

High Availability Properties

+

The following properties describe High Availability related configuration options:

+
+
+# Set the following property to true, to enable High Availability. Default = false.
+atlas.server.ha.enabled=true
+
+# Define a unique set of strings to identify each instance that should run an Atlas Web Service instance as a comma separated list.
+atlas.server.ids=id1,id2
+# For each string defined above, define the host and port on which Atlas server binds to.
+atlas.server.address.id1=host1.company.com:21000
+atlas.server.address.id2=host2.company.com:31000
+
+# Specify Zookeeper properties needed for HA.
+# Specify the list of services running Zookeeper servers as a comma separated list.
+atlas.server.ha.zookeeper.connect=zk1.company.com:2181,zk2.company.com:2181,zk3.company.com:2181
+# Specify how many times the connection to the Zookeeper cluster should be retried, in case of any connection issues.
+atlas.server.ha.zookeeper.num.retries=3
+# Specify how long the server should wait before re-attempting connections to Zookeeper, in case of any connection issues.
+atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
+# Specify how long a session to Zookeeper can remain inactive before it is deemed unreachable.
+atlas.server.ha.zookeeper.session.timeout.ms=20000
+
+# Specify the scheme and the identity to be used for setting up ACLs on nodes created in Zookeeper for HA.
+# The format of these options is <scheme>:<identity>. For more information refer to http://zookeeper.apache.org/doc/r3.2.2/zookeeperProgrammers.html#sc_ZooKeeperAccessControl.
+# The 'acl' option allows specifying a scheme, identity pair to set up an ACL for.
+atlas.server.ha.zookeeper.acl=sasl:client@company.com
+# The 'auth' option specifies the authentication that should be used for connecting to Zookeeper.
+atlas.server.ha.zookeeper.auth=sasl:client@company.com
+
+# Since Zookeeper is a shared service that is typically used by many components,
+# it is preferable for each component to set its znodes under a namespace.
+# Specify the namespace under which the znodes should be written. Default = /apache_atlas
+atlas.server.ha.zookeeper.zkroot=/apache_atlas
+
+# Specify number of times a client should retry with an instance before selecting another active instance, or failing an operation.
+atlas.client.ha.retries=4
+# Specify interval between retries for a client.
+atlas.client.ha.sleep.interval.ms=5000
+
+
+
+

Server Properties

+
+
+# Set the following property to true, to enable the setup steps to run on each server start. Default = false.
+atlas.server.run.setup.on.start=false
+
+
+
+

Performance configuration items

+

The following properties can be used to tune performance of Atlas under specific circumstances:

+
+
+# The number of times Atlas code tries to acquire a lock (to ensure consistency) while committing a transaction.
+# This should be related to the amount of concurrency expected to be supported by the server. For example, with retries set to 10, up to 100 threads can concurrently create types in the Atlas system.
+# If this is set to a low value (default is 3), concurrent operations might fail with a PermanentLockingException.
+atlas.graph.storage.lock.retries=10
+
+# Milliseconds to wait before evicting a cached entry. This should be > atlas.graph.storage.lock.wait-time x atlas.graph.storage.lock.retries
+# If this is set to a low value (default is 10000), warnings on transactions taking too long will occur in the Atlas application log.
+atlas.graph.storage.cache.db-cache-time=120000
+
+# Minimum number of threads in the atlas web server
+atlas.webserver.minthreads=10
+
+# Maximum number of threads in the atlas web server
+atlas.webserver.maxthreads=100
+
+# Keepalive time in secs for the thread pool of the atlas web server
+atlas.webserver.keepalivetimesecs=60
+
+# Queue size for the requests (when max threads are busy) for the atlas web server
+atlas.webserver.queuesize=100
+
+
+
+

Recording performance metrics

+

The Atlas package should be built with '-P perf' to instrument the Atlas code to collect metrics. The metrics will be recorded in <atlas.log.dir>/metric.log, with one log line per API call. The metrics contain the number of times the instrumented methods are called and the total time spent in the instrumented methods. Logging to metric.log is controlled through the log4j configuration in atlas-log4j.xml. When the Atlas code is instrumented, to disable logging to metric.log at runtime, set the log level of the METRICS logger to info:

+
+
+<logger name="METRICS" additivity="false">
+    <level value="info"/>
+    <appender-ref ref="METRICS"/>
+</logger>
+
+
+
+
http://git-wip-us.apache.org/repos/asf/atlas-website/blob/c5b7bdb3/0.8.1/EclipseSetup.html
----------------------------------------------------------------------
diff --git a/0.8.1/EclipseSetup.html b/0.8.1/EclipseSetup.html
new file mode 100644
index 0000000..62c6a1a
--- /dev/null
+++ b/0.8.1/EclipseSetup.html
@@ -0,0 +1,339 @@
+ + + + + + +
+ +
+

Tools required to build and run Apache Atlas on Eclipse

+

These instructions are provided as-is. They worked at a point in time; other variants of software may work. These instructions may become stale if the build dependencies change.

+

They have been shown to work as of 19 December 2016.

+

To build, run tests, and debug Apache Atlas, the following software is required:

+

Java

+
    +
  • Download and install a 1.8 Java SDK
  • +
  • Set JAVA_HOME system environment variable to the installed JDK home directory
  • +
  • Add JAVA_HOME/bin directory to system PATH
Python

Atlas command line tools are written in Python.

+
    +
  • Download and install Python version 2.7.7
  • +
  • For Mac, we used 2.7.11
  • +
  • Add Python home directory to system PATH
Maven
    +
  • Download and install Maven 3.3.9
  • +
  • Set the environment variable M2_HOME to point to the maven install directory
  • +
  • Add M2_HOME/bin directory to system PATH e.g. C:\Users\IBM_ADMIN\Documents\Software\apache-maven-3.3.9\bin
Git
    +
  • Install Git
  • +
  • Add git bin directory to the system PATH e.g. C:\Program Files (x86)\Git\bin
Eclipse
    +
  • Install Eclipse Neon (4.6)
  • +
  • The non-EE Neon for iOS from eclipse.org has been proven to work here.
  • +
  • Install the Scala IDE, TestNG, and m2eclipse-scala features/plugins as described below.
Scala IDE Eclipse feature

Some of the Atlas source code is written in the Scala programming language. The Scala IDE feature is required to compile Scala source code in Eclipse.

+
    +
  • In Eclipse, choose Help - Install New Software..
  • +
  • Click Add... to add an update site, and set Location to http://download.scala-ide.org/sdk/lithium/e44/scala211/stable/site
  • +
  • Select Scala IDE for Eclipse from the list of available features
  • +
  • Restart Eclipse after install
  • +
  • Set the Scala compiler to target the 1.7 JVM: Window - Preferences - Scala - Compiler, change target to 1.7
TestNG Eclipse plug-in

Atlas tests use the TestNG framework, which is similar to JUnit. The TestNG plug-in is required to run TestNG tests from Eclipse.

+
    +
  • In Eclipse, choose Help - Install New Software..
  • +
  • Click Add... to add an update site, and set Location to http://beust.com/eclipse-old/eclipse_6.9.9.201510270734 +
      +
    • Choose TestNG and continue with install
    • +
    • Restart Eclipse after installing the plugin
    • +
    • In Window - Preferences - TestNG, uncheck "Use project TestNG jar"
m2eclipse-scala Eclipse plugin
    +
  • In Eclipse, choose Help - Install New Software..
  • +
  • Click Add... to add an update site, and set Location to http://alchim31.free.fr/m2e-scala/update-site/
  • +
  • Choose Maven Integration for Scala IDE, and continue with install
  • +
  • Restart Eclipse after install
  • +
  • In Window - Preferences -Maven - Errors/Warnings, set Plugin execution not covered by lifecycle configuration to Warning
Import Atlas maven projects into Eclipse:

a. File - Import - Maven - Existing Maven Projects
b. Browse to your Atlas folder
c. Uncheck the root project and non-Java projects such as dashboardv2, docs and distro, then click Finish

+

On the Mac, the Maven import may fail with the message:

+
+
+"Cannot complete the install because one or more required items could not be found. Software being installed: Maven Integration for AJDT (Optional) 0.14.0.201506231302 (org.maven.ide.eclipse.ajdt.feature.feature.group 0.14.0.201506231302) Missing requirement: Maven Integration for AJDT (Optional) 0.14.0.201506231302 (org.maven.ide.eclipse.ajdt.feature.feature.group 0.14.0.201506231302) requires 'org.eclipse.ajdt.core 1.5.0' but it could not be found".
+
+
+

Install http://download.eclipse.org/tools/ajdt/46/dev/update and rerun. The Maven AspectJ plugin should then install, allowing the references to Aspects in Maven to be resolved.

+

d. In the atlas-typesystem, atlas-repository, hdfs-model, and storm-bridge projects, add the src/main/scala and src/test/scala (if available) directories as source folders. Note: the hdfs-model and storm-bridge projects do not have the src/test/scala folder.

+

Right-click on the project, and choose Properties.

+

Click the Java Build Path in the left-hand panel, and choose the Source tab.

+

Click Add Folder, and select the src/main/scala and src/test/scala directories.

+

Only the atlas-repository and atlas-typesystem projects have Scala source folders to update.

+

e. Select atlas-typesystem, atlas-repository, hdfs-model, and storm-bridge projects, right-click, go to the Scala menu, and choose ‘Set the Scala Installation’.

+

f. Choose Fixed Scala Installation: 2.11.8 (bundled) , and click OK.

+

g. Restart Eclipse

+

h. Choose Project - Clean, select Clean all projects, and click OK.

+

Some projects may not pick up the Scala library. If this occurs, use the quick fix on those projects to add in the Scala library: atlas-typesystem, atlas-repository, hdfs-model, storm-bridge and atlas-webapp.

+

You should now have a clean workspace.

+

Sample Bash scripts to help Mac users

+

You will need to change some of these scripts to point to your installation targets.

+
    +
  • Run this script to setup your command line build environment
+
+
+#!/bin/bash
+# export JAVA_HOME=/Library/Java/JavaVirtualMachines/macosxx6480sr3fp10hybrid-20160719_01-sdk
+export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home
+export M2_HOME=/Applications/apache-maven-3.3.9
+# Git is installed in the system path
+export PYTHON_HOME='/Applications/Python 2.7'
+export PATH=$PYTHON_HOME:$M2_HOME/bin:$JAVA_HOME/bin:$PATH
+export MAVEN_OPTS="-Xmx1536m -Drat.numUnapprovedLicenses=100 -XX:MaxPermSize=256m"
+
+
+

+
    +
  • If you do not want to set Java 8 as your system Java, you can use this bash script to set up the environment and run Eclipse (which you can drop in Applications and rename to neon).
+
+
+#!/bin/bash
+export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home
+export M2_HOME=/Applications/apache-maven-3.3.9
+# Git is installed in the system path
+export PYTHON_HOME='/Applications/Python 2.7'
+export PATH=$PYTHON_HOME:$M2_HOME/bin:$JAVA_HOME/bin:$PATH
+/Applications/neon.app/Contents/MacOS/eclipse
+
+
+
+
http://git-wip-us.apache.org/repos/asf/atlas-website/blob/c5b7bdb3/0.8.1/Export-API.html
----------------------------------------------------------------------
diff --git a/0.8.1/Export-API.html b/0.8.1/Export-API.html
new file mode 100644
index 0000000..5ab2884
--- /dev/null
+++ b/0.8.1/Export-API.html
@@ -0,0 +1,420 @@
+ + + + + + +
+ +
+

Export API

+

The general approach is:

+
    +
  • Consumer specifies the scope of data to be exported (details below).
  • +
  • If successful, the API will return the stream in the format specified.
  • +
  • An error will be returned on failure of the call.
+

See here for details on exporting hdfs_path entities.

+

 Title: Export API
 Example: See Examples sections below.
 URL: api/atlas/admin/export
 Method: POST
 URL Parameters: None
 Data Parameters: The class AtlasExportRequest is used to specify the items to export. The list of AtlasObjectId(s) allows specifying multiple items to export in a session. An AtlasObjectId is a tuple of entity type, name of unique attribute, and value of unique attribute. Several items can be specified. See examples below.
 Success Response: File stream as application/zip.
 Error Response: Errors that are handled within the system will be returned as AtlasBaseException.
 Notes: The consumer could choose to consume the output of the API either programmatically, using a java.io.ByteArrayOutputStream, or manually, by saving the contents of the stream to a file on disk.
 Method Signature:
+
+@POST
+@Path("/export")
+@Consumes("application/json;charset=UTF-8")
+
+
+
+
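As a sketch of the programmatic consumption mentioned in the Notes row above, the following standalone Java snippet posts an AtlasExportRequest and saves the returned ZIP stream to disk. The URL, credentials and request body are placeholders, not part of any Atlas client library:

    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    public class ExportClient {
        public static void main(String[] args) throws Exception {
            // Minimal AtlasExportRequest payload (placeholder database name and cluster)
            String request = "{ \"itemsToExport\": [ { \"typeName\": \"hive_db\", "
                    + "\"uniqueAttributes\": { \"qualifiedName\": \"accounts@cl1\" } } ] }";

            HttpURLConnection conn = (HttpURLConnection)
                    new URL("http://localhost:21000/api/atlas/admin/export").openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/json;charset=UTF-8");
            conn.setRequestProperty("Authorization", "Basic " + Base64.getEncoder()
                    .encodeToString("adminuser:password".getBytes(StandardCharsets.UTF_8)));

            try (OutputStream out = conn.getOutputStream()) {
                out.write(request.getBytes(StandardCharsets.UTF_8));
            }

            // Save the application/zip response stream to a file on disk
            try (InputStream in = conn.getInputStream();
                 OutputStream zip = new FileOutputStream("accounts-export.zip")) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) > 0) {
                    zip.write(buf, 0, n);
                }
            }
        }
    }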

Additional Options

+

It is possible to specify additional parameters for the Export operation.

+

Current implementation has 2 options. Both are optional:

+
    +
  • matchType This option configures the approach used for fetching the starting entity. It has the following values: +
      +
    • startsWith Search for an entity that is prefixed with the specified criteria.
    • +
    • endsWith Search for an entity that is suffixed with the specified criteria.
    • +
    • contains Search for an entity that has the specified criteria as a sub-string.
    • +
    • matches Search for an entity that is a regular expression match with the specified criteria.
+

+
    +
  • fetchType This option configures the approach used for fetching entities. It has the following values: +
      +
    • FULL: This fetches all the entities that are connected directly and indirectly to the starting entity. E.g. If a starting entity specified is a table, then this option will fetch the table, database and all the other tables within the database.
    • +
    • CONNECTED: This fetches all the entities that are connected directly to the starting entity. E.g. If a starting entity specified is a table, then this option will fetch the table and the database entity only.
+

If no matchType is specified, an exact match is used; that is, the entire string is used as the search criteria.

+

Searching using matchType applies for all types of entities. It is particularly useful for matching entities of type hdfs_path (see here).

+

The fetchType option defaults to FULL.

+

For complete example see section below.

+
+

Contents of Exported ZIP File

+

The exported ZIP file has the following entries within it:

+
    +
  • atlas-export-result.json: +
      +
    • Input filters: The scope of export.
    • +
    • File format: The format chosen for the export operation.
    • +
    • Metrics: The number of entity definitions, classifications and entities exported.
  • +
  • atlas-typesdef.json: Type definitions for the entities exported.
  • +
  • atlas-export-order.json: Order in which entities should be exported.
  • +
  • {guid}.json: Individual entities are exported with file names that correspond to their id.
+
+

Examples

+

The AtlasExportRequest below shows filters that attempt to export 2 databases in cluster cl1:

+
+
+{
+    "itemsToExport": [
+       { "typeName": "hive_db", "uniqueAttributes": { "qualifiedName": "accounts@cl1" } },
+       { "typeName": "hive_db", "uniqueAttributes": { "qualifiedName": "hr@cl1" } }
+    ]
+}
+
+
+

The AtlasExportRequest below specifies the fetchType as FULL. The matchType option will fetch accounts@cl1.

+
+
+{
+    "itemsToExport": [
+       { "typeName": "hive_db", "uniqueAttributes": { "qualifiedName": "accounts@" } }
+    ],
+    "options": {
+        "fetchType": "FULL",
+        "matchType": "startsWith"
+    }
+}
+
+
+

The AtlasExportRequest below specifies the fetchType as CONNECTED. The matchType option will fetch accountsReceivable, accountsPayable, etc. present in the database.

+
+
+{
+    "itemsToExport": [
+       { "typeName": "hive_db", "uniqueAttributes": { "name": "accounts" } }
+    ],
+    "options": {
+        "fetchType": "CONNECTED",
+        "matchType": "startsWith"
+    }
+}
+
+
+

Below is the AtlasExportResult JSON for the export of the Sales DB present in the QuickStart.

+

The metrics contains the number of types and entities exported as part of the operation.

+
+
+{
+    "clientIpAddress": "10.0.2.15",
+    "hostName": "10.0.2.2",
+    "metrics": {
+        "duration": 1415,
+        "entitiesWithExtInfo": 12,
+        "entity:DB_v1": 2,
+        "entity:LoadProcess_v1": 2,
+        "entity:Table_v1": 6,
+        "entity:View_v1": 2,
+        "typedef:Column_v1": 1,
+        "typedef:DB_v1": 1,
+        "typedef:LoadProcess_v1": 1,
+        "typedef:StorageDesc_v1": 1,
+        "typedef:Table_v1": 1,
+        "typedef:View_v1": 1,
+        "typedef:classification": 6
+    },
+    "operationStatus": "SUCCESS",
+    "request": {
+        "itemsToExport": [
+            {
+                "typeName": "DB_v1",
+                "uniqueAttributes": {
+                    "name": "Sales"
+                }
+            }
+        ],
+        "options": {
+            "fetchType": "full"
+        }
+    },
+    "userName": "admin"
+}
+
+
+
+

CURL Calls

+

Below are sample CURL calls that demonstrate export of the QuickStart databases.

+
+
+curl -X POST -u adminuser:password -H "Content-Type: application/json" -H "Cache-Control: no-cache" -d '{
+    "itemsToExport": [
+        { "typeName": "DB", "uniqueAttributes": { "name": "Sales" } },
+        { "typeName": "DB", "uniqueAttributes": { "name": "Reporting" } },
+        { "typeName": "DB", "uniqueAttributes": { "name": "Logging" } }
+    ],
+    "options": { "fetchType": "FULL" }
+}' "http://localhost:21000/api/atlas/admin/export" > quickStartDB.zip
+
+
+
+
http://git-wip-us.apache.org/repos/asf/atlas-website/blob/c5b7bdb3/0.8.1/Export-HDFS-API.html
----------------------------------------------------------------------
diff --git a/0.8.1/Export-HDFS-API.html b/0.8.1/Export-HDFS-API.html
new file mode 100644
index 0000000..0cca7ac
--- /dev/null
+++ b/0.8.1/Export-HDFS-API.html
@@ -0,0 +1,275 @@
+ + + + + + +
+ +
+

Export & Import APIs for HDFS Path

+
+

Introduction

+

The general approach for using the Import-Export APIs for HDFS paths remains the same. There are minor variations caused by how HDFS paths are handled within Atlas.

+

Unlike HIVE entities, HDFS entities within Atlas are created manually using the Create Entity link within the Atlas Web UI.

+

Also, HDFS paths tend to be hierarchical, in the sense that users tend to model the same HDFS storage structure within Atlas.

+

Sample HDFS Setup

+

HDFS Path Atlas Entity
/apps/warehouse/finance - Entity type: hdfs_path
    Name: Finance
    QualifiedName: FinanceAll
/apps/warehouse/finance/accounts-receivable - Entity type: hdfs_path
    Name: FinanceReceivable
    QualifiedName: FinanceReceivable
    Path: /apps/warehouse/finance
/apps/warehouse/finance/accounts-payable - Entity type: hdfs_path
    Name: Finance-Payable
    QualifiedName: FinancePayable
    Path: /apps/warehouse/finance/accounts-payable
/apps/warehouse/finance/billing - Entity type: hdfs_path
    Name: FinanceBilling
    QualifiedName: FinanceBilling
    Path: /apps/warehouse/finance/billing

+
+

Export API Using matchType

+

To export entities that represent an HDFS path, use the Export API with the matchType option. Details can be found here.

+
+

Example Using CURL Calls

+

Below are sample CURL calls that perform the export operation on the Sample HDFS Setup shown above.

+
+
+curl -X POST -u adminuser:password -H "Content-Type: application/json" -H "Cache-Control: no-cache" -d '{
+    "itemsToExport": [
+        { "typeName": "hdfs_path", "uniqueAttributes": { "name": "FinanceAll" } }
+    ],
+    "options": {
+     "fetchType": "full",
+     "matchType": "startsWith"
+    }
+}' "http://localhost:21000/api/atlas/admin/export" > financeAll.zip
+
+
+
+