atlas-commits mailing list archives

From shweth...@apache.org
Subject [1/2] incubator-atlas git commit: ATLAS-1182 Hive Column level lineage docs (svimal2106 via shwethags)
Date Wed, 19 Oct 2016 10:04:23 GMT
Repository: incubator-atlas
Updated Branches:
  refs/heads/master eb6e656be -> b6acff6d5


ATLAS-1182 Hive Column level lineage docs (svimal2106 via shwethags)


Project: http://git-wip-us.apache.org/repos/asf/incubator-atlas/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-atlas/commit/3cc1bd5b
Tree: http://git-wip-us.apache.org/repos/asf/incubator-atlas/tree/3cc1bd5b
Diff: http://git-wip-us.apache.org/repos/asf/incubator-atlas/diff/3cc1bd5b

Branch: refs/heads/master
Commit: 3cc1bd5b837c944c8af1d50451ef89a1d3f15ee6
Parents: eb6e656
Author: Shwetha GS <sshivalingamurthy@hortonworks.com>
Authored: Wed Oct 19 15:21:51 2016 +0530
Committer: Shwetha GS <sshivalingamurthy@hortonworks.com>
Committed: Wed Oct 19 15:21:51 2016 +0530

----------------------------------------------------------------------
 .../resources/images/column_lineage_ex1.png     | Bin 0 -> 34057 bytes
 docs/src/site/twiki/Bridge-Hive.twiki           |  37 +++++++++++++++++++
 release-log.txt                                 |   1 +
 3 files changed, 38 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-atlas/blob/3cc1bd5b/docs/src/site/resources/images/column_lineage_ex1.png
----------------------------------------------------------------------
diff --git a/docs/src/site/resources/images/column_lineage_ex1.png b/docs/src/site/resources/images/column_lineage_ex1.png
new file mode 100644
index 0000000..a41c5fb
Binary files /dev/null and b/docs/src/site/resources/images/column_lineage_ex1.png differ

http://git-wip-us.apache.org/repos/asf/incubator-atlas/blob/3cc1bd5b/docs/src/site/twiki/Bridge-Hive.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/Bridge-Hive.twiki b/docs/src/site/twiki/Bridge-Hive.twiki
index 653ed4e..dd22b5c 100644
--- a/docs/src/site/twiki/Bridge-Hive.twiki
+++ b/docs/src/site/twiki/Bridge-Hive.twiki
@@ -71,6 +71,43 @@ The following properties in <atlas-conf>/atlas-application.properties control th
 
 Refer [[Configuration][Configuration]] for notification related configurations
 
+---++ Column Level Lineage
+
+Starting with version 0.8-incubating, Atlas captures column-level lineage. The details are below.
+
+---+++ Model
+   * The !ColumnLineageProcess type is a subclass of Process.
+
+   * It relates an output Column to a set of input Columns or to the input Table.
+
+   * The lineage also captures the kind of dependency; the current values are SIMPLE, EXPRESSION and SCRIPT:
+      * A SIMPLE dependency means the output column has the same value as the input.
+      * An EXPRESSION dependency means the output column is transformed at runtime by an expression on the input columns (e.g. a Hive SQL expression).
+      * A SCRIPT dependency means the output column is transformed by a user-provided script.
+
+   * For an EXPRESSION dependency, the expression attribute contains the expression in string form.
+
+   * Since Process links input and output !DataSets, Column is made a subclass of !DataSet.
+
+---+++ Examples
+For the simple CTAS statement below:
+<verbatim>
+create table t2 as select id, name from T1
+</verbatim>
+
+The lineage is captured as follows:
+
+<img src="images/column_lineage_ex1.png" height="200" width="400" />
+
+
+
+---+++ Extracting Lineage from Hive commands
+   * The !HiveHook maps the !LineageInfo in the !HookContext to column lineage instances.
+
+   * The !LineageInfo in Hive provides column-level lineage for the final !FileSinkOperator, linking its output columns to the input columns of the Hive query.
+
+---+++ NOTE
+Column-level lineage works with Hive 1.2.1 once the patch for <a href="https://issues.apache.org/jira/browse/HIVE-13112">HIVE-13112</a> is applied to the Hive source.
 
 ---++ Limitations
    * Since database name, table name and column names are case insensitive in hive, the corresponding names in entities are lowercase. So, any search APIs should use lowercase while querying on the entity names
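
The Model and Examples sections of the page above describe column lineage in prose. As a rough, non-authoritative illustration, the Python sketch below mirrors that model and builds the records one might expect for the `create table t2 as select id, name from T1` example. Everything in it is an assumption: `Table`, `Column`, `ColumnLineageProcess` and `ctas_lineage` are hypothetical stand-ins rather than the Atlas API, the database name `default` is made up, and the `db.table.column` naming omits any cluster suffix Atlas may append. Treating a plain projection as SIMPLE follows the dependency definitions above; the figure on the page remains the reference for what Atlas actually records.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Union


class DependencyType(Enum):
    """The three dependency kinds described in the Model section."""
    SIMPLE = "SIMPLE"          # output column has the same value as the input
    EXPRESSION = "EXPRESSION"  # output derived by a runtime expression, e.g. Hive SQL
    SCRIPT = "SCRIPT"          # output derived by a user-provided script


@dataclass(frozen=True)
class Table:
    db: str
    name: str

    def qualified_name(self) -> str:
        # Hive names are case-insensitive and stored lowercase (see Limitations).
        # The "db.table" format is an assumption for this sketch.
        return f"{self.db}.{self.name}".lower()


@dataclass(frozen=True)
class Column:
    """Treated as a DataSet so that a Process can use it as input or output."""
    table: Table
    name: str

    def qualified_name(self) -> str:
        return f"{self.table.qualified_name()}.{self.name.lower()}"


@dataclass
class ColumnLineageProcess:
    """Hypothetical stand-in for the ColumnLineageProcess type (a Process subclass)."""
    output: Column
    inputs: List[Union[Column, Table]]  # a set of input columns, or the input table
    dependency: DependencyType
    expression: Optional[str] = None     # string form; used only for EXPRESSION

    def __post_init__(self) -> None:
        if self.dependency is DependencyType.EXPRESSION and not self.expression:
            raise ValueError("an EXPRESSION dependency needs its expression string")


def ctas_lineage() -> List[ColumnLineageProcess]:
    """Lineage for: create table t2 as select id, name from T1 (db 'default' assumed)."""
    src, dst = Table("default", "T1"), Table("default", "t2")
    # A plain projection copies values unchanged, i.e. a SIMPLE dependency.
    return [
        ColumnLineageProcess(Column(dst, col), [Column(src, col)], DependencyType.SIMPLE)
        for col in ("id", "name")
    ]


if __name__ == "__main__":
    for proc in ctas_lineage():
        sources = [i.qualified_name() for i in proc.inputs]
        print(proc.output.qualified_name(), "<-", sources, proc.dependency.value)
```

Lowercasing inside `qualified_name` keeps the sketch consistent with the Limitations note: `T1` and `t1` resolve to the same name, so lookups should also use lowercase.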

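The "Extracting Lineage from Hive commands" section of the page says the HiveHook maps the LineageInfo in the HookContext to column lineage instances. The Python sketch below imitates only the shape of that mapping: one record per output column, carrying its input columns, its dependency kind and, for EXPRESSION, the expression string. It is a hypothetical illustration, not the hook itself (which is Java code working on Hive's own LineageInfo classes); `HookDependency`, `to_column_lineage`, the record keys and the example query are all invented for this sketch.

```python
from typing import Dict, List, NamedTuple, Optional

ALLOWED_KINDS = {"SIMPLE", "EXPRESSION", "SCRIPT"}


class HookDependency(NamedTuple):
    """Hypothetical, simplified view of one lineage entry reported by Hive.

    This is NOT Hive's LineageInfo API; it only carries what the page mentions:
    the dependency kind, the expression, and the input (base) columns.
    """
    kind: str
    expression: Optional[str]
    base_columns: List[str]  # "db.table.column" names


def to_column_lineage(lineage: Dict[str, HookDependency]) -> List[dict]:
    """Map {output column -> dependency} to one lineage record per output column."""
    records = []
    for out_col in sorted(lineage):
        dep = lineage[out_col]
        if dep.kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown dependency kind: {dep.kind}")
        record = {
            "output": out_col.lower(),  # entity names are stored lowercase
            "inputs": sorted(c.lower() for c in dep.base_columns),
            "dependency": dep.kind,
        }
        if dep.kind == "EXPRESSION":
            record["expression"] = dep.expression  # kept in string form
        records.append(record)
    return records


if __name__ == "__main__":
    # Hand-written input for a hypothetical query:
    #   create table t3 as select id, upper(name) as uname from T1
    observed = {
        "default.t3.id": HookDependency("SIMPLE", None, ["default.T1.id"]),
        "default.t3.uname": HookDependency("EXPRESSION", "upper(name)", ["default.T1.name"]),
    }
    for rec in to_column_lineage(observed):
        print(rec)
```

Keying the input by output column reflects the idea that lineage is reported per column written by the final FileSinkOperator. Sorting the keys only makes the output order deterministic.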
http://git-wip-us.apache.org/repos/asf/incubator-atlas/blob/3cc1bd5b/release-log.txt
----------------------------------------------------------------------
diff --git a/release-log.txt b/release-log.txt
index d2b848b..3f24063 100644
--- a/release-log.txt
+++ b/release-log.txt
@@ -9,6 +9,7 @@ ATLAS-1060 Add composite indexes for exact match performance improvements for al
 ATLAS-1127 Modify creation and modification timestamps to Date instead of Long(sumasai)
 
 ALL CHANGES:
+ATLAS-1182 Hive Column level lineage docs (svimal2106 via shwethags)
 ATLAS-1230 updated AtlasTypeRegistry to support batch, atomic type updates (mneethiraj)
 ATLAS-1229 Add TypeCategory and methods to access attribute definitiions in AtlasTypes (sumasai)
 ATLAS-1227 Added support for attribute constraints in the API (mneethiraj)

