hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1131) Add column lineage information to the pre execution hooks
Date Sat, 20 Feb 2010 02:15:28 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836103#action_12836103 ]

Zheng Shao commented on HIVE-1131:
----------------------------------

S1. Can we track lineage at the partition level instead of the table level?
S2. We might want to formally define these dependency levels, especially how they compose:
what is the level of a UDF applied to a UDAF, like round(sum(col)), or of a UDAF applied to
a UDF, like sum(round(col))?
{code}
+  /**
+   * Enum to track dependency. This enum has two values:
+   * 1. SCALAR - Indicates that the column is derived from a scalar expression.
+   * 2. AGGREGATION - Indicates that the column is derived from an aggregation.
+   */
+  public static enum DependencyType {
+    SIMPLE, UDF, UDAF, UDTF, SCRIPT, SET
+  }
+  
{code}

S3. Use "{}" even for a single statement in "if", "for", etc.
S4. Use "ArrayList" instead of "Vector" when it's accessed by a single thread.
S5. Remove "private HashMap<FileSinkOperator, Table> fopToTable;" since it's not used.
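
To make the question in S2 concrete, here is a minimal sketch of one possible composition
rule. It is not taken from HIVE-1131.patch. The class name LineageCompositionSketch and the
methods compose and composeAll are hypothetical, and the rule itself (the "stronger" type
wins, with strength given by the enum's declaration order) is only an assumption used for
illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch for discussion; not part of HIVE-1131.patch. */
final class LineageCompositionSketch {

  /** Same constants, in the same order, as the DependencyType enum quoted from the patch. */
  enum DependencyType {
    SIMPLE, UDF, UDAF, UDTF, SCRIPT, SET
  }

  /**
   * ASSUMPTION: when one expression wraps another, the composed dependency
   * is whichever type is declared later in the enum. SIMPLE acts as the identity.
   */
  static DependencyType compose(DependencyType outer, DependencyType inner) {
    // Enum.compareTo compares declaration order (ordinals).
    return outer.compareTo(inner) >= 0 ? outer : inner;
  }

  /** Folds a chain of nested expressions, listed outermost first. */
  static DependencyType composeAll(List<DependencyType> outermostFirst) {
    DependencyType result = DependencyType.SIMPLE;
    for (DependencyType type : outermostFirst) {
      result = compose(result, type);
    }
    return result;
  }

  public static void main(String[] args) {
    // round(sum(col)): a UDF applied to a UDAF.
    System.out.println(compose(DependencyType.UDF, DependencyType.UDAF));
    // sum(round(col)): a UDAF applied to a UDF.
    System.out.println(compose(DependencyType.UDAF, DependencyType.UDF));
    // round(sum(round(col))) as a chain, outermost first.
    System.out.println(composeAll(Arrays.asList(
        DependencyType.UDF, DependencyType.UDAF, DependencyType.UDF)));
  }
}
```

Under this assumed rule both round(sum(col)) and sum(round(col)) come out as UDAF, so the
rule cannot tell the two apart. If that distinction matters to hook clients, the lineage
would have to record the whole chain of types rather than a single value, which is one
reason the semantics need a formal definition.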


> Add column lineage information to the pre execution hooks
> ---------------------------------------------------------
>
>                 Key: HIVE-1131
>                 URL: https://issues.apache.org/jira/browse/HIVE-1131
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Ashish Thusoo
>            Assignee: Ashish Thusoo
>         Attachments: HIVE-1131.patch
>
>
> We need a mechanism to pass the lineage information of the various columns of a table
> to a pre execution hook so that applications can use that for:
> - auditing
> - dependency checking
> and many other applications.
> The proposal is to expose this through a bunch of classes to the pre execution hook interface
> to the clients and put in the necessary transformation logic in the optimizer to generate
> this information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

