hive-dev mailing list archives

From "Zheng Shao (JIRA)" <>
Subject [jira] Commented: (HIVE-1131) Add column lineage information to the pre execution hooks
Date Wed, 31 Mar 2010 04:15:27 GMT


Zheng Shao commented on HIVE-1131:

> Look at the DataContainer class. That has a partition in it. And the Dependency has a
> mapping from Partition to the dependencies. Can you explain more your concerns on inefficiency?

I see. So the DataContainer captures the output partition information, but we don't have input
partition information (BaseColumnInfo/TableAliasInfo). This is reasonable, since the input
can span a large number of partitions.
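
To make the asymmetry concrete, here is a minimal sketch of the shape being discussed: the output side is keyed by table and partition, while the input side is tracked per table and column only. The class and method names below (LineageSketch, OutputContainer, BaseColumn, addDependency, dependenciesOf) are hypothetical simplifications for illustration and are not the classes in the attached patches.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical, simplified stand-ins; NOT the classes from the HIVE-1131 patches.
final class LineageSketch {

    /** Input side: table and column only, deliberately no partition. */
    static final class BaseColumn {
        final String table;
        final String column;

        BaseColumn(String table, String column) {
            this.table = table;
            this.column = column;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof BaseColumn)) return false;
            BaseColumn b = (BaseColumn) o;
            return table.equals(b.table) && column.equals(b.column);
        }

        @Override public int hashCode() { return Objects.hash(table, column); }
    }

    /** Output side: table plus the partition written to (null if unpartitioned). */
    static final class OutputContainer {
        final String table;
        final String partition;

        OutputContainer(String table, String partition) {
            this.table = table;
            this.partition = partition;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof OutputContainer)) return false;
            OutputContainer c = (OutputContainer) o;
            return table.equals(c.table) && Objects.equals(partition, c.partition);
        }

        @Override public int hashCode() { return Objects.hash(table, partition); }
    }

    // output container -> output column -> base columns it is derived from
    private final Map<OutputContainer, Map<String, Set<BaseColumn>>> deps = new LinkedHashMap<>();

    void addDependency(OutputContainer out, String outColumn, BaseColumn in) {
        deps.computeIfAbsent(out, k -> new LinkedHashMap<>())
            .computeIfAbsent(outColumn, k -> new LinkedHashSet<>())
            .add(in);
    }

    Set<BaseColumn> dependenciesOf(OutputContainer out, String outColumn) {
        return deps
            .getOrDefault(out, Collections.<String, Set<BaseColumn>>emptyMap())
            .getOrDefault(outColumn, Collections.<BaseColumn>emptySet());
    }
}
```

With this shape, the size of the mapping grows with the number of output partitions and columns, not with the number of input partitions scanned.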

> For S6 actually the queryplan is the wrong place to store the lineageinfo. Because of
> the dynamic partitioning work that Ning is doing, I have to generate the partition to dependency
> mapping at run time. So I would rather store it in a run time structure as opposed to a compile
> time structure. SessionState fits that bill, though I think we should have another structure
> called ExecutionCtx for this. But otherwise I think we want to store this in a runtime structure.

+1 on the ExecutionCtx idea. SessionState is at the session level, and LineageInfo is at the
query level. It would be great to put LineageInfo into ExecutionCtx.
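
As a rough illustration of the scoping argument, the sketch below keeps lineage in a per-query object that a per-session object creates for each query. All names here (ScopeSketch, SessionSketch, ExecutionCtxSketch, startQuery, recordLineage) are hypothetical; this is not Hive's SessionState, and ExecutionCtx is only a proposal in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical names for illustration only; not Hive's SessionState
// and not an existing ExecutionCtx class.
final class ScopeSketch {

    /** Per-query runtime state: created when a query starts, dropped when it ends. */
    static final class ExecutionCtxSketch {
        private final String queryId;
        private final List<String> lineageEntries = new ArrayList<>();

        ExecutionCtxSketch(String queryId) {
            this.queryId = queryId;
        }

        // Meant to be called at run time, for example once a dynamically
        // created output partition is known.
        void recordLineage(String entry) {
            lineageEntries.add(entry);
        }

        List<String> lineage() {
            return Collections.unmodifiableList(lineageEntries);
        }

        String queryId() {
            return queryId;
        }
    }

    /** Per-session state: outlives individual queries, so it holds no lineage itself. */
    static final class SessionSketch {
        private int queriesStarted = 0;

        ExecutionCtxSketch startQuery() {
            queriesStarted++;
            return new ExecutionCtxSketch("query-" + queriesStarted);
        }
    }
}
```

Because each query gets a fresh context, lineage recorded for one query cannot leak into the next query run in the same session, which is the mismatch that storing it directly at the session level would have to guard against.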

> Add column lineage information to the pre execution hooks
> ---------------------------------------------------------
>                 Key: HIVE-1131
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Ashish Thusoo
>            Assignee: Ashish Thusoo
>         Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, HIVE-1131_4.patch
> We need a mechanism to pass the lineage information of the various columns of a table
> to a pre execution hook so that applications can use that for:
> - auditing
> - dependency checking
> and many other applications.
> The proposal is to expose this through a bunch of classes to the pre execution hook interface
> to the clients and put in the necessary transformation logic in the optimizer to generate
> this information.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
