hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-746) constant folding
Date Sat, 15 Aug 2009 01:40:14 GMT

    [ https://issues.apache.org/jira/browse/HIVE-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743517#action_12743517 ]

Zheng Shao commented on HIVE-746:
---------------------------------

We can do this in the optimization phase (before column pruning).
This will be done together with HIVE-757.

We traverse the operator graph starting from the root operators, and visit an operator
only after all of its parents have been visited.
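
The visiting order described above is a topological order over the operator graph. Here is a minimal sketch in Python, not Hive code: the `visit_order` function, the `parents` mapping, and the operator names in `plan` are all hypothetical and only illustrate the "visit after all parents" rule.

```python
from collections import deque


def visit_order(parents):
    """Return operators in an order where each one follows all of its parents.

    `parents` maps every operator name to the list of its parent operators
    (hypothetical representation; root operators have an empty list).
    """
    children = {op: [] for op in parents}
    pending = {op: len(ps) for op, ps in parents.items()}
    for op, ps in parents.items():
        for p in ps:
            children[p].append(op)

    # Root operators have no parents, so they can be visited immediately.
    ready = deque(op for op, n in pending.items() if n == 0)
    order = []
    while ready:
        op = ready.popleft()
        order.append(op)
        for child in children[op]:
            pending[child] -= 1
            # A child becomes visitable once its last parent has been visited.
            if pending[child] == 0:
                ready.append(child)
    return order


# Hypothetical plan: two scans, one select, a join, and a sink.
plan = {
    "TS1": [],
    "TS2": [],
    "SEL1": ["TS1"],
    "JOIN": ["SEL1", "TS2"],
    "FS": ["JOIN"],
}
order = visit_order(plan)
```

In this sketch `JOIN` is visited only after both `SEL1` and `TS2`, so any constant columns found in its parents are already known when its own expressions are examined.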

For each expression tree in an operator, we try to pre-compute part or all of the tree
with a bottom-up evaluation.
A leaf node is constant if it is a constant node, or if it references a column of its
parent that is constant. In the latter case we fill in the constant value directly.
A non-leaf node is constant if all of its children are constant (a node with no children
also qualifies) and the node is deterministic (every function except a non-deterministic
udf/genericudf).
We fold each constant non-leaf node into a single constant node.
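
The folding rules above can be sketched as a small recursive function. This is an illustrative Python model, not Hive's expression classes: `Const`, `Column`, `Call`, `fold`, and the `constant_columns` mapping are assumed names, and the `deterministic` flag stands in for whatever marker the real udf/genericudf carries.

```python
import operator
import random
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Const:
    value: object


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Call:
    name: str
    fn: Callable
    args: Tuple
    deterministic: bool = True


def fold(node, constant_columns):
    """Fold constants bottom-up.

    `constant_columns` maps the names of parent columns known to be
    constant to their values (hypothetical representation).
    """
    if isinstance(node, Const):
        return node
    if isinstance(node, Column):
        # A column that is constant in the parent is replaced by its value.
        if node.name in constant_columns:
            return Const(constant_columns[node.name])
        return node
    # Non-leaf node: fold the children first, then try to fold this node.
    args = tuple(fold(a, constant_columns) for a in node.args)
    if node.deterministic and all(isinstance(a, Const) for a in args):
        return Const(node.fn(*(a.value for a in args)))
    return Call(node.name, node.fn, args, node.deterministic)


# 1 + 2 folds completely.
full = fold(Call("+", operator.add, (Const(1), Const(2))), {})

# key + (c + 2) folds only the inner sum when c is a constant parent column.
inner = Call("+", operator.add, (Column("c"), Const(2)))
partial = fold(Call("+", operator.add, (Column("key"), inner)), {"c": 1})

# A non-deterministic call is never folded, even with no children.
unfolded = fold(Call("rand", random.random, (), deterministic=False), {})
```

Here `full` becomes `Const(3)`, `partial` keeps the outer `+` but with `Const(3)` as its second argument, and `unfolded` stays a `Call`.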

After constant folding is done, column pruning should be able to prune out those constant
columns in the intermediate operators.


> constant folding
> ----------------
>
>                 Key: HIVE-746
>                 URL: https://issues.apache.org/jira/browse/HIVE-746
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>
> The constants are not folded at compile time:
> for eg:
> select 1+2 from src
> will evaluate 1+2 for every row.
> This becomes more interesting for scenarios like:
> select unix_timestamp() from src;
> The UDF should be evaluated only once, and the same value should be returned. However,
> currently, we mark it as non-deterministic and evaluate it for every row.
> This can have bad side-effects on partition pruning etc.
> In MySQL, the same value is generated independent of the time taken for the query.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

