hive-user mailing list archives

From Raajay <>
Subject Semantic Analysis Run Through
Date Thu, 30 Jul 2015 17:27:47 GMT

I am currently playing around with the Hive semantic analysis code to
understand how DAGs or MapReduce plans are generated from abstract syntax
trees (ASTs). The idea is to explore the various possible DAGs and compare
their performance based on execution run time.

The function "analyzeInternal" seems to handle the entire plan
generation process. The different steps (at a high level), as described in
its comments, are:

1. Get Resolved Parse Tree from Syntax Tree

2. Get OP tree (Operator tree?) from Resolved parse tree

3. Deduce Result Set schema

4. Generate Parse Context

5. Do View creation

6. Collect Table Access stats

7. Perform Logical Optimization

8. Get Column Access Stats

9. Optimize Physical OP tree.

10. Translate to target execution engine.

I understand that step 7 (Logical Optimization) applies multiple transforms
(e.g. join reordering, constant propagation, predicate pushdown) to alter
the AST, and thus different DAGs can be obtained by choosing whether or not
to apply certain transformations.
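
To make the idea concrete, here is a toy sketch of what I mean by
"choosing whether or not to apply certain transformations". It is not Hive
code: the plan representation (a tuple of operator names) and the two
transform functions are hypothetical stand-ins, invented only to show how
toggling transforms on and off yields a set of distinct candidate plans.

```python
from itertools import combinations

# Toy "plan": operator names listed from source to sink.
# This is NOT Hive's operator tree; it only illustrates the search space.
BASE_PLAN = ("scan", "join", "filter", "select")


def push_filter_below_join(plan):
    """Hypothetical stand-in for predicate pushdown: move filter before join."""
    if "filter" in plan and "join" in plan:
        if plan.index("filter") > plan.index("join"):
            rest = [op for op in plan if op != "filter"]
            rest.insert(rest.index("join"), "filter")
            return tuple(rest)
    return plan


def drop_trailing_select(plan):
    """Hypothetical stand-in for a pruning rule: remove a trailing select."""
    return plan[:-1] if plan and plan[-1] == "select" else plan


TRANSFORMS = [push_filter_below_join, drop_trailing_select]


def enumerate_plans(base, transforms):
    """Apply every subset of transforms (in list order) and collect the
    distinct resulting plans, keyed by plan, with the first subset of
    transform names that produced each one."""
    plans = {}
    for k in range(len(transforms) + 1):
        for subset in combinations(transforms, k):
            plan = base
            for transform in subset:
                plan = transform(plan)
            plans.setdefault(plan, tuple(t.__name__ for t in subset))
    return plans


if __name__ == "__main__":
    for plan, enabled in enumerate_plans(BASE_PLAN, TRANSFORMS).items():
        print(" -> ".join(plan), "| enabled:", ", ".join(enabled) or "none")
```

Each distinct plan produced this way would then be executed and timed. The
sketch applies transforms in one fixed order; in a real optimizer the order
of application can matter as well.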

Can changes to the code in steps 1-2 and 9 also affect the resulting
DAGs? How does the AST get affected in these steps? Any pointers or
explanations will be helpful.

