hive-dev mailing list archives

From "Namit Jain (JIRA)" <>
Subject [jira] Commented: (HIVE-279) Implement predicate push down for hive queries
Date Fri, 27 Mar 2009 21:52:50 GMT


Namit Jain commented on HIVE-279:

Add comments in tests:

For example, in ppd_gby.q: the src1.c1 > 'val_200' predicate is pushed up, but the other is not, etc.

More tests needed:

groupby followed by groupby
groupby followed by join 
3-way join not being merged
3-way join being merged
outer join various scenarios

add a test for multi-table insert also, where ppd is not happening.

rand() being non-deterministic - I think that change has already been merged by Raghu

line 45: pusehed -> pushed

columnPruner should be done after ppd, since it regenerates the operator tree.
Can you add a test for that ? I think ppd will not happen - need to confirm via a test

It might be a debugging nightmare - can you add a LOG trace/info in OpProcFactory
(minimally in TableScanPPD - ideally everywhere)?

In SemanticAnalyzer: the colPosMap is not maintained in genReduceSinkPlan : 
although the RR does not change, it might be a good idea to add a test for the same.
ppd after cluster by

        if(exp == null) {
          ctx.setIsCandidate(colref, false);
          return false;

I am assuming exp can be null only because colExprMap is not maintained in some cases
(e.g. group by exprs).

Is that true?
If yes, can you add a comment for the same?
If no, can you explain that?

83:        ctx.setIsCandidate(colref, true);

112: can't you break out of the loop if isCandidate is false?
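
A minimal sketch of the early exit suggested for line 112, assuming the loop only needs to know whether every child is a candidate. The class, method, and map used here are hypothetical stand-ins, not the code from the patch.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the candidate check; not taken from the patch.
class CandidateCheck {
  // Returns true only if every child expression is marked as a candidate.
  // Stops at the first non-candidate instead of scanning the remaining children.
  static boolean allCandidates(List<String> children, Map<String, Boolean> isCandidate) {
    for (String child : children) {
      // A child missing from the map is treated as a non-candidate (assumption).
      if (!isCandidate.getOrDefault(child, false)) {
        return false; // one non-candidate child already decides the answer
      }
    }
    return true;
  }
}
```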


128: the order of the parents of the tablescan's children can be lost; change the parent at that position instead
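
To illustrate the point about line 128: removing the old parent and appending the new one changes the parent order, while replacing it at its existing index does not. The sketch below uses plain lists and a hypothetical replaceParent helper; it is not the operator API from the patch.

```java
import java.util.List;

// Hypothetical helper; operator parents are modelled as a plain list here.
class ParentOrder {
  // Swaps oldParent for newParent in place so the other parents keep their positions.
  static <T> void replaceParent(List<T> parents, T oldParent, T newParent) {
    int pos = parents.indexOf(oldParent);
    if (pos < 0) {
      throw new IllegalArgumentException("parent not found: " + oldParent);
    }
    parents.set(pos, newParent); // same index, so the order is preserved
  }
}
```

For example, a join whose parents are [TS_a, TS_b] would end up with [FIL_a, TS_b] after a filter is inserted below TS_a, rather than [TS_b, FIL_a].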

247:       if(aliases.size() == 1 && aliases.contains("")) {
        // Reduce sink of group by operator
        aliases = null; 

Instead of this, do you want to add a parameter to mergeWithChildrenPred() -- allAliasesOk?

null and empty aliases are differentiated in mergeWi..() in a bizarre way; it might be easier
to understand with a separate parameter
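
A rough sketch of what a separate parameter could look like, assuming that a null alias set was being used to mean "any alias is acceptable". That reading, and the canPush name and signature, are assumptions for illustration; the real mergeWithChildrenPred() may behave differently.

```java
import java.util.Set;

// Hypothetical illustration of an explicit flag replacing the null-vs-empty convention.
class AliasFilter {
  // allAliasesOk == true: accept the predicate whatever aliases it references.
  // Otherwise accept it only if every alias it references is in allowedAliases.
  static boolean canPush(Set<String> predicateAliases,
                         Set<String> allowedAliases,
                         boolean allAliasesOk) {
    if (allAliasesOk) {
      return true;
    }
    return allowedAliases.containsAll(predicateAliases);
  }
}
```

With the flag, an empty allowedAliases set has one plain meaning (nothing that references an alias may be pushed), and callers never need to pass null.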

Some cleanup:

JoinOperator: posToAliasMap --> can't it be moved to ParseContext instead?
Same for colExprMap --> or can it be moved to OpParseContext?
They are all parse-time structures.

> Implement predicate push down for hive queries
> ----------------------------------------------
>                 Key: HIVE-279
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Prasad Chakka
>            Assignee: Prasad Chakka
>         Attachments: hive-279.2.patch, hive-279.3.patch, hive-279.4.patch, hive-279.patch
> Push predicates that are expressed in outer queries into inner queries where possible so that rows will get filtered out sooner.
> eg.
> select a.*, b.* from a join b on (a.uid = b.uid) where a.age = 20 and a.gender = 'm'
> current compiler generates the filter predicate in the reducer after the join so all the rows have to be passed from mapper to reducer. by pushing the filter predicate to the mapper, query performance should improve.
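
The effect described in the issue can be sketched with a toy in-memory inner join. This is only an illustration under simplifying assumptions (lists instead of mappers and reducers, table b reduced to its uid column); the User class and both methods are hypothetical and unrelated to Hive's operator code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of "a join b on (a.uid = b.uid) where a.age = 20 and a.gender = 'm'".
class PushdownSketch {
  static final class User {
    final int uid; final int age; final String gender;
    User(int uid, int age, String gender) { this.uid = uid; this.age = age; this.gender = gender; }
  }

  static boolean pred(User u) { return u.age == 20 && "m".equals(u.gender); }

  // Filter applied after the join: every row of a is fed into the join.
  // Returns {rows of a entering the join, rows in the result}.
  static int[] joinThenFilter(List<User> a, List<Integer> bUids) {
    int out = 0;
    for (User u : a) {
      for (int uid : bUids) {
        if (u.uid == uid && pred(u)) out++;
      }
    }
    return new int[] {a.size(), out};
  }

  // Filter pushed below the join: only matching rows of a are fed into the join.
  static int[] filterThenJoin(List<User> a, List<Integer> bUids) {
    List<User> kept = new ArrayList<>();
    for (User u : a) {
      if (pred(u)) kept.add(u);
    }
    int out = 0;
    for (User u : kept) {
      for (int uid : bUids) {
        if (u.uid == uid) out++;
      }
    }
    return new int[] {kept.size(), out};
  }
}
```

Both methods produce the same number of result rows for an inner join; the pushed-down version simply sends fewer rows of a into the join. Outer joins need more care, which is why they are listed separately among the tests requested above.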

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
