hadoop-hive-dev mailing list archives

From "Namit Jain (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-279) Implement predicate push down for hive queries
Date Fri, 27 Feb 2009 21:49:13 GMT

    [ https://issues.apache.org/jira/browse/HIVE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677548#action_12677548 ]

Namit Jain commented on HIVE-279:
---------------------------------

Some high level comments:

1. Add more comments everywhere, specifically in joinPPD (OpProcFactory)
2. Remove operator-specific code in ExprWalkerProcFactory: ColumnExprProcessor: process
3. Use specific data structures wherever possible instead of more generic ones.

ExprWalkerInfo:

  private Map<String, List<Node>> pushdownPreds;
  private Map<Node, ExprInfo> exprInfoMap;

In both of them, Node means exprNodeDesc. Why don't we use that instead?

Similarly, in OpWalkerInfo:

  private Map<Node, ExprWalkerInfo> opToPrunedPredsMap;
  private Map<Operator<? extends Serializable>, OpParseContext> opToParseCtxMap;

Use Operator instead of Node in opToPrunedPredsMap.

4. Can you move OpWalker and ExprWalker into different directories?
5. Why are filters only pushed on top of TableScan? Can't it be done anywhere? If you want
to do so in a follow-up, can you file a JIRA for that?
6. The Apache header is missing in many files (ppd directory).
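
To illustrate item 3, here is a minimal sketch of what the more specific typing could look like. The classes ExprNodeDescStub and TypedExprWalkerInfo and the local Node interface are hypothetical stand-ins, not the real Hive classes, and the accessor names are assumptions. The point is only that declaring the concrete expression type in the map removes the casts at every read site.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the generic Node interface; not the real Hive type.
interface Node {}

// Hypothetical stand-in for exprNodeDesc; not the real Hive class.
class ExprNodeDescStub implements Node {
  final String text;

  ExprNodeDescStub(String text) {
    this.text = text;
  }
}

// Sketch of ExprWalkerInfo with the map typed on the concrete expression class.
class TypedExprWalkerInfo {
  // Keyed by table alias. Values use the concrete type, so callers need no casts.
  private final Map<String, List<ExprNodeDescStub>> pushdownPreds =
      new HashMap<String, List<ExprNodeDescStub>>();

  void addPushdown(String alias, ExprNodeDescStub pred) {
    List<ExprNodeDescStub> preds = pushdownPreds.get(alias);
    if (preds == null) {
      preds = new ArrayList<ExprNodeDescStub>();
      pushdownPreds.put(alias, preds);
    }
    preds.add(pred);
  }

  // Returns the predicates recorded for an alias, or an empty list if none.
  List<ExprNodeDescStub> getPushdowns(String alias) {
    List<ExprNodeDescStub> preds = pushdownPreds.get(alias);
    return preds == null ? new ArrayList<ExprNodeDescStub>() : preds;
  }
}
```

With Map<String, List<Node>>, each caller would have to cast the elements back to the expression type before reading fields such as text above.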


SemanticAnalyzer.java:

A comment explaining why colInfoMap exists would help. Give an example: a group by
where the table column order is different from the grouped column order.

Same for posAliasMap and nameToInputColumnInfoMap for JOIN.

genJoinOperatorChildren:


      if(aliases == null) {
        aliases = new HashSet<String>();
        posToAliasMap.put(pos, aliases);
      }

Isn't the if redundant?
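
Whether the check is redundant depends on context the snippet does not show. As a rough sketch (the class PosAliasSketch and both method names are hypothetical, and the Integer key type is an assumption), the guard only matters if the same pos can be looked up more than once. If each pos is visited exactly once, the unguarded form behaves the same.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper contrasting the guarded and unguarded forms.
class PosAliasSketch {
  // Guarded form: reuses an existing set, so aliases added earlier survive.
  static Set<String> getOrCreate(Map<Integer, Set<String>> posToAliasMap, int pos) {
    Set<String> aliases = posToAliasMap.get(pos);
    if (aliases == null) {
      aliases = new HashSet<String>();
      posToAliasMap.put(pos, aliases);
    }
    return aliases;
  }

  // Unguarded form: always installs a fresh set, discarding any earlier one.
  // Equivalent to the guarded form only when pos has no entry yet.
  static Set<String> createFresh(Map<Integer, Set<String>> posToAliasMap, int pos) {
    Set<String> aliases = new HashSet<String>();
    posToAliasMap.put(pos, aliases);
    return aliases;
  }
}
```

If the surrounding code guarantees that aliases is always null at this point, the condition can be dropped and the body kept as is.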




> Implement predicate push down for hive queries
> ----------------------------------------------
>
>                 Key: HIVE-279
>                 URL: https://issues.apache.org/jira/browse/HIVE-279
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.2.0
>            Reporter: Prasad Chakka
>            Assignee: Prasad Chakka
>         Attachments: hive-279.2.patch, hive-279.patch
>
>
> Push predicates that are expressed in outer queries into inner queries where possible so that rows will get filtered out sooner.
> eg.
> select a.*, b.* from a join b on (a.uid = b.uid) where a.age = 20 and a.gender = 'm'
> current compiler generates the filter predicate in the reducer after the join so all the rows have to be passed from mapper to reducer. by pushing the filter predicate to the mapper, query performance should improve.
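
The effect described in the issue can be sketched in plain Java, outside of Hive. Everything here is hypothetical (the class PushdownSketch, the row types, and the in-memory nested-loop join); it only models the example query to show that filtering before the join yields the same result while fewer rows of a reach the join.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the example query; not Hive code.
class PushdownSketch {
  static final class ARow {
    final int uid;
    final int age;
    final String gender;

    ARow(int uid, int age, String gender) {
      this.uid = uid;
      this.age = age;
      this.gender = gender;
    }
  }

  static final class BRow {
    final int uid;

    BRow(int uid) {
      this.uid = uid;
    }
  }

  // The where clause: a.age = 20 and a.gender = 'm'.
  static boolean matches(ARow r) {
    return r.age == 20 && "m".equals(r.gender);
  }

  // Filter after the join: every row of a is fed to the join.
  // Returns {rows of a entering the join, result rows}.
  static int[] filterAfterJoin(List<ARow> a, List<BRow> b) {
    int results = 0;
    for (ARow x : a) {
      for (BRow y : b) {
        if (x.uid == y.uid && matches(x)) {
          results++;
        }
      }
    }
    return new int[] {a.size(), results};
  }

  // Filter pushed below the join: only matching rows of a are fed to the join.
  static int[] filterBeforeJoin(List<ARow> a, List<BRow> b) {
    List<ARow> kept = new ArrayList<ARow>();
    for (ARow x : a) {
      if (matches(x)) {
        kept.add(x);
      }
    }
    int results = 0;
    for (ARow x : kept) {
      for (BRow y : b) {
        if (x.uid == y.uid) {
          results++;
        }
      }
    }
    return new int[] {kept.size(), results};
  }
}
```

In the MapReduce setting the rows entering the join correspond to rows shuffled from mapper to reducer, which is where the saving would come from.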

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

