hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Phabricator (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage
Date Mon, 26 Aug 2013 18:51:54 GMT

    [ https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750398#comment-13750398
] 

Phabricator commented on HIVE-3562:
-----------------------------------

ashutoshc has commented on the revision "HIVE-3562 [jira] Some limit can be pushed down to
map stage".

  Looks pretty good. Just requesting to add few more comments.

INLINE COMMENTS
  conf/hive-default.xml.template:1586-1590 We can remove this now.
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java:1186 Just to add more clarity,
say something like: we can push the limit above GBY (running in Reducer), since that will
generate single row for each group. This doesn't necessarily hold for GBY (running in Mappers),
so we don't push limit above it.
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:182 It will be good to
add comment about what this field is holding. Add a comment saying: This two dimensional array
holds key data and a corresponding Union object which contains the tag identifying the aggregate
expression for distinct columns.

  Ideally, instead of this 2-D array, we should have probably enhanced HiveKey class for this
logic. We should do that in a follow-up jira.
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:267 I didnt follow this
logic completely.
  Seems like this is an optimization not to evaluate union object repeatedly and do system
copy for it. Can you add a comment explaining this?
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:271 seems like it will
be null only for all i = 0. If so, better do if (i==0) check ? Also add comment when this
will be null and when it will be non-null?
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:260 You made changes
to this section, because you found bug or are you purely refactoring this? If you hit upon
the bug, can you explain what was it?
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:50-51 It will be good to add comment
about what these 2D arrays are holding?
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:52 Also, add comment saying this
array holds hashcode for keys.

  Also, add note that indices of all these arrays must line up.
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java:82 Nice Comments!
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:34 It will be good to add a javadoc
for this class.
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:36 Also javadoc for this interface.

REVISION DETAIL
  https://reviews.facebook.net/D5967

To: JIRA, tarball, ashutoshc, navis
Cc: njain

                
> Some limit can be pushed down to map stage
> ------------------------------------------
>
>                 Key: HIVE-3562
>                 URL: https://issues.apache.org/jira/browse/HIVE-3562
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Trivial
>         Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, HIVE-3562.D5967.3.patch,
HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch, HIVE-3562.D5967.6.patch, HIVE-3562.D5967.7.patch,
HIVE-3562.D5967.8.patch
>
>
> Queries with limit clause (with reasonable number), for example
> {noformat}
> select * from src order by key limit 10;
> {noformat}
> makes operator tree, 
> TS-SEL-RS-EXT-LIMIT-FS
> But LIMIT can be partially calculated in RS, reducing size of shuffling.
> TS-SEL-RS(TOP-N)-EXT-LIMIT-FS

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message