hive-dev mailing list archives

From "Phabricator (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage
Date Tue, 08 Jan 2013 06:16:14 GMT

    [ https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546643#comment-13546643 ]

Phabricator commented on HIVE-3562:
-----------------------------------

njain has commented on the revision "HIVE-3562 [jira] Some limit can be pushed down to map
stage".

INLINE COMMENTS
  conf/hive-default.xml.template:1434 Can you add more details here? An example query would
really help.
  ql/src/test/queries/clientpositive/limit_pushdown.q:16 What is so special about 40?

  Set hive.limit.pushdown.heap.threshold explicitly at the beginning of the test; that makes the
  test easier to maintain in the long run.

  ql/src/test/queries/clientpositive/limit_pushdown.q:34 What is the difference between this
and line 3?

  ql/src/test/queries/clientpositive/limit_pushdown.q:10 I think this plan is not correct.

  Let us say, the values are
  v1
  v2
  ..
  v10
  v11
  v12
  ..
  v20

  The first mapper does not have v8-v10, so it emits v1-v7 and v11-v13.
  The second mapper contains data for all values, but it only emits v1-v10.

  Since the query does not involve an order by, it is possible that v11 will get picked
up, and its data would not include the rows from the second mapper. If you are pushing the
limit up, you should create an additional MR job which orders the rows; in the above example,
that would make sure that only v1-v10 are picked up.
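
The scenario above can be replayed in a few lines of Python. This is only a sketch of the concern, not Hive code: the query at limit_pushdown.q:10 is not shown in this message, so the sketch assumes a count per key with a limit of 10, one row per key per mapper, and a hypothetical map-side top-N step (`map_side_top_n`). It does not model the order in which groups actually reach the LIMIT operator on the reduce side.

```python
from collections import Counter

LIMIT = 10  # assumed limit; the actual test query is not shown in the message


def key_order(key):
    """Order keys v1, v2, ..., v20 numerically rather than lexically."""
    return int(key[1:])


def map_side_top_n(rows, n=LIMIT):
    """Hypothetical map-side step: partial count per key, keep the n smallest keys."""
    partial = Counter(rows)
    kept = sorted(partial, key=key_order)[:n]
    return {k: partial[k] for k in kept}


all_keys = [f"v{i}" for i in range(1, 21)]
mapper1_rows = [k for k in all_keys if k not in {"v8", "v9", "v10"}]  # no v8-v10
mapper2_rows = list(all_keys)                                         # all values

emitted1 = map_side_top_n(mapper1_rows)  # v1-v7 and v11-v13
emitted2 = map_side_top_n(mapper2_rows)  # v1-v10

# Reduce side: merge the partial counts that survived the map-side cut.
merged = Counter(emitted1) + Counter(emitted2)
true_counts = Counter(mapper1_rows + mapper2_rows)

# Keys whose merged count is missing rows dropped by one of the mappers.
wrong = sorted((k for k in merged if merged[k] != true_counts[k]), key=key_order)

# Taking the 10 smallest keys after an explicit sort avoids the incomplete groups.
safe = sorted(merged, key=key_order)[:LIMIT]

print("incomplete groups:", wrong)
print("ordered pick:", safe)
```

In this sketch v11-v13 arrive at the reducer with only the first mapper's contribution, so any limit that picks one of them returns an incomplete count, while the ordered pick of v1-v10 is complete.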

  Am I missing something here?

REVISION DETAIL
  https://reviews.facebook.net/D5967

To: JIRA, tarball, navis
Cc: njain

                
> Some limit can be pushed down to map stage
> ------------------------------------------
>
>                 Key: HIVE-3562
>                 URL: https://issues.apache.org/jira/browse/HIVE-3562
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Trivial
>         Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, HIVE-3562.D5967.3.patch
>
>
> Queries with a limit clause (with a reasonable number), for example
> {noformat}
> select * from src order by key limit 10;
> {noformat}
> produce the operator tree
> TS-SEL-RS-EXT-LIMIT-FS
> But LIMIT can be partially calculated in RS, reducing the size of the shuffle:
> TS-SEL-RS(TOP-N)-EXT-LIMIT-FS
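
The idea of computing part of the limit before the shuffle can be illustrated with a bounded heap. The following is a minimal sketch, assuming integer sort keys and an ascending order by; `top_n_smallest` is a hypothetical helper for illustration and is not the code in the attached patches.

```python
import heapq


def top_n_smallest(keys, n):
    """Keep only the n smallest keys seen so far, using at most n heap slots."""
    heap = []  # max-heap emulated by storing negated keys
    for k in keys:
        if len(heap) < n:
            heapq.heappush(heap, -k)
        elif k < -heap[0]:
            # k beats the largest key currently kept, so it replaces that key.
            heapq.heapreplace(heap, -k)
    return sorted(-x for x in heap)


# Each mapper would forward at most n keys instead of every row it reads.
print(top_n_smallest([5, 3, 9, 1, 7, 2, 8], 3))
```

With this approach each mapper sends at most n rows into the shuffle, and the final LIMIT on the reduce side still selects the overall result from the merged candidates.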

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
