Date: Tue, 8 Jan 2013 06:16:14 +0000 (UTC)
From: "Phabricator (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage

    [ https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546643#comment-13546643 ]

Phabricator commented on HIVE-3562:
-----------------------------------

njain has commented on the revision "HIVE-3562 [jira] Some limit can be pushed down to map stage".

INLINE COMMENTS
  conf/hive-default.xml.template:1434 Can you add more details here? An example query would really help.
  ql/src/test/queries/clientpositive/limit_pushdown.q:16 What is so special about 40?
  Set hive.limit.pushdown.heap.threshold explicitly at the beginning of the test; that makes the test easier to maintain in the long run.
  ql/src/test/queries/clientpositive/limit_pushdown.q:34 What is the difference between this and line 3?
  ql/src/test/queries/clientpositive/limit_pushdown.q:10 I think this plan is not correct.

  Let us say the values are v1 v2 .. v10 v11 v12 .. v20.

  The first mapper does not have v8-v10, so it emits v1-v7 and v11-v13.
  The second mapper contains data for all values, but it only emits v1-v10.

  Since the query does not involve an order by, it is possible that the data for v11 will get picked up, and that data does not include the rows from the second mapper.
  If you are pushing the limit up, you should create an additional MR job which orders the rows - in the above example, making sure that only v1-v10 are picked up.

  Am I missing something here?

REVISION DETAIL
  https://reviews.facebook.net/D5967

To: JIRA, tarball, navis
Cc: njain

> Some limit can be pushed down to map stage
> ------------------------------------------
>
>                 Key: HIVE-3562
>                 URL: https://issues.apache.org/jira/browse/HIVE-3562
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Trivial
>         Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, HIVE-3562.D5967.3.patch
>
>
> Queries with a limit clause (with a reasonable number), for example
> {noformat}
> select * from src order by key limit 10;
> {noformat}
> make the operator tree
> TS-SEL-RS-EXT-LIMIT-FS
> But LIMIT can be partially calculated in RS, reducing the size of the shuffle.
> TS-SEL-RS(TOP-N)-EXT-LIMIT-FS
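
The RS(TOP-N) idea from the issue description, and the partial-data scenario njain raises in the review, can be sketched as a small Python simulation. This is only an illustration under stated assumptions, not Hive's implementation: map_side_top_n, order_by_limit and partially_shuffled_keys are hypothetical helpers, plain integers stand in for v1..v20, and each mapper is assumed to forward its N smallest keys.

```python
import heapq


def map_side_top_n(keys, n):
    """Hypothetical stand-in for RS(TOP-N): forward only the n smallest keys."""
    return heapq.nsmallest(n, keys)


def order_by_limit(mapper_inputs, n, pushdown):
    """Simulate `order by key limit n`; return (result, number of shuffled rows)."""
    shuffled = []
    for keys in mapper_inputs:
        shuffled.extend(map_side_top_n(keys, n) if pushdown else keys)
    # The reduce side still sorts and applies the final LIMIT.
    return sorted(shuffled)[:n], len(shuffled)


def partially_shuffled_keys(mapper_inputs, n):
    """Keys that reach the reducer although some mapper holding them dropped them."""
    emitted = [set(map_side_top_n(keys, n)) for keys in mapper_inputs]
    reached = set().union(*emitted)
    return sorted(
        key
        for key in reached
        if any(key in keys and key not in out
               for keys, out in zip(mapper_inputs, emitted))
    )


if __name__ == "__main__":
    # njain's example: mapper 1 lacks v8-v10, mapper 2 has v1-v20.
    mapper1 = [k for k in range(1, 21) if k not in (8, 9, 10)]
    mapper2 = list(range(1, 21))
    inputs = [mapper1, mapper2]

    with_pushdown = order_by_limit(inputs, 10, pushdown=True)
    without_pushdown = order_by_limit(inputs, 10, pushdown=False)
    print(with_pushdown[0] == without_pushdown[0])  # compare the ordered results
    print(with_pushdown[1], without_pushdown[1])    # rows shuffled in each plan
    print(partially_shuffled_keys(inputs, 10))      # keys with incomplete data
```

In this sketch the final sort keeps the incompletely shuffled keys out of the result, because they sort after the first N rows. Without an order by, nothing in the sketch prevents such a key from being picked, which is the case the review comment asks about; whether Hive's actual plan can hit it is not settled in this message.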