From: "Ashutosh Chauhan (JIRA)"
To: hive-dev@hadoop.apache.org
Date: Fri, 23 Aug 2013 16:18:53 +0000 (UTC)
Subject: [jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage

    [ https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748676#comment-13748676 ]

Ashutosh Chauhan commented on HIVE-3562:
----------------------------------------

[~navis] It occurred to me that this optimization will become very powerful in combination with HIVE-4002. Imagine a case where there is a limit which can be pushed up in front of the last RS. Then the mappers will output very little data, and with HIVE-4002 we can eliminate the reducer altogether.
However, these two optimizations cannot occur simultaneously in their current form, since RSHash is implemented inside RS. We would need to reimplement RSHash in FS. An alternative approach could be to implement RSHash as an operator, which can then be put in front of either RS or FS. What do you think?

> Some limit can be pushed down to map stage
> ------------------------------------------
>
>                 Key: HIVE-3562
>                 URL: https://issues.apache.org/jira/browse/HIVE-3562
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Trivial
>         Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, HIVE-3562.D5967.3.patch, HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch, HIVE-3562.D5967.6.patch
>
>
> Queries with a limit clause (with a reasonable number), for example
> {noformat}
> select * from src order by key limit 10;
> {noformat}
> produce the operator tree
> TS-SEL-RS-EXT-LIMIT-FS
> But LIMIT can be partially calculated in RS, reducing the size of the shuffle.
> TS-SEL-RS(TOP-N)-EXT-LIMIT-FS
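
For readers unfamiliar with the TOP-N idea in the quoted description, here is a minimal sketch of what "partially calculating LIMIT in RS" could look like: each mapper keeps only the N smallest keys it has seen and drops the rest before the shuffle. The class TopNBuffer and its methods are hypothetical names for illustration and are not Hive's RSHash or any class from the attached patches; the sketch also assumes an ascending ORDER BY on a single comparable key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper, not a Hive class: retains the n smallest keys offered so far. */
final class TopNBuffer<K extends Comparable<K>> {
    private final int n;
    // Max-heap: the head is the largest key currently retained.
    private final PriorityQueue<K> heap;

    TopNBuffer(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
        this.heap = new PriorityQueue<>(n, Collections.reverseOrder());
    }

    /** Returns true if the key was retained, false if it can be dropped before the shuffle. */
    boolean offer(K key) {
        if (heap.size() < n) {
            heap.add(key);
            return true;
        }
        // Buffer is full: keep the key only if it sorts before the current largest.
        if (key.compareTo(heap.peek()) < 0) {
            heap.poll();
            heap.add(key);
            return true;
        }
        return false;
    }

    /** Retained keys in ascending order, i.e. what one mapper would pass downstream. */
    List<K> sorted() {
        List<K> out = new ArrayList<>(heap);
        Collections.sort(out);
        return out;
    }
}
```

The reducer-side LIMIT is still needed in this sketch, because each mapper only knows its own local top N. Packaging the buffer as a standalone unit rather than embedding it in the shuffle writer loosely mirrors the suggestion above of making RSHash an operator that can sit in front of either RS or FS.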