Date: Wed, 15 Apr 2009 18:20:14 -0700 (PDT)
From: "Namit Jain (JIRA)"
To: hive-dev@hadoop.apache.org
Subject: [jira] Updated: (HIVE-404) Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"

[ https://issues.apache.org/jira/browse/HIVE-404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-404:
----------------------------

    Attachment:
hive.404.1.patch

> Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"
> ----------------------------------------------------
>
>                 Key: HIVE-404
>                 URL: https://issues.apache.org/jira/browse/HIVE-404
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.3.0, 0.4.0
>            Reporter: Zheng Shao
>         Attachments: hive.404.1.patch
>
>
> Unless the user specifies "set mapred.reduce.tasks=1;", they will see unexpected results from the query "SELECT * FROM t SORT BY col1 LIMIT 100".
> In the first map-reduce job, each reducer sorts its own data by col1 and keeps only the first 100 rows. In the second map-reduce job, the data are distributed and sorted randomly (not by col1) before being fed into a single reducer that outputs the first 100 rows it receives.
> In short, with N reducers in the first job, the query outputs 100 random records out of the N * 100 records that survive that job (the top 100 from each reducer).
> This contradicts what people expect.
> We should propagate the SORT BY columns to the second map-reduce job.
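The two-stage plan described above can be simulated in a few lines. This is a minimal Python sketch, not Hive's actual code: the function names (first_job, second_job_buggy, second_job_fixed) are hypothetical, Python lists stand in for reducer partitions, and the "random" distribution and sort of the second job is modelled with random.shuffle as an assumption. It shows why keeping the top 100 per reducer is safe, but only if the second stage orders rows by col1 again.

```python
import heapq
import itertools
import random

LIMIT = 100

def first_job(partitions, limit=LIMIT):
    """Stage 1 (hypothetical model): each reducer sorts its own partition
    by col1 and keeps only its first `limit` rows."""
    return [sorted(p)[:limit] for p in partitions]

def second_job_buggy(reducer_outputs, limit=LIMIT, seed=0):
    """Stage 2 as reported in the issue: the N * limit surviving rows reach
    the single final reducer in an order unrelated to col1 (modelled here by
    a shuffle), so it keeps an arbitrary `limit` of them."""
    rows = [r for out in reducer_outputs for r in out]
    random.Random(seed).shuffle(rows)
    return rows[:limit]

def second_job_fixed(reducer_outputs, limit=LIMIT):
    """Stage 2 with the SORT BY column propagated: merge the already-sorted
    reducer outputs by col1 and keep the first `limit` rows."""
    return list(itertools.islice(heapq.merge(*reducer_outputs), limit))

# Hypothetical input: 1,000 distinct col1 values spread over N = 4 reducers.
values = list(range(1000))
random.Random(42).shuffle(values)
partitions = [values[i::4] for i in range(4)]

stage1 = first_job(partitions)       # 4 * 100 = 400 rows survive stage 1
expected = sorted(values)[:LIMIT]    # what the user expects: 0..99 in order
print(second_job_fixed(stage1) == expected)  # True
print(second_job_buggy(stage1) == expected)  # almost certainly False
```

The fixed second stage is correct because every row in the global top 100 is also within the top 100 of whichever reducer received it, so no needed row is discarded in stage 1; the only thing that can go wrong is the order in which stage 2 consumes the surviving rows.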