Date: Tue, 30 Oct 2012 17:50:16 +0000 (UTC)
From: "Vighnesh Avadhani (JIRA)"
To: hive-dev@hadoop.apache.org
Message-ID: <1328121841.45747.1351619416934.JavaMail.jiratomcat@arcas>
In-Reply-To: <251634723.43606.1351584252397.JavaMail.jiratomcat@arcas>
Subject: [jira] [Commented] (HIVE-3640) Reducer allocation is incorrect if enforce bucketing and mapred.reduce.tasks are both set

    [ https://issues.apache.org/jira/browse/HIVE-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487061#comment-13487061 ]

Vighnesh Avadhani commented on HIVE-3640:
-----------------------------------------

Done: https://reviews.facebook.net/D6327

I could not use arc diff --jira HIVE-3640 as it was throwing:
PHP Fatal error: Call to undefined method ArcanistGitAPI::amendGitHeadCommit() in /Users/vighnesh/hive/.arc_jira_lib/arcanist/ArcJIRAConfiguration.php on line 173

> Reducer allocation is incorrect if enforce bucketing and mapred.reduce.tasks are both set
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-3640
>                 URL: https://issues.apache.org/jira/browse/HIVE-3640
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Vighnesh Avadhani
>            Assignee: Vighnesh Avadhani
>            Priority: Minor
>         Attachments: HIVE-3640.1.patch.txt
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When I enforce bucketing and fix the number of reducers via mapred.reduce.tasks Hive ignores my input and instead takes the largest value <= hive.exec.reducers.max that is also an even divisor of num_buckets. In other words, if I set 1024 buckets and set mapred.reduce.tasks=1024 I'll get... 256 reducers. If I set 1997 buckets and set mapred.reduce.tasks=1997 I'll get... 1 reducer.
> This is totally crazy, and it's far, far crazier when the data inputs get large. In the latter case the bucketing job will almost certainly fail because we'll most likely try to stuff several TB of input through a single reducer. We'll also drastically reduce the effectiveness of bucketing, since the buckets themselves will be larger.
> If the user sets mapred.reduce.tasks in a query that inserts into a bucketed table we should either accept that value or raise an exception if it's invalid relative to the number of buckets. We should absolutely NOT override the user's direction and fall back on automatically allocating reducers based on some obscure logic dictated by a completely different setting.
> I have yet to encounter a single person who expected this the first time, so it's clearly a bug.
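
To make the arithmetic in the description concrete, here is a minimal Python sketch of the rule as the report words it (the largest value <= hive.exec.reducers.max that evenly divides num_buckets), next to one possible reading of the requested behaviour. This is not Hive's code. The function names are hypothetical, the cap values passed in are assumptions (the message does not say which hive.exec.reducers.max was in effect), and treating "invalid relative to the number of buckets" as "does not evenly divide num_buckets" is also an assumption.

```python
def described_reducer_count(num_buckets, max_reducers):
    """Largest value <= max_reducers that evenly divides num_buckets.

    Models the behaviour as described in the report, not Hive's source.
    """
    if num_buckets < 1 or max_reducers < 1:
        raise ValueError("num_buckets and max_reducers must be positive")
    # Walk downward from the cap; 1 divides everything, so this always returns.
    for candidate in range(min(num_buckets, max_reducers), 0, -1):
        if num_buckets % candidate == 0:
            return candidate


def proposed_reducer_count(num_buckets, requested_reducers):
    """Honour the user's mapred.reduce.tasks value or fail loudly.

    Assumption: a request is valid only if it evenly divides num_buckets.
    """
    if requested_reducers < 1 or num_buckets % requested_reducers != 0:
        raise ValueError(
            "mapred.reduce.tasks=%d is not valid for %d buckets"
            % (requested_reducers, num_buckets)
        )
    return requested_reducers


# Cap of 300 is an assumed value; any cap from 256 through 511 gives 256 here.
print(described_reducer_count(1024, 300))
# 1997 is prime, so any cap below 1997 leaves 1 as the only divisor.
print(described_reducer_count(1997, 999))
# Under the proposed behaviour the user's value is simply accepted.
print(proposed_reducer_count(1024, 1024))
```

Under this reading, a prime bucket count always collapses to a single reducer whenever the cap is below the bucket count, which is the failure case the report is most concerned about.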