Date: Tue, 17 Dec 2013 10:54:07 +0000 (UTC)
From: "Adrian Popescu (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-6041) Incorrect task dependency graph for skewed join optimization

     [ https://issues.apache.org/jira/browse/HIVE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrian Popescu updated HIVE-6041:
---------------------------------

    Description: 
The dependency graph among task stages is incorrect for the skewed join optimized plan. Skewed joins are enabled through "hive.optimize.skewjoin". For the case that skewed keys do not exist, all the tasks following the common join are filtered out at runtime.
In particular, the conditional task in the optimized plan maintains no dependency with the child tasks of the common join task in the original plan. The conditional task is composed of the map join task which maintains all these dependencies, but for the case the map join task is filtered out (i.e., no skewed keys exist), all these dependencies are lost. Hence, all the other task stages of the query are skipped.
The bug resides in "ql/optimizer/physical/GenMRSkewJoinProcessor.java", processSkewJoin() function, immediately after the ConditionalTask is created and its dependencies are set.

  was:
The dependency graph among task stages is incorrect for the skewed join optimized plan. Skewed joins are enabled through "hive.optimize.skewjoin". For the case that skewed keys do not exist, all tasks following the common join are filtered out.
In particular, the conditional task in the optimized plan maintains no dependency with the child tasks of the common join task in the original plan. The conditional task is composed of the map join task which maintains all these dependencies, but for the case the map join task is filtered out (i.e., no skewed keys exist), all these dependencies are lost. Hence, all the other task stages of the query are skipped.
The bug resides in "ql/optimizer/physical/GenMRSkewJoinProcessor.java", processSkewJoin() function, immediately after the ConditionalTask is created and its dependencies are set.


> Incorrect task dependency graph for skewed join optimization
> ------------------------------------------------------------
>
>                 Key: HIVE-6041
>                 URL: https://issues.apache.org/jira/browse/HIVE-6041
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.11.0
>         Environment: Hadoop 1.0.3
>            Reporter: Adrian Popescu
>            Priority: Critical
>
> The dependency graph among task stages is incorrect for the skewed join optimized plan. Skewed joins are enabled through "hive.optimize.skewjoin".
> For the case that skewed keys do not exist, all the tasks following the common join are filtered out at runtime.
> In particular, the conditional task in the optimized plan maintains no dependency with the child tasks of the common join task in the original plan. The conditional task is composed of the map join task which maintains all these dependencies, but for the case the map join task is filtered out (i.e., no skewed keys exist), all these dependencies are lost. Hence, all the other task stages of the query are skipped.
> The bug resides in "ql/optimizer/physical/GenMRSkewJoinProcessor.java", processSkewJoin() function, immediately after the ConditionalTask is created and its dependencies are set.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
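
The dependency loss described in the issue can be illustrated with a small toy model. This is only a sketch under stated assumptions: the classes `Task` and `ConditionalTask`, the functions `build_plan` and `run`, and the flag `link_conditional_to_children` are hypothetical names invented for this illustration and are not Hive's actual classes or API. The "linked" variant shows one possible way to keep the downstream stages reachable; it is an assumption, not the patch applied to HIVE-6041.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical stand-in for a plan stage; not a Hive class."""
    name: str
    children: list = field(default_factory=list)


@dataclass
class ConditionalTask(Task):
    """Toy conditional stage: its candidates run only if a resolver keeps them."""
    candidates: list = field(default_factory=list)


def build_plan(link_conditional_to_children):
    """Build common_join -> conditional[map_join] with one downstream stage.

    With link_conditional_to_children=False the downstream stage hangs only
    off map_join, which mirrors the situation reported in the issue.
    """
    downstream = Task("downstream")
    map_join = Task("map_join", children=[downstream])
    conditional = ConditionalTask("conditional", candidates=[map_join])
    if link_conditional_to_children:
        # Assumed remedy for illustration: the conditional stage itself
        # also points at the downstream stage.
        conditional.children.append(downstream)
    return Task("common_join", children=[conditional])


def run(root, skewed_keys_exist):
    """Return the names of the stages that execute, in visiting order."""
    executed = []
    pending = [root]
    while pending:
        task = pending.pop(0)
        if task.name in executed:
            continue
        executed.append(task.name)
        if isinstance(task, ConditionalTask) and skewed_keys_exist:
            # Toy resolver: candidates survive only when skewed keys exist.
            pending.extend(task.candidates)
        pending.extend(task.children)
    return executed


if __name__ == "__main__":
    for linked in (False, True):
        for skew in (False, True):
            print(f"linked={linked} skew={skew}:", run(build_plan(linked), skew))
```

In the unlinked plan with no skewed keys, the walk stops at the conditional stage and the downstream stage never runs, which matches the symptom reported above. In the linked plan the downstream stage runs whether or not the map join is filtered out.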