Return-Path:
X-Original-To: apmail-hive-dev-archive@www.apache.org
Delivered-To: apmail-hive-dev-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 477961763D for ; Tue, 28 Oct 2014 13:59:34 +0000 (UTC)
Received: (qmail 44999 invoked by uid 500); 28 Oct 2014 13:59:33 -0000
Delivered-To: apmail-hive-dev-archive@hive.apache.org
Received: (qmail 44927 invoked by uid 500); 28 Oct 2014 13:59:33 -0000
Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@hive.apache.org
Delivered-To: mailing list dev@hive.apache.org
Received: (qmail 44916 invoked by uid 500); 28 Oct 2014 13:59:33 -0000
Delivered-To: apmail-hadoop-hive-dev@hadoop.apache.org
Received: (qmail 44913 invoked by uid 99); 28 Oct 2014 13:59:33 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 28 Oct 2014 13:59:33 +0000
Date: Tue, 28 Oct 2014 13:59:33 +0000 (UTC)
From: "Rui Li (JIRA)"
To: hive-dev@hadoop.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Commented] (HIVE-8610) Compile time skew join optimization doesn't work with auto map join
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/HIVE-8610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186853#comment-14186853 ]

Rui Li commented on HIVE-8610:
------------------------------

Hi [~xuefuz], do I have to add something to {{testconfiguration.properties}} to make hive automatically run the new tests for MR?

> Compile time skew join optimization doesn't work with auto map join
> -------------------------------------------------------------------
>
>                 Key: HIVE-8610
>                 URL: https://issues.apache.org/jira/browse/HIVE-8610
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-8610.1.patch
>
>
> NPE is thrown if both {{hive.optimize.skewjoin.compiletime}} and {{hive.auto.convert.join}} are enabled:
> {code}
> java.lang.NullPointerException
> 	at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.convertMapJoin(MapJoinProcessor.java:329)
> 	at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.genMapJoinOpAndLocalWork(MapJoinProcessor.java:236)
> 	at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinTaskDispatcher.convertTaskToMapJoinTask(CommonJoinTaskDispatcher.java:181)
> 	at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinTaskDispatcher.processCurrentTask(CommonJoinTaskDispatcher.java:463)
> 	at org.apache.hadoop.hive.ql.optimizer.physical.AbstractJoinTaskDispatcher.dispatch(AbstractJoinTaskDispatcher.java:182)
> 	at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
> 	at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194)
> 	at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139)
> 	at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:79)
> 	at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:107)
> 	at org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:275)
> 	at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:223)
> 	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10028)
> 	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
> 	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
> 	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
> 	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:415)
> 	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:303)
> 	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1068)
> 	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1130)
> 	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1005)
> 	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:995)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:246)
> 	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:198)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
> 	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
> 	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
> 	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {code}
> A simple way to reproduce this issue is to add {{set hive.auto.convert.join=true}} to one of the skew join qfiles, e.g. {{skewjoinopt2.q}}.
> While the reduce-side join can still produce correct results, it defeats the point of skew join optimization: joining skewed data via a map join so that no single reducer gets too many records.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)