Return-Path: X-Original-To: apmail-hive-dev-archive@www.apache.org Delivered-To: apmail-hive-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 43A8717629 for ; Mon, 22 Jun 2015 07:32:01 +0000 (UTC) Received: (qmail 79734 invoked by uid 500); 22 Jun 2015 07:32:00 -0000 Delivered-To: apmail-hive-dev-archive@hive.apache.org Received: (qmail 79669 invoked by uid 500); 22 Jun 2015 07:32:00 -0000 Mailing-List: contact dev-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hive.apache.org Delivered-To: mailing list dev@hive.apache.org Received: (qmail 79313 invoked by uid 99); 22 Jun 2015 07:32:00 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 22 Jun 2015 07:32:00 +0000 Date: Mon, 22 Jun 2015 07:32:00 +0000 (UTC) From: "Rajat Jain (JIRA)" To: dev@hive.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Created] (HIVE-11069) ColumnStatsTask doesn't work with hive.exec.parallel MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 Rajat Jain created HIVE-11069: --------------------------------- Summary: ColumnStatsTask doesn't work with hive.exec.parallel Key: HIVE-11069 URL: https://issues.apache.org/jira/browse/HIVE-11069 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Rajat Jain Try a simple query: {code} hive> set hive.exec.parallel=true; hive> analyze table src compute statistics for columns; {code} It fails with errors similar to: {code} FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ColumnStatsTask hive> java.lang.RuntimeException: Error caching map.xml: java.io.IOException: java.lang.InterruptedException at org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:747) at org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:682) at org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:674) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:375) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88) at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75) Caused by: java.io.IOException: java.lang.InterruptedException at org.apache.hadoop.ipc.Client.call(Client.java:1450) at org.apache.hadoop.ipc.Client.call(Client.java:1402) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) at com.sun.proxy.$Proxy14.mkdirs(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:539) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy15.mkdirs(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2758) at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2729) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:870) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:866) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:866) at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:859) at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1954) at org.apache.hadoop.hive.ql.exec.Utilities.setPlanPath(Utilities.java:765) at org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:691) ... 7 more Caused by: java.lang.InterruptedException at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:400) at java.util.concurrent.FutureTask.get(FutureTask.java:187) at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1049) at org.apache.hadoop.ipc.Client.call(Client.java:1444) ... 28 more Job Submission failed with exception 'java.lang.RuntimeException(Error caching map.xml: java.io.IOException: java.lang.InterruptedException)' {code} The problem is the Column Stats Task doesn't depend on the root task which causes errors. Here's the explain output: {code} hive> explain analyze table src compute statistics for columns; OK STAGE DEPENDENCIES: Stage-0 is a root stage Stage-1 is a root stage STAGE PLANS: Stage: Stage-0 Map Reduce Map Operator Tree: TableScan alias: src Select Operator expressions: key (type: string), value (type: string) outputColumnNames: key, value Group By Operator aggregations: compute_stats(key, 16), compute_stats(value, 16) mode: hash outputColumnNames: _col0, _col1 Reduce Output Operator sort order: value expressions: _col0 (type: struct), _col1 (type: struct) Reduce Operator Tree: Group By Operator aggregations: compute_stats(VALUE._col0), compute_stats(VALUE._col1) mode: mergepartial outputColumnNames: _col0, _col1 File Output Operator compressed: false table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-1 Column Stats Work Column Stats Desc: Columns: key, value Column Types: string, string Table: default.src Time taken: 0.761 seconds, Fetched: 39 row(s) {code} For reference, here's the corresponding output in Hive 0.13: {code} hive> explain analyze table orders compute statistics for columns; OK STAGE DEPENDENCIES: Stage-0 is a root stage Stage-1 depends on stages: Stage-0 STAGE PLANS: Stage: Stage-0 Map Reduce Map Operator Tree: TableScan alias: orders Statistics: Num rows: 0 Data size: 13310103552 Basic stats: PARTIAL Column stats: COMPLETE Stage: Stage-1 Stats-Aggr Operator Time taken: 2.142 seconds, Fetched: 15 row(s) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)