Return-Path: X-Original-To: apmail-hive-dev-archive@www.apache.org Delivered-To: apmail-hive-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 68CB21085A for ; Wed, 18 Feb 2015 20:21:12 +0000 (UTC) Received: (qmail 84474 invoked by uid 500); 18 Feb 2015 20:21:12 -0000 Delivered-To: apmail-hive-dev-archive@hive.apache.org Received: (qmail 84385 invoked by uid 500); 18 Feb 2015 20:21:12 -0000 Mailing-List: contact dev-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hive.apache.org Delivered-To: mailing list dev@hive.apache.org Received: (qmail 84373 invoked by uid 500); 18 Feb 2015 20:21:11 -0000 Delivered-To: apmail-hadoop-hive-dev@hadoop.apache.org Received: (qmail 84370 invoked by uid 99); 18 Feb 2015 20:21:11 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 18 Feb 2015 20:21:11 +0000 Date: Wed, 18 Feb 2015 20:21:11 +0000 (UTC) From: "Pavan Srinivas (JIRA)" To: hive-dev@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (HIVE-9718) Insert into dynamic partitions with same column structure in the "distibute by" clause barfs MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HIVE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavan Srinivas updated HIVE-9718: --------------------------------- Description: Sample reproducible query: {code} SET hive.exec.dynamic.partition.mode=nonstrict; SET hive.exec.dynamic.partition=true; insert overwrite table nation_new_p partition (some) select n_name as name1, n_name as name2, n_name as name3 from nation distribute by name3; {code} Note: Make sure there is data in the source table to reproduce the issue. During the optimizations done for Jira: https://issues.apache.org/jira/browse/HIVE-4867, a optimization of deduplication of columns is done. But when one of the columns is used as part of partitioned/distribute by, its not taken care of. The above query produces exception as follows: {code} Diagnostic Messages for this Task: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"n_nationkey":0,"n_name":"ALGERIA","n_regionkey":0,"n_comment":" haggle. carefully final deposits detect slyly agai"} at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"n_nationkey":0,"n_name":"ALGERIA","n_regionkey":0,"n_comment":" haggle. carefully final deposits detect slyly agai"} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176) ... 12 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: cannot find field _col2 from [0:_col0] at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493) ... 13 more Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325) ... 19 more {code} Tables used are: {code} CREATE EXTERNAL TABLE `nation`( `n_nationkey` int, `n_name` string, `n_regionkey` int, `n_comment` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; {code} and {code} CREATE TABLE `nation_new_p`( `n_name1` string, `n_name2` string) PARTITIONED BY ( `some` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' {code} Sample data for the table is provided by the file attached with. was: Sample reproducible query: {code} SET hive.exec.dynamic.partition.mode=nonstrict; SET hive.exec.dynamic.partition=true; insert overwrite table nation_new_p partition (some) select n_name as name1, n_name as name2, n_name as name3 from nation distribute by name3; {code} Note: Make sure there is data in the source table to reproduce the issue. During the optimizations done for Jira: https://issues.apache.org/jira/browse/HIVE-4867, a optimization of deduplication of columns is done. But when one of the columns is used as part of partitioned/distribute by, its not taken care of. The above query produces exception as follows: {code} Diagnostic Messages for this Task: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"n_nationkey":0,"n_name":"ALGERIA","n_regionkey":0,"n_comment":" haggle. carefully final deposits detect slyly agai"} at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"n_nationkey":0,"n_name":"ALGERIA","n_regionkey":0,"n_comment":" haggle. carefully final deposits detect slyly agai"} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176) ... 12 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: cannot find field _col2 from [0:_col0] at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493) ... 13 more Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325) ... 19 more {code} Tables used are: {code} CREATE EXTERNAL TABLE `nation`( `n_nationkey` int, `n_name` string, `n_regionkey` int, `n_comment` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; {code} and {code} CREATE TABLE `nation_new_p`( `n_name1` string, `n_name2` string, `n_name3` string) PARTITIONED BY ( `some` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' {code} Sample data for the table is provided by the file attached with. > Insert into dynamic partitions with same column structure in the "distibute by" clause barfs > -------------------------------------------------------------------------------------------- > > Key: HIVE-9718 > URL: https://issues.apache.org/jira/browse/HIVE-9718 > Project: Hive > Issue Type: Bug > Affects Versions: 0.14.0, 1.0.0 > Reporter: Pavan Srinivas > Priority: Critical > Attachments: nation.tbl, patch.txt > > > Sample reproducible query: > {code} > SET hive.exec.dynamic.partition.mode=nonstrict; > SET hive.exec.dynamic.partition=true; > insert overwrite table nation_new_p partition (some) > select n_name as name1, n_name as name2, n_name as name3 from nation distribute by name3; > {code} > Note: Make sure there is data in the source table to reproduce the issue. > During the optimizations done for Jira: https://issues.apache.org/jira/browse/HIVE-4867, a optimization of deduplication of columns is done. But when one of the columns is used as part of partitioned/distribute by, its not taken care of. > The above query produces exception as follows: > {code} > Diagnostic Messages for this Task: > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"n_nationkey":0,"n_name":"ALGERIA","n_regionkey":0,"n_comment":" haggle. carefully final deposits detect slyly agai"} > at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) > at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370) > at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295) > at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181) > at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"n_nationkey":0,"n_name":"ALGERIA","n_regionkey":0,"n_comment":" haggle. carefully final deposits detect slyly agai"} > at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503) > at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176) > ... 12 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: cannot find field _col2 from [0:_col0] > at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) > at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) > at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) > at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157) > at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493) > ... 13 more > Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0] > at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410) > at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) > at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55) > at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954) > at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325) > ... 19 more > {code} > Tables used are: > {code} > CREATE EXTERNAL TABLE `nation`( > `n_nationkey` int, > `n_name` string, > `n_regionkey` int, > `n_comment` string) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY '|' > STORED AS INPUTFORMAT > 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; > {code} > and > {code} > CREATE TABLE `nation_new_p`( > `n_name1` string, > `n_name2` string) > PARTITIONED BY ( > `some` string) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY '|' > STORED AS INPUTFORMAT > 'org.apache.hadoop.mapred.TextInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > {code} > Sample data for the table is provided by the file attached with. -- This message was sent by Atlassian JIRA (v6.3.4#6332)