Date: Fri, 18 Mar 2016 04:07:33 +0000 (UTC)
From: "Chao Sun (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-13300) Hive on spark throws exception for multi-insert

    [ https://issues.apache.org/jira/browse/HIVE-13300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200971#comment-15200971 ]

Chao Sun commented on HIVE-13300:
---------------------------------

+1 on pending test.
> Hive on spark throws exception for multi-insert
> -----------------------------------------------
>
>                 Key: HIVE-13300
>                 URL: https://issues.apache.org/jira/browse/HIVE-13300
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 2.0.0
>            Reporter: Szehon Ho
>            Assignee: Szehon Ho
>         Attachments: HIVE-13300.patch
>
>
> For certain multi-insert queries, Hive on Spark throws a deserialization error.
> {noformat}
> create table status_updates(userid int,status string,ds string);
> create table profiles(userid int,school string,gender int);
> drop table school_summary;
> create table school_summary(school string,cnt int) partitioned by (ds string);
> drop table gender_summary;
> create table gender_summary(gender int,cnt int) partitioned by (ds string);
> insert into status_updates values (1, "status_1", "2016-03-16");
> insert into profiles values (1, "school_1", 0);
>
> set hive.auto.convert.join=false;
> set hive.execution.engine=spark;
>
> FROM (SELECT a.status, b.school, b.gender
>       FROM status_updates a JOIN profiles b
>       ON (a.userid = b.userid and
>           a.ds='2009-03-20')
>      ) subq1
> INSERT OVERWRITE TABLE gender_summary
>   PARTITION(ds='2009-03-20')
>   SELECT subq1.gender, COUNT(1) GROUP BY subq1.gender
> INSERT OVERWRITE TABLE school_summary
>   PARTITION(ds='2009-03-20')
>   SELECT subq1.school, COUNT(1) GROUP BY subq1.school
> {noformat}
> Error:
> {noformat}
> 16/03/17 13:29:00 [task-result-getter-3]: WARN scheduler.TaskSetManager: Lost task 0.0 in stage 2.0 (TID 3, localhost): java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from x1x128x0x0 with properties {serialization.sort.order.null=a, columns=reducesinkkey0, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=+, columns.types=int}
> 	at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:279)
> 	at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
> 	at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
> 	at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
> 	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
> 	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:89)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:724)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from x1x128x0x0 with properties {serialization.sort.order.null=a, columns=reducesinkkey0, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=+, columns.types=int}
> 	at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:251)
> 	... 12 more
> Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.EOFException
> 	at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:241)
> 	at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:249)
> 	... 12 more
> Caused by: java.io.EOFException
> 	at org.apache.hadoop.hive.serde2.binarysortable.InputByteBuffer.read(InputByteBuffer.java:54)
> 	at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserializeInt(BinarySortableSerDe.java:597)
> 	at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:288)
> 	at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:237)
> 	... 13 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
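[Editor's note] The failing key bytes in the log, x1x128x0x0 (i.e. 0x01 0x80 0x00 0x00), are one byte short of a well-formed BinarySortableSerDe int key, which is plausibly why deserializeInt hits EOFException: an int key is written as one null-marker byte followed by four big-endian bytes with the sign bit flipped for sortability. The Java sketch below is a hypothetical mimic of that layout (not Hive's actual SerDe code) showing how a key serialized under a different reduce branch's schema, truncated relative to the `columns.types=int` expectation, produces the same java.io.EOFException seen in the stack trace:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical mimic of a BinarySortableSerDe-style int key layout:
// one null-marker byte, then the 4-byte big-endian int with its sign bit
// flipped so unsigned byte comparison matches signed int ordering.
public class KeyMismatchSketch {

    static byte[] encodeIntKey(int v) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.write(1);                  // 1 = value present (ascending sort order '+')
        out.writeInt(v ^ 0x80000000);  // flip sign bit for sortable byte order
        return bos.toByteArray();
    }

    static int decodeIntKey(byte[] key) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(key));
        if (in.read() == 0) {
            throw new IllegalStateException("null key");
        }
        // readInt throws EOFException when fewer than 4 bytes remain,
        // the same way the real deserializer fails on a truncated key.
        return in.readInt() ^ 0x80000000;
    }

    public static void main(String[] args) throws IOException {
        // Round trip works for a well-formed key: 0 encodes as 01 80 00 00 00.
        System.out.println("round trip of 0 -> " + decodeIntKey(encodeIntKey(0)));

        // The key bytes from the log, x1x128x0x0 = 0x01 0x80 0x00 0x00, are one
        // byte short of a full int key, as if written under another schema.
        byte[] fromLog = {1, (byte) 0x80, 0, 0};
        try {
            decodeIntKey(fromLog);
        } catch (EOFException e) {
            System.out.println("java.io.EOFException, matching the stack trace");
        }
    }
}
```

This is only an illustration of the failure mechanics; the fix in HIVE-13300.patch addresses how Hive on Spark wires the multi-insert reduce branches, not the SerDe itself.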