Date: Wed, 17 Sep 2014 19:25:34 +0000 (UTC)
From: "Na Yang (JIRA)"
To: hive-dev@hadoop.apache.org
Subject: [jira] [Commented] (HIVE-8162) hive.optimize.sort.dynamic.partition causes RuntimeException for inserting into dynamic partitioned table when map function is used in the subquery

    [ https://issues.apache.org/jira/browse/HIVE-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137784#comment-14137784 ]

Na Yang commented on HIVE-8162:
-------------------------------

The operator tree for this query is:

    TS0-FIL9-SEL2-GBY4-RS5-GBY6-SEL7-RS10-EX11-FS8
The task graph for this query is:

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2
  Stage-3 depends on stages: Stage-0

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: associateddata
            Statistics: Num rows: 25374 Data size: 101496 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (sm_campaign_id) IN (10187171, 1090942, 10541943, 10833443, 8635630, 10187170, 9445296, 10696334, 11398585, 9524211, 1145211) (type: boolean)
              Statistics: Num rows: 12687 Data size: 50748 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: map('x_product_id':'') (type: map<string,string>), day_id (type: int)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 12687 Data size: 50748 Basic stats: COMPLETE Column stats: NONE
                Group By Operator
                  keys: _col0 (type: map<string,string>), _col1 (type: int)
                  mode: hash
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 12687 Data size: 50748 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col0 (type: map<string,string>), _col1 (type: int)
                    sort order: ++
                    Map-reduce partition columns: _col0 (type: map<string,string>), _col1 (type: int)
                    Statistics: Num rows: 12687 Data size: 50748 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Group By Operator
          keys: KEY._col0 (type: map<string,string>), KEY._col1 (type: int)
          mode: mergepartial
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 6343 Data size: 25372 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: 2 (type: int), _col0 (type: map<string,string>), _col1 (type: int)
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 6343 Data size: 25372 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-2
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator
              key expressions: _col2 (type: int), _col0 (type: map<string,string>), _col1 (type: int)
              sort order: +++
              Map-reduce partition columns: _col2 (type: int)
              Statistics: Num rows: 6343 Data size: 25372 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col0 (type: int), _col1 (type: map<string,string>), _col2 (type: int)
      Reduce Operator Tree:
        Extract
          Statistics: Num rows: 6343 Data size: 25372 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 6343 Data size: 25372 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                name: default.agg_pv_associateddata_c

  Stage: Stage-0
    Move Operator
      tables:
          partition:
            day_id
          replace: false
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: default.agg_pv_associateddata_c

  Stage: Stage-3
    Stats-Aggr Operator

The exception happens while executing Stage-2. The ReduceSinkDesc for RS10 has key columns of type {int, map<string,string>, int}, and the intermediate file for this stage is stored in SequenceFileInputFormat using LazyBinarySerDe. However, the LazyBinarySerDe is not able to deserialize the non-primitive (map) type from the intermediate file, which causes the exception. When TextInputFormat and LazySimpleSerDe are used for the intermediate file instead, the exception goes away; however, changing the intermediate file's InputFormat and SerDe is not a preferred solution.
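The failure mode in the stack trace (java.io.EOFException inside BinarySortableSerDe.deserializeInt) is the reducer-side deserializer reading past the end of the serialized key bytes. The mechanism can be sketched in a few lines of plain Python; this is illustrative only, with an invented byte layout, not Hive's actual encoding: a reader that expects an {int, map, int} key layout hits EOF when the writer produced a shorter layout.

```python
import io
import struct

# Illustrative sketch (not Hive code): a writer and a reader that
# disagree about the key layout. The reader runs out of bytes and
# raises EOFError, analogous to the java.io.EOFException above.

def write_key(buf, first):
    """Writer emits only a single big-endian 4-byte int (short layout)."""
    buf.write(struct.pack(">i", first))

def read_exact(buf, n):
    """Read exactly n bytes or raise EOFError, like InputByteBuffer.read."""
    data = buf.read(n)
    if len(data) < n:
        raise EOFError("ran out of bytes while deserializing key")
    return data

def read_key_int_map_int(buf):
    """Reader expects int, then a one-entry map (two ints), then int."""
    first = struct.unpack(">i", read_exact(buf, 4))[0]
    k = struct.unpack(">i", read_exact(buf, 4))[0]   # map key
    v = struct.unpack(">i", read_exact(buf, 4))[0]   # map value
    last = struct.unpack(">i", read_exact(buf, 4))[0]
    return first, {k: v}, last

buf = io.BytesIO()
write_key(buf, 2)
buf.seek(0)
try:
    read_key_int_map_int(buf)
    outcome = "ok"
except EOFError:
    outcome = "EOFError"
print(outcome)  # EOFError
```

The same shape of mismatch arises here because the map-side serializer and the reduce-side BinarySortableSerDe disagree about whether the middle key column is a primitive or a map.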
> hive.optimize.sort.dynamic.partition causes RuntimeException for inserting into dynamic partitioned table when map function is used in the subquery
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-8162
>                 URL: https://issues.apache.org/jira/browse/HIVE-8162
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.13.0
>            Reporter: Na Yang
>         Attachments: 47rows.txt
>
> Exception:
> Diagnostic Messages for this Task:
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from x1x129x51x83x14x1x128x0x0x2x1x1x1x120x95x112x114x111x100x117x99x116x95x105x100x0x1x0x0x255 with properties {columns=reducesinkkey0,reducesinkkey1,reducesinkkey2, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=+++, columns.types=int,map,int}
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283)
> 	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:518)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:462)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1122)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:271)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from x1x129x51x83x14x1x128x0x0x2x1x1x1x120x95x112x114x111x100x117x99x116x95x105x100x0x1x0x0x255 with properties {columns=reducesinkkey0,reducesinkkey1,reducesinkkey2, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=+++, columns.types=int,map,int}
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:222)
> 	... 7 more
> Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.EOFException
> 	at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:189)
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:220)
> 	... 7 more
> Caused by: java.io.EOFException
> 	at org.apache.hadoop.hive.serde2.binarysortable.InputByteBuffer.read(InputByteBuffer.java:54)
> 	at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserializeInt(BinarySortableSerDe.java:533)
> 	at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:236)
> 	at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:185)
> 	... 8 more
>
> Steps to reproduce the exception:
> -------------------------------------------------
> CREATE TABLE associateddata(
>   creative_id int, creative_group_id int, placement_id int, sm_campaign_id int,
>   browser_id string, trans_type_p string, trans_time_p string, group_name string,
>   event_name string, order_id string, revenue float, currency string,
>   trans_type_ci string, trans_time_ci string, f16 map<string,string>,
>   campaign_id int, user_agent_cat string, geo_country string, geo_city string,
>   geo_state string, geo_zip string, geo_dma string, geo_area string,
>   geo_isp string, site_id int, section_id int, f16_ci map<string,string>)
> PARTITIONED BY(day_id int, hour_id int)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
>
> LOAD DATA LOCAL INPATH '/tmp/47rows.txt' INTO TABLE associateddata
> PARTITION(day_id=20140814, hour_id=2014081417);
>
> set hive.exec.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
>
> CREATE EXTERNAL TABLE IF NOT EXISTS agg_pv_associateddata_c (
>   vt_tran_qty int COMMENT 'The count of view thru transactions',
>   pair_value_txt string COMMENT 'F16 name values pairs'
> )
> PARTITIONED BY (day_id int)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> STORED AS TEXTFILE
> LOCATION '/user/prodman/agg_pv_associateddata_c';
>
> INSERT INTO TABLE agg_pv_associateddata_c PARTITION (day_id)
> select 2 as vt_tran_qty, pair_value_txt, day_id
> from (select map('x_product_id', coalesce(F16['x_product_id'],'')) as pair_value_txt, day_id, hour_id
>       from associateddata
>       where hour_id = 2014081417
>         and sm_campaign_id in (10187171,1090942,10541943,10833443,8635630,10187170,9445296,10696334,11398585,9524211,1145211)
>      ) a
> GROUP BY pair_value_txt, day_id;
>
> The query worked fine in Hive-0.12. It starts failing in Hive-0.13. If hive.optimize.sort.dynamic.partition is turned off in Hive-0.13, the exception goes away.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
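For reference, the workaround described in the issue (disabling the optimization), spelled out as Hive session commands around the same repro query:

```sql
-- Workaround from the report: turn off the sort-based dynamic
-- partition optimization for this session before the INSERT.
set hive.optimize.sort.dynamic.partition=false;

INSERT INTO TABLE agg_pv_associateddata_c PARTITION (day_id)
select 2 as vt_tran_qty, pair_value_txt, day_id
from (select map('x_product_id', coalesce(F16['x_product_id'],'')) as pair_value_txt, day_id, hour_id
      from associateddata
      where hour_id = 2014081417
        and sm_campaign_id in (10187171,1090942,10541943,10833443,8635630,10187170,9445296,10696334,11398585,9524211,1145211)
     ) a
GROUP BY pair_value_txt, day_id;
```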