Date: Tue, 4 Apr 2017 20:05:41 +0000 (UTC)
From: "zhihai xu (JIRA)"
To: issues@hive.apache.org
Subject: [jira] [Commented] (HIVE-16368) Unexpected java.lang.ArrayIndexOutOfBoundsException from query with LaterView Operation for hive on MR.

    [ https://issues.apache.org/jira/browse/HIVE-16368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955718#comment-15955718 ]

zhihai xu commented on HIVE-16368:
----------------------------------

Yes, thanks for the review. I will add a .q test case.

> Unexpected java.lang.ArrayIndexOutOfBoundsException from query with LaterView Operation for hive on MR.
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16368
>                 URL: https://issues.apache.org/jira/browse/HIVE-16368
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: HIVE-16368.000.patch
>
>
> A query with a LateralView operation fails with an unexpected java.lang.ArrayIndexOutOfBoundsException when Hive runs on MR (Hive-on-MR).
> The cause is that the column pruner changes the column order in the LateralView operation. For back-to-back ReduceSink operators on the MR engine, a FileSinkOperator and a TableScanOperator are inserted before the second ReduceSink operator. The column order the FileSinkOperator uses to serialize rows with LazyBinarySerDe in the previous reducer differs from the column order, taken from the table desc, that the MapOperator/TableScanOperator uses to deserialize them with LazyBinarySerDe in the current (failing) mapper.
>
> The serialization order is determined by the outputObjInspector of LateralViewJoinOperator:
> {code}
> ArrayList<String> fieldNames = conf.getOutputInternalColNames();
> outputObjInspector = ObjectInspectorFactory
>     .getStandardStructObjectInspector(fieldNames, ois);
> {code}
> So the column order for serialization is determined by getOutputInternalColNames() in LateralViewJoinOperator.
>
> The deserialization order is determined by the TableScanOperator created in GenMapRedUtils.splitTasks:
> {code}
> TableDesc tt_desc = PlanUtils.getIntermediateFileTableDesc(PlanUtils
>     .getFieldSchemasFromRowSchema(parent.getSchema(), "temporarycol"));
> // Create the temporary file, its corresponding FileSinkOperator, and
> // its corresponding TableScanOperator.
> TableScanOperator tableScanOp =
>     createTemporaryFile(parent, op, taskTmpDir, tt_desc, parseCtx);
> {code}
> So the column order for deserialization is determined by the rowSchema of LateralViewJoinOperator.
>
> However, ColumnPrunerLateralViewJoinProc changes the order of outputInternalColNames while keeping the original order of rowSchema, which causes a mismatch between serialization and deserialization across the two back-to-back MR jobs.
> ColumnPrunerLateralViewForwardProc has a similar issue: it changes the column order of its child SelectOperator's colList but not its rowSchema.
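To make the failure mode concrete, here is a minimal, self-contained Java sketch of a writer and a reader that disagree on column order. It does not use any Hive classes: the class ColumnOrderMismatchDemo, its two-type row format, and its serialize/getField methods are hypothetical and deliberately much simpler than LazyBinarySerDe's real encoding. The writer order stands in for the reordered outputInternalColNames, and the reader order stands in for the unchanged rowSchema. Because the reader trusts its own schema, it interprets the first four bytes of a double as a string length and then computes a start offset for the next field far past the end of the buffer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy row format, NOT Hive's LazyBinarySerDe encoding:
// DOUBLE = 8 big-endian bytes, STRING = 4-byte big-endian length + ASCII bytes.
public class ColumnOrderMismatchDemo {

    enum Type { STRING, DOUBLE }

    static void writeBigEndian(ByteArrayOutputStream out, long v, int nBytes) {
        for (int i = nBytes - 1; i >= 0; i--) {
            out.write((int) (v >>> (8 * i)) & 0xFF);
        }
    }

    // Plain array indexing: an out-of-range pos throws ArrayIndexOutOfBoundsException.
    static long readBigEndian(byte[] buf, int pos, int nBytes) {
        long v = 0;
        for (int i = 0; i < nBytes; i++) {
            v = (v << 8) | (buf[pos + i] & 0xFFL);
        }
        return v;
    }

    // Writer side: fields are laid out in the writer's column order.
    static byte[] serialize(List<Type> schema, List<Object> values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < schema.size(); i++) {
            if (schema.get(i) == Type.DOUBLE) {
                writeBigEndian(out, Double.doubleToLongBits((Double) values.get(i)), 8);
            } else {
                byte[] s = ((String) values.get(i)).getBytes(StandardCharsets.US_ASCII);
                writeBigEndian(out, s.length, 4);
                out.write(s, 0, s.length);
            }
        }
        return out.toByteArray();
    }

    // Reader side: field start offsets are derived from the reader's schema only,
    // without checking that they stay inside the buffer.
    static int[] fieldStarts(List<Type> schema, byte[] buf) {
        int[] starts = new int[schema.size()];
        int pos = 0;
        for (int i = 0; i < schema.size(); i++) {
            starts[i] = pos;
            pos += (schema.get(i) == Type.DOUBLE) ? 8 : 4 + (int) readBigEndian(buf, pos, 4);
        }
        return starts;
    }

    static Object getField(List<Type> schema, byte[] buf, int field) {
        int start = fieldStarts(schema, buf)[field];
        if (schema.get(field) == Type.DOUBLE) {
            return Double.longBitsToDouble(readBigEndian(buf, start, 8));
        }
        int len = (int) readBigEndian(buf, start, 4);
        return new String(buf, start + 4, len, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        List<Type> writerOrder = Arrays.asList(Type.DOUBLE, Type.STRING); // e.g. reordered outputInternalColNames
        List<Type> readerOrder = Arrays.asList(Type.STRING, Type.DOUBLE); // e.g. stale rowSchema order
        byte[] row = serialize(writerOrder, Arrays.<Object>asList(2.5, "hive"));

        System.out.println(getField(writerOrder, row, 0)); // 2.5
        System.out.println(getField(writerOrder, row, 1)); // hive
        try {
            // The first 4 bytes of 2.5 (0x40040000) are read as a string length,
            // so the DOUBLE field's start offset lands far past the 16-byte buffer.
            getField(readerOrder, row, 1);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("mismatched column order: " + e);
        }
    }
}
```

With matching orders the sketch reads both values back correctly; with mismatched orders it fails only when the fixed-width double is read at a bogus offset, not at the point where the two orders diverge. The exact exception message varies by JDK version.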
>
> The exception is:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 875968094
>         at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.byteArrayToLong(LazyBinaryUtils.java:78)
>         at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryDouble.init(LazyBinaryDouble.java:43)
>         at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:264)
>         at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:201)
>         at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
>         at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
>         at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
>         at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
>         at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:554)
>         at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:381)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)