hive-issues mailing list archives

From "zhihai xu (JIRA)" <>
Subject [jira] [Commented] (HIVE-16368) Unexpected java.lang.ArrayIndexOutOfBoundsException from query with LaterView Operation for hive on MR.
Date Thu, 06 Apr 2017 03:35:41 GMT


zhihai xu commented on HIVE-16368:

I added a test case to lateral_view_onview.q in the second patch HIVE-16368.001.patch.

> Unexpected java.lang.ArrayIndexOutOfBoundsException from query with LaterView Operation for hive on MR.
> -------------------------------------------------------------------------------------------------------
>                 Key: HIVE-16368
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: HIVE-16368.000.patch, HIVE-16368.001.patch
> Unexpected java.lang.ArrayIndexOutOfBoundsException from query. It happened in the LateralView operation, for Hive on MR. The cause is that column pruning changes the column order in the LateralView operation. With back-to-back ReduceSink operators on the MR engine, a FileSinkOperator and a TableScanOperator are added before the second ReduceSink operator. The column order that the FileSinkOperator in the previous reducer uses when serializing with LazyBinarySerDe differs from the column order, taken from the table desc, that the MapOperator/TableScanOperator in the current (failed) mapper uses when deserializing with LazyBinarySerDe.
> The serialization order is determined by the outputObjInspector of LateralViewJoinOperator:
> {code}
>     ArrayList<String> fieldNames = conf.getOutputInternalColNames();
>     outputObjInspector = ObjectInspectorFactory
>         .getStandardStructObjectInspector(fieldNames, ois);
> {code}
> So the column order for serialization is determined by getOutputInternalColNames in LateralViewJoinOperator.
> The deserialization order is determined by the TableScanOperator, which is created in GenMapRedUtils.splitTasks:

> {code}
>     TableDesc tt_desc = PlanUtils.getIntermediateFileTableDesc(PlanUtils
>         .getFieldSchemasFromRowSchema(parent.getSchema(), "temporarycol"));
>     // Create the temporary file, its corresponding FileSinkOperator, and
>     // its corresponding TableScanOperator.
>     TableScanOperator tableScanOp =
>         createTemporaryFile(parent, op, taskTmpDir, tt_desc, parseCtx);
> {code}
> The column order for deserialization is determined by the rowSchema of LateralViewJoinOperator.
> However, ColumnPrunerLateralViewJoinProc changes the order of outputInternalColNames but keeps the original order of rowSchema, which causes a mismatch between serialization and deserialization across the two back-to-back MR jobs.
> ColumnPrunerLateralViewForwardProc has a similar issue: it changes the column order of its child selector's colList but not the order of the corresponding rowSchema.
> The exception is:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 875968094
> 	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.byteArrayToLong(
> 	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryDouble.init(
> 	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(
> 	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(
> 	at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(
> 	at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(
> 	at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(
> 	at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(
> 	at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(
> 	at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(
> {code}
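
The serialization/deserialization mismatch described above can be reproduced in miniature. The Java sketch below is hypothetical and uses no Hive classes: `ColumnOrderMismatch`, `Col`, `serialize`, `deserialize`, `alignTo`, and the column names and types are made up for illustration. Its fixed-width, untagged positional encoding is only loosely similar in spirit to LazyBinarySerDe. The writer emits fields in the order of its internal column names (the role `outputInternalColNames` plays), while the reader decodes by position in row-schema order. `alignTo` shows one possible way to keep the two orders consistent; it is not necessarily what HIVE-16368.001.patch does. The record type needs Java 16+.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration of the HIVE-16368 failure mode. NOT Hive code:
 * the encoding is a simplified positional format with no per-field tags.
 */
public class ColumnOrderMismatch {

    /** Stand-in for a row-schema entry: internal column name plus type. */
    record Col(String name, String type) {}

    /** Writer side: fields are emitted in the given (writer's) column order. */
    static byte[] serialize(List<Col> writerOrder, Map<String, Object> row) {
        ByteBuffer buf = ByteBuffer.allocate(8 * writerOrder.size());
        for (Col c : writerOrder) {
            if (c.type().equals("double")) {
                buf.putDouble((Double) row.get(c.name()));
            } else { // assumed "bigint"
                buf.putLong((Long) row.get(c.name()));
            }
        }
        return buf.array();
    }

    /** Reader side: fields are decoded purely by position, in schema order. */
    static Map<String, Object> deserialize(List<Col> readerOrder, byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        Map<String, Object> row = new LinkedHashMap<>();
        for (Col c : readerOrder) {
            // Nothing in the bytes says which column the next 8 bytes belong
            // to, so the reader decodes them with its own schema's type.
            if (c.type().equals("double")) {
                row.put(c.name(), buf.getDouble());
            } else {
                row.put(c.name(), buf.getLong());
            }
        }
        return row;
    }

    /** One possible remedy: reorder the row schema to follow the writer's names. */
    static List<Col> alignTo(List<String> outputInternalColNames, List<Col> rowSchema) {
        Map<String, Col> byName = new HashMap<>();
        for (Col c : rowSchema) {
            byName.put(c.name(), c);
        }
        List<Col> aligned = new ArrayList<>();
        for (String name : outputInternalColNames) {
            Col c = byName.get(name);
            if (c == null) {
                throw new IllegalStateException("column " + name + " not in row schema");
            }
            aligned.add(c);
        }
        return aligned;
    }

    public static void main(String[] args) {
        List<Col> rowSchema = List.of(new Col("_col0", "double"), new Col("_col1", "bigint"));
        // Pretend column pruning reordered the writer's internal column names.
        List<String> outputInternalColNames = List.of("_col1", "_col0");
        List<Col> alignedSchema = alignTo(outputInternalColNames, rowSchema);

        Map<String, Object> row = Map.of("_col0", 3.5, "_col1", 42L);
        byte[] bytes = serialize(alignedSchema, row);

        System.out.println("read with original rowSchema: " + deserialize(rowSchema, bytes));
        System.out.println("read with aligned schema:     " + deserialize(alignedSchema, bytes));
    }
}
```

Reading with the unaligned `rowSchema` returns a subnormal double and a very large long instead of 3.5 and 42, because each 8-byte field is decoded with the other column's type. Reading with the aligned schema recovers the original row. With variable-length fields, misread bytes can also be taken as lengths or offsets, which is one plausible route to an ArrayIndexOutOfBoundsException like the one above.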

This message was sent by Atlassian JIRA
