hive-dev mailing list archives

From "Matt McCline (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-9235) Turn off Parquet Vectorization until all data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR
Date Wed, 14 Jan 2015 07:51:35 GMT

    [ https://issues.apache.org/jira/browse/HIVE-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276620#comment-14276620
] 

Matt McCline commented on HIVE-9235:
------------------------------------

First issue (vectorization of Parquet):
VectorColumnAssignFactory.java's public static VectorColumnAssign[] buildAssigners(VectorizedRowBatch outputBatch,
Writable[] writables) is missing cases for HiveCharWritable, HiveVarcharWritable, DateWritable, and HiveDecimalWritable.

Example of the exception this causes:
{noformat}
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Unimplemented
vector assigner for writable type class org.apache.hadoop.hive.serde2.io.HiveDecimalWritable
	at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:136)
	at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:49)
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347)
	... 21 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unimplemented vector assigner
for writable type class org.apache.hadoop.hive.serde2.io.HiveDecimalWritable
	at org.apache.hadoop.hive.ql.exec.vector.VectorColumnAssignFactory.buildAssigners(VectorColumnAssignFactory.java:528)
	at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:127)
	... 23 more
{noformat}

Added code to fix that.
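
For context, here is a minimal, simplified model of the per-column dispatch that buildAssigners performs over the writable classes. This is my own sketch, not the actual patch: the class name and the returned strings are placeholders, and the already-supported writable types are omitted; only the writable classes and HiveException come from the stack trace above.

{noformat}
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.serde2.io.DateWritable;
import org.apache.hadoop.hive.serde2.io.HiveCharWritable;
import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable;
import org.apache.hadoop.hive.serde2.io.HiveVarcharWritable;
import org.apache.hadoop.io.Writable;

/** Hypothetical, simplified model of the per-column dispatch in buildAssigners(). */
public class AssignerDispatchSketch {
  static String pickAssigner(Writable w) throws HiveException {
    if (w instanceof HiveDecimalWritable) {
      return "decimal assigner";               // case the fix adds
    } else if (w instanceof DateWritable) {
      return "date assigner";                  // case the fix adds
    } else if (w instanceof HiveCharWritable || w instanceof HiveVarcharWritable) {
      return "char/varchar assigner";          // case the fix adds
    }
    // Already-supported writables (Text, LongWritable, ...) omitted for brevity.
    // Before the fix, the four types above fell through to this exception, which is
    // the "Unimplemented vector assigner" error shown in the stack trace.
    throw new HiveException("Unimplemented vector assigner for writable type class "
        + w.getClass().getName());
  }
}
{noformat}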

Then I copied a half dozen vectorized q tests that use ORC tables and converted them to
use PARQUET, but encountered another issue in *non-vectorized* mode.  I was trying to establish
base query outputs that I could use to verify the vectorized query output.  This showed that
basic non-vectorized use of the CHAR data type wasn't working for PARQUET.
{noformat}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException:
org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:814)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
	... 10 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable
	at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveCharObjectInspector.copyObject(WritableHiveCharObjectInspector.java:104)
	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:305)
	at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150)
	at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142)
	at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:827)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:739)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:809)
	... 16 more
{noformat}

I filed this problem as HIVE-9371: "Execution error for Parquet table and GROUP BY involving
CHAR data type".
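
The mismatch the trace points at can also be shown in isolation. The following is my own standalone illustration, not Hive code: it assumes the non-vectorized Parquet path hands back a Text object for a CHAR(n) column while WritableHiveCharObjectInspector.copyObject expects a HiveCharWritable, which is exactly the cast that blows up.

{noformat}
import org.apache.hadoop.hive.common.type.HiveChar;
import org.apache.hadoop.hive.serde2.io.HiveCharWritable;
import org.apache.hadoop.io.Text;

/** Standalone illustration of the type mismatch behind HIVE-9371 (not Hive code). */
public class CharCastSketch {
  public static void main(String[] args) {
    // What the non-vectorized Parquet path appears to produce for a CHAR(n) column:
    Object fromReader = new Text("abc");
    // What WritableHiveCharObjectInspector.copyObject() expects to receive:
    HiveCharWritable expected = new HiveCharWritable(new HiveChar("abc", 10));
    // The cast inside copyObject() is what fails at runtime:
    HiveCharWritable boom = (HiveCharWritable) fromReader;   // ClassCastException
  }
}
{noformat}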

At that point we concluded we should temporarily disable vectorization for PARQUET, since there
is only one Parquet vectorization test and it does not provide complete coverage of the data types.

FYI: [~hagleitn]

> Turn off Parquet Vectorization until all data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-9235
>                 URL: https://issues.apache.org/jira/browse/HIVE-9235
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-9235.01.patch
>
>
> Title was: Make Parquet Vectorization of these data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR
> Support for doing vector column assign is missing for some data types.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
