hive-dev mailing list archives

From "Remus Rusanu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-5845) CTAS failed on vectorized code path
Date Tue, 19 Nov 2013 17:35:24 GMT

    [ https://issues.apache.org/jira/browse/HIVE-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826719#comment-13826719 ]

Remus Rusanu commented on HIVE-5845:
------------------------------------

Hello Ashutosh,

I’ve looked at this, and in my opinion the problem is in the ORC VectorizedOrcSerde.serialize.
Although we are writing an OrcStruct, it attaches the passed-in object inspector, which
describes the input struct, to the object it creates, instead of the OrcStructInspector
that should be used with the created OrcStruct.
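
To make the mismatch concrete, here is a minimal sketch using stand-in classes. RowStruct, ArrayStructInspector and RowStructInspector are hypothetical names invented for illustration; they are not the real Hive types and only mimic the shape of the failure, where an inspector written for one struct layout is handed an object of another layout.

```java
// Stand-in classes only; none of these are the real Hive types.
class InspectorMismatchSketch {

    /** Reads a field out of some struct representation. */
    interface StructInspector {
        Object getField(Object struct, int index);
    }

    /** Stand-in for an inspector over array-backed rows (the "input" layout). */
    static final class ArrayStructInspector implements StructInspector {
        @Override
        public Object getField(Object struct, int index) {
            // Throws ClassCastException if struct is not an Object[].
            return ((Object[]) struct)[index];
        }
    }

    /** Stand-in for the row object created during serialization. */
    static final class RowStruct {
        private final Object[] fields;

        RowStruct(Object... fields) {
            this.fields = fields.clone();
        }

        Object get(int index) {
            return fields[index];
        }
    }

    /** Stand-in for the inspector that matches RowStruct. */
    static final class RowStructInspector implements StructInspector {
        @Override
        public Object getField(Object struct, int index) {
            return ((RowStruct) struct).get(index);
        }
    }

    /** Returns true if reading field 0 through the inspector fails with a cast error. */
    static boolean readFails(StructInspector inspector, Object struct) {
        try {
            inspector.getField(struct, 0);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        RowStruct row = new RowStruct(1, "a");
        // Inspector for the input layout applied to the created row object.
        System.out.println(readFails(new ArrayStructInspector(), row)); // prints "true"
        // Inspector that matches the created row object.
        System.out.println(readFails(new RowStructInspector(), row));   // prints "false"
    }
}
```

The point is only that the inspector has to describe the object actually handed to the writer, not the object the data originally came from.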

I tried this patch:

diff --git ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSerde.java ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSerde.java
index d765353..c4268c1 100644
--- ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSerde.java
+++ ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSerde.java
@@ -143,9 +143,9 @@ public SerDeStats getSerDeStats() {
   public Writable serializeVector(VectorizedRowBatch vrg, ObjectInspector objInspector)
       throws SerDeException {
     if (vos == null) {
-      vos = new VectorizedOrcSerde(objInspector);
+      vos = new VectorizedOrcSerde(getObjectInspector());
     }
-    return vos.serialize(vrg, objInspector);
+    return vos.serialize(vrg, getObjectInspector());
   }

However, with this fix I’m hitting other (very familiar…) cast exceptions:

Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.TimestampWritable cannot be cast to java.sql.Timestamp
        at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaTimestampObjectInspector.getPrimitiveJavaObject(JavaTimestampObjectInspector.java:39)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TimestampTreeWriter.write(WriterImpl.java:1172)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.write(WriterImpl.java:1349)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:1962)
        at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:78)
        at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.processOp(VectorFileSinkOperator.java:159)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:489)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:827)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:129)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:489)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:827)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:489)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:827)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:43)

Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.ByteWritable cannot be cast to org.apache.hadoop.io.IntWritable
        at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.get(WritableIntObjectInspector.java:36)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl$IntegerTreeWriter.write(WriterImpl.java:762)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.write(WriterImpl.java:1349)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:1962)
        at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:78)
        at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.processOp(VectorFileSinkOperator.java:159)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:489)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:827)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:129)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:489)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:827)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:489)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:827)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:43)

Before I go and hack through code I’m only vaguely familiar with (the Orc serdes), do you
have someone more experienced in this area at HW to have a look too?
It seems that the Orc writer expects Java primitive types where the vector file sink creates
Writables instead… I’m afraid if I ‘fix’ this one way, some other place will break.
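
The second trace above has the same shape at the field level: a reader that assumes one wrapper type is given another. A minimal sketch follows, assuming stand-in wrapper classes. ByteBox and IntBox are hypothetical names, not the real Hadoop or Hive Writables, and the runtime-type reader is shown for contrast only, not as a proposed fix for this issue.

```java
// Stand-in classes only; these mimic the shape of the failure, not the real Hive or Hadoop types.
class PrimitiveMismatchSketch {

    /** Stand-in for a wrapper holding a byte value. */
    static final class ByteBox {
        final byte value;

        ByteBox(byte value) {
            this.value = value;
        }
    }

    /** Stand-in for a wrapper holding an int value. */
    static final class IntBox {
        final int value;

        IntBox(int value) {
            this.value = value;
        }
    }

    /** Mimics a reader that assumes every value it sees is an IntBox. */
    static int readAsIntBox(Object o) {
        // Throws ClassCastException for any other wrapper type.
        return ((IntBox) o).value;
    }

    /** Contrast only: a reader that checks the runtime type before casting. */
    static long readByRuntimeType(Object o) {
        if (o instanceof IntBox) {
            return ((IntBox) o).value;
        }
        if (o instanceof ByteBox) {
            return ((ByteBox) o).value;
        }
        throw new IllegalArgumentException("unsupported wrapper: " + o.getClass().getName());
    }

    /** Returns true if the IntBox-only reader fails with a cast error. */
    static boolean castFails(Object o) {
        try {
            readAsIntBox(o);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object produced = new ByteBox((byte) 7);
        System.out.println(castFails(produced));         // prints "true"
        System.out.println(readByRuntimeType(produced)); // prints "7"
    }
}
```

Whether the writer side or the sink side should change in the real code is exactly the open question; the sketch only shows why the two sides must agree on the value representation.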

Thanks,
~Remus


From: Ashutosh Chauhan (JIRA) [mailto:jira@apache.org]
Sent: Tuesday, November 19, 2013 1:11 AM
To: Remus Rusanu
Subject: [jira] [Commented] (HIVE-5845) CTAS failed on vectorized code path


Ashutosh Chauhan <https://issues.apache.org/jira/secure/ViewProfile.jspa?name=ashutoshc> commented on an issue

Re: CTAS failed on vectorized code path <https://issues.apache.org/jira/browse/HIVE-5845>

Stack-trace:

Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.OrcStruct cannot be cast to [Ljava.lang.Object;
        at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldData(StandardStructObjectInspector.java:173)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.write(WriterImpl.java:1349)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:1962)
        at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:78)
        at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.processOp(VectorFileSinkOperator.java:159)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:489)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:827)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:129)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:489)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:827)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:489)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:827)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:43)


Hive<https://issues.apache.org/jira/browse/HIVE> / [Bug] <https://issues.apache.org/jira/browse/HIVE-5845>
HIVE-5845<https://issues.apache.org/jira/browse/HIVE-5845>

CTAS failed on vectorized code path<https://issues.apache.org/jira/browse/HIVE-5845>


Following query fails:
 create table store_sales_2 stored as orc as select * from alltypesorc;



This message was sent by Atlassian JIRA (v6.1#6144-sha1:2e50328)

> CTAS failed on vectorized code path
> -----------------------------------
>
>                 Key: HIVE-5845
>                 URL: https://issues.apache.org/jira/browse/HIVE-5845
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ashutosh Chauhan
>            Assignee: Remus Rusanu
>
> Following query fails:
>  create table store_sales_2 stored as orc as select * from alltypesorc;


