orc-dev mailing list archives

From "Ratandeep Ratti (Jira)" <j...@apache.org>
Subject [jira] [Created] (ORC-556) ConvertTreeReader can be applied on columns of the same primitive type
Date Thu, 03 Oct 2019 17:08:00 GMT
Ratandeep Ratti created ORC-556:
-----------------------------------

             Summary: ConvertTreeReader can be applied on columns of the same primitive type
                 Key: ORC-556
                 URL: https://issues.apache.org/jira/browse/ORC-556
             Project: ORC
          Issue Type: Bug
    Affects Versions: 1.6.0, 1.6.1
            Reporter: Ratandeep Ratti


I'm seeing the following exception when reading old ORC data with Iceberg:
{noformat}
0.0 in stage 0.0 (TID 0, executor 1): java.lang.IllegalArgumentException: No conversion of type INT to self needed
	at org.apache.iceberg.shaded.org.apache.orc.impl.ConvertTreeReaderFactory.createAnyIntegerConvertTreeReader(ConvertTreeReaderFactory.java:1659)
	at org.apache.iceberg.shaded.org.apache.orc.impl.ConvertTreeReaderFactory.createConvertTreeReader(ConvertTreeReaderFactory.java:2112)
	at org.apache.iceberg.shaded.org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2327)
	at org.apache.iceberg.shaded.org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1957)
	at org.apache.iceberg.shaded.org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2367)
	at org.apache.iceberg.shaded.org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1957)
	at org.apache.iceberg.shaded.org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2367)
	at org.apache.iceberg.shaded.org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:230)
	at org.apache.iceberg.shaded.org.apache.orc.impl.ReaderImpl.rows(ReaderImpl.java:741)
	at org.apache.iceberg.orc.OrcIterable.newOrcIterator(OrcIterable.java:87)
	at org.apache.iceberg.orc.OrcIterable.iterator(OrcIterable.java:72)
	at org.apache.iceberg.spark.source.Reader$TaskDataReader.open(Reader.java:470)
	at org.apache.iceberg.spark.source.Reader$TaskDataReader.open(Reader.java:422)
	at org.apache.iceberg.spark.source.Reader$TaskDataReader.<init>(Reader.java:356)
	at org.apache.iceberg.spark.source.Reader$ReadTask.createPartitionReader(Reader.java:305)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD.compute(DataSourceRDD.scala:42)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
{noformat}

I think the problem lies in the following snippet:
{code}
if (!fileType.equals(readerType) &&
    ... // elided)) {
      ...
}
{code}
We are doing an equals comparison here. This comparison can now fail for at least 2 reasons, even when both columns have the same primitive type:
1. The reader schema has annotations [properties] and the old file schema does not.
2. The reader schema field name differs in case from the file schema field name. This, I suspect, is because the old data was written by Hive.

At least reason 1 can be fixed if we change:
{code}
fileType.equals(readerType) => fileType.getCategory().equals(readerType.getCategory())
{code}

I'm currently unsure of the repercussions of this change, so I haven't made it myself.
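
To make the difference between the two checks concrete, here is a minimal, self-contained sketch. It does not use the real ORC classes: `SketchType` is a hypothetical stand-in that models only a category and a map of attributes, and the attribute key `example.id` is made up for illustration. It assumes that strict equality takes attributes into account, which is the behaviour suspected in reason 1 above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a column type; NOT org.apache.orc.TypeDescription.
final class SketchType {
    enum Category { INT, LONG, STRING }

    final Category category;
    final Map<String, String> attributes; // annotations/properties on the schema

    SketchType(Category category, Map<String, String> attributes) {
        this.category = category;
        this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
    }

    // Strict equality: both the category and the attributes must match.
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof SketchType)) {
            return false;
        }
        SketchType that = (SketchType) other;
        return category == that.category && attributes.equals(that.attributes);
    }

    @Override
    public int hashCode() {
        return Objects.hash(category, attributes);
    }

    // Mirrors the shape of the current check: !fileType.equals(readerType).
    static boolean needsConversionStrict(SketchType fileType, SketchType readerType) {
        return !fileType.equals(readerType);
    }

    // Mirrors the proposed check: compare categories only.
    static boolean needsConversionByCategory(SketchType fileType, SketchType readerType) {
        return !fileType.category.equals(readerType.category);
    }

    public static void main(String[] args) {
        // Old file schema: plain INT, no attributes.
        SketchType fileType = new SketchType(Category.INT, new HashMap<>());

        // Reader schema: INT carrying an annotation (key is illustrative only).
        Map<String, String> attrs = new HashMap<>();
        attrs.put("example.id", "1");
        SketchType readerType = new SketchType(Category.INT, attrs);

        System.out.println("strict check says convert:   "
            + needsConversionStrict(fileType, readerType));
        System.out.println("category check says convert: "
            + needsConversionByCategory(fileType, readerType));
    }
}
```

In this sketch the strict check requests a conversion from INT to INT, which is the situation that ends in the exception above, while the category check does not. One possible repercussion to keep in mind: types such as DECIMAL, CHAR and VARCHAR carry precision, scale or length parameters in addition to their category, so a category-only comparison would also treat those as identical when the parameters differ.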



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
