hive-dev mailing list archives

From "Sushanth Sowmyan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-4551) ORC - HCatLoader integration has issues with smallint/tinyint promotions to Int
Date Mon, 13 May 2013 21:13:16 GMT

    [ https://issues.apache.org/jira/browse/HIVE-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13656384#comment-13656384 ]

Sushanth Sowmyan commented on HIVE-4551:
----------------------------------------

The problem here is that the raw data encapsulated by HCatRecord and the HCatSchema describing
it are out of sync, which was one of my worries back in HCATALOG-425 : https://issues.apache.org/jira/browse/HCATALOG-425?focusedCommentId=13439652&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13439652

Basically, the raw data in the smallint/tinyint columns consists of raw shorts and bytes,
and we try to read it as an int. In the rcfile case that works, because the underlying raw
data is also surfaced as an int for smallint and tinyint columns (LazyInteger with a
LazyIntObjectInspector, per the log below), but in the orc case it is not: ORC hands back the
narrower writables. This leads to the following kinds of calls in the rcfile case and in the
orc case:

RCFILE:
{noformat}
13/05/11 02:56:10 INFO mapreduce.InternalUtil: Initializing org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe with properties {transient_lastDdlTime=1368266162, serialization.null.format=\N, columns=ti,si,i,bi,f,d,b, serialization.format=1, columns.types=int,int,int,bigint,float,double,boolean}
==> org.apache.hadoop.hive.serde2.lazy.LazyInteger:-3
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int
==> org.apache.hadoop.hive.serde2.lazy.LazyInteger:9001
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int
==> org.apache.hadoop.hive.serde2.lazy.LazyInteger:86400
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int
==> org.apache.hadoop.hive.serde2.lazy.LazyLong:4294967297
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyLongObjectInspector:bigint
==> org.apache.hadoop.hive.serde2.lazy.LazyFloat:34.532
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyFloatObjectInspector:float
==> org.apache.hadoop.hive.serde2.lazy.LazyDouble:2.184239842983489E15
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyDoubleObjectInspector:double
==> org.apache.hadoop.hive.serde2.lazy.LazyBoolean:true
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyBooleanObjectInspector:boolean
==> org.apache.hadoop.hive.serde2.lazy.LazyInteger:0
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int
==> org.apache.hadoop.hive.serde2.lazy.LazyInteger:0
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int
==> org.apache.hadoop.hive.serde2.lazy.LazyInteger:0
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int
==> org.apache.hadoop.hive.serde2.lazy.LazyLong:0
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyLongObjectInspector:bigint
==> org.apache.hadoop.hive.serde2.lazy.LazyFloat:0.0
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyFloatObjectInspector:float
==> org.apache.hadoop.hive.serde2.lazy.LazyDouble:0.0
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyDoubleObjectInspector:double
==> org.apache.hadoop.hive.serde2.lazy.LazyBoolean:false
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyBooleanObjectInspector:boolean
{noformat}

ORC:
{noformat}
13/05/11 02:56:16 INFO mapreduce.InternalUtil: Initializing org.apache.hadoop.hive.ql.io.orc.OrcSerde with properties {transient_lastDdlTime=1368266162, serialization.null.format=\N, columns=ti,si,i,bi,f,d,b, serialization.format=1, columns.types=int,int,int,bigint,float,double,boolean}
==> org.apache.hadoop.hive.serde2.io.ByteWritable:-3
==> org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector:int
13/05/11 02:56:16 WARN mapred.LocalJobRunner: job_local_0003
org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error converting read value to tuple
        at org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
        at org.apache.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:53)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:194)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
        at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.ByteWritable cannot be cast to org.apache.hadoop.io.IntWritable
        at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.getPrimitiveJavaObject(WritableIntObjectInspector.java:45)
        at org.apache.hcatalog.data.HCatRecordSerDe.serializePrimitiveField(HCatRecordSerDe.java:292)
        at org.apache.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:192)
        at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
        at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
        at org.apache.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:203)
        at org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:63)
        ... 8 more
{noformat}
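
To make the mismatch concrete, here is a minimal Java sketch (illustration only, not a patch;
IntPromotionSketch/readAsInt are hypothetical names) of what a promotion-aware read would have
to do when the declared column type is int but ORC hands back the narrower writable:

{code}
import org.apache.hadoop.hive.serde2.io.ByteWritable;
import org.apache.hadoop.hive.serde2.io.ShortWritable;
import org.apache.hadoop.io.IntWritable;

// WritableIntObjectInspector blindly casts its argument to IntWritable
// (hence the ClassCastException above); a promotion-aware read would
// unwrap whichever writable ORC actually stored for the column.
public class IntPromotionSketch {
  public static Integer readAsInt(Object data) {
    if (data == null) {
      return null;
    }
    if (data instanceof ByteWritable) {          // tinyint stored by ORC
      return (int) ((ByteWritable) data).get();
    }
    if (data instanceof ShortWritable) {         // smallint stored by ORC
      return (int) ((ShortWritable) data).get();
    }
    return ((IntWritable) data).get();           // a genuine int column
  }
}
{code}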

(There is also an additional bug in how these values are read for promotion: the promotion
code assumes a plain Java Byte where the actual object is a Hadoop ByteWritable, and likewise
for the other narrow types.)
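
For example (hypothetical fragment, just to show the shape of that second bug), promotion code
along these lines fails the same way, because the value coming out of ORC is a writable wrapper
rather than a boxed Java primitive:

{code}
// Broken shape: assumes a boxed java.lang.Byte...
int promoted = ((Byte) data).intValue();        // ClassCastException again

// ...where the ORC reader actually produced a Hadoop writable:
int promoted2 = ((ByteWritable) data).get();    // unwrap the writable, then widen
{code}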


> ORC - HCatLoader integration has issues with smallint/tinyint promotions to Int
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-4551
>                 URL: https://issues.apache.org/jira/browse/HIVE-4551
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>
> This was initially reported from an e2e test run, with the following E2E test:
> {code}
>                 {
>                         'name' => 'Hadoop_ORC_Write',
>                         'tests' => [
>                                 {
>                                  'num' => 1
>                                 ,'hcat_prep'=>q\
> drop table if exists hadoop_orc;
> create table hadoop_orc (
>             t tinyint,
>             si smallint,
>             i int,
>             b bigint,
>             f float,
>             d double,
>             s string)
>         stored as orc;\
>                                 ,'hadoop' => q\
> jar :FUNCPATH:/testudf.jar org.apache.hcatalog.utils.WriteText -libjars :HCAT_JAR: :THRIFTSERVER: all100k hadoop_orc\,
>                                 ,'result_table' => 'hadoop_orc'
>                                 ,'sql' => q\select * from all100k;\
>                                 ,'floatpostprocess' => 1
>                                 ,'delimiter' => '       '
>                                 },
>                        ],
>                 },
> {code}
> This fails with the following error:
> {code}
> 2013-04-26 00:26:07,437 WARN org.apache.hadoop.mapred.Child: Error running child
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error converting read value to tuple
> 	at org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
> 	at org.apache.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:53)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
> 	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
> 	at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1195)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.ByteWritable cannot be cast to org.apache.hadoop.io.IntWritable
> 	at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.getPrimitiveJavaObject(WritableIntObjectInspector.java:45)
> 	at org.apache.hcatalog.data.HCatRecordSerDe.serializePrimitiveField(HCatRecordSerDe.java:290)
> 	at org.apache.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:192)
> 	at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
> 	at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
> 	at org.apache.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:203)
> 	at org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:63)
> 	... 12 more
> 2013-04-26 00:26:07,440 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
