hive-dev mailing list archives

From "Sushanth Sowmyan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-4551) ORC - HCatLoader integration has issues with smallint/tinyint promotions to Int
Date Mon, 13 May 2013 21:17:16 GMT

    [ https://issues.apache.org/jira/browse/HIVE-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13656388#comment-13656388 ]

Sushanth Sowmyan commented on HIVE-4551:
----------------------------------------

I'm attaching a patch for this. It does the following:

a) Removes the promotion logic from HCatSchema, keeping the schema "pure" so that it reflects
the table type.
b) Does the conversion to the appropriate Pig types inside PigHCatUtil. This breaks Travis'
original intent of having HCatRecord/HCatSchema do promotions for all M/R programs, but given
that the conversion was buggy anyway, this does not break backward compatibility.
c) If we intend to add that support back, the correct way to do it, imo, is to add the
promotion to HCatRecord's accessors and leave HCatSchema alone.
d) Adds a new test case that mimics the e2e test that failed, so that we can build on it from
now on. I've also refactored more Loader/Storer tests to run against ORC as well.
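
To illustrate the idea behind (a) and (b), here is a minimal sketch in plain Java. It is not
the attached patch and does not use the HCatalog or Pig APIs: TableType, PigType, toPigType,
toPigValue and toPigTuple are hypothetical names made up for this example. The assumption is
that the schema keeps reporting the table's own types (tinyint stays tinyint) and that the
widening to int happens only where values are handed to Pig.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PromotionSketch {

    // Hypothetical stand-in for table-side column types (not the HCatalog API).
    enum TableType { TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING }

    // Hypothetical stand-in for the types exposed to Pig (not the Pig API).
    enum PigType { INTEGER, LONG, FLOAT, DOUBLE, CHARARRAY }

    // Type mapping done at the Pig boundary; the table schema itself is untouched.
    static PigType toPigType(TableType t) {
        switch (t) {
            case TINYINT:
            case SMALLINT:
            case INT:
                return PigType.INTEGER;
            case BIGINT:
                return PigType.LONG;
            case FLOAT:
                return PigType.FLOAT;
            case DOUBLE:
                return PigType.DOUBLE;
            case STRING:
                return PigType.CHARARRAY;
            default:
                throw new IllegalArgumentException("Unsupported type: " + t);
        }
    }

    // Value conversion matching toPigType: Byte and Short are widened to Integer,
    // everything else is passed through unchanged.
    static Object toPigValue(TableType t, Object value) {
        if (value == null) {
            return null;
        }
        switch (t) {
            case TINYINT:
                return Integer.valueOf(((Byte) value).intValue());
            case SMALLINT:
                return Integer.valueOf(((Short) value).intValue());
            default:
                return value;
        }
    }

    // Converts one record field by field, driven by the unpromoted table schema.
    static List<Object> toPigTuple(List<TableType> schema, List<Object> record) {
        if (schema.size() != record.size()) {
            throw new IllegalArgumentException("Schema and record sizes differ");
        }
        List<Object> tuple = new ArrayList<>(record.size());
        for (int i = 0; i < record.size(); i++) {
            tuple.add(toPigValue(schema.get(i), record.get(i)));
        }
        return tuple;
    }

    public static void main(String[] args) {
        List<TableType> schema =
            Arrays.asList(TableType.TINYINT, TableType.SMALLINT, TableType.INT);
        List<Object> record = Arrays.<Object>asList((byte) 7, (short) 300, 42);
        System.out.println(toPigTuple(schema, record)); // prints [7, 300, 42]
    }
}
```

Because the schema is never rewritten to say "int" for a tinyint column, nothing downstream
tries to read a byte value through an int accessor, which is the kind of mismatch behind the
ClassCastException in the report.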


                
> ORC - HCatLoader integration has issues with smallint/tinyint promotions to Int
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-4551
>                 URL: https://issues.apache.org/jira/browse/HIVE-4551
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>
> This was initially reported from an e2e test run, with the following E2E test:
> {code}
>                 {
>                         'name' => 'Hadoop_ORC_Write',
>                         'tests' => [
>                                 {
>                                  'num' => 1
>                                 ,'hcat_prep'=>q\
> drop table if exists hadoop_orc;
> create table hadoop_orc (
>             t tinyint,
>             si smallint,
>             i int,
>             b bigint,
>             f float,
>             d double,
>             s string)
>         stored as orc;\
>                                 ,'hadoop' => q\
> jar :FUNCPATH:/testudf.jar org.apache.hcatalog.utils.WriteText -libjars :HCAT_JAR: :THRIFTSERVER: all100k hadoop_orc\,
>                                 ,'result_table' => 'hadoop_orc'
>                                 ,'sql' => q\select * from all100k;\
>                                 ,'floatpostprocess' => 1
>                                 ,'delimiter' => '       '
>                                 },
>                        ],
>                 },
> {code}
> This fails with the following error:
> {code}
> 2013-04-26 00:26:07,437 WARN org.apache.hadoop.mapred.Child: Error running child
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error converting read value to tuple
> 	at org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
> 	at org.apache.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:53)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
> 	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
> 	at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1195)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.ByteWritable cannot be cast to org.apache.hadoop.io.IntWritable
> 	at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.getPrimitiveJavaObject(WritableIntObjectInspector.java:45)
> 	at org.apache.hcatalog.data.HCatRecordSerDe.serializePrimitiveField(HCatRecordSerDe.java:290)
> 	at org.apache.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:192)
> 	at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
> 	at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
> 	at org.apache.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:203)
> 	at org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:63)
> 	... 12 more
> 2013-04-26 00:26:07,440 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
