hive-dev mailing list archives

From "Matt McCline (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7269) First query in ptf.q (Partition Table Function test) fails when input table is changed to ORC format
Date Fri, 20 Jun 2014 22:24:25 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039462#comment-14039462 ]

Matt McCline commented on HIVE-7269:
------------------------------------



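This was my own mistake (I found the explanation through a Google search):

LOAD DATA just copies the files into the table's data directory. Hive does not do any
transformation while loading data into tables. The text file part_tiny.txt was therefore
copied unchanged into the directory of the ORC table, which is why the ORC reader rejects it
as a malformed ORC file with an invalid postscript.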
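
A possible workaround, sketched here under assumptions rather than taken from the issue: load
the text file into a plain text staging table first, then let Hive rewrite the rows as ORC
with INSERT ... SELECT. The staging table name part_staging is hypothetical. The sketch also
assumes part_tiny.txt uses Hive's default text delimiters, which seems consistent with the
report that the same CREATE TABLE works without the STORED AS ORC clause.

```sql
-- Hypothetical staging table (name is an assumption); default TEXTFILE storage,
-- same columns as partorc.
CREATE TABLE part_staging(
    p_partkey INT,
    p_name STRING,
    p_mfgr STRING,
    p_brand STRING,
    p_type STRING,
    p_size INT,
    p_container STRING,
    p_retailprice DOUBLE,
    p_comment STRING
);

-- LOAD DATA only copies the file, so it goes into the text table, not the ORC table.
LOAD DATA LOCAL INPATH '/Users/mmccline/hive_ptf/data/files/part_tiny.txt'
OVERWRITE INTO TABLE part_staging;

-- INSERT ... SELECT runs a query, so the rows are written out in ORC format.
INSERT OVERWRITE TABLE partorc
SELECT * FROM part_staging;
```

After this, queries against partorc should read real ORC files instead of the copied text file.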

> First query in ptf.q (Partition Table Function test) fails when input table is changed to ORC format
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7269
>                 URL: https://issues.apache.org/jira/browse/HIVE-7269
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>
> This fails:
> {noformat}
> CREATE TABLE partorc( 
>     p_partkey INT,
>     p_name STRING,
>     p_mfgr STRING,
>     p_brand STRING,
>     p_type STRING,
>     p_size INT,
>     p_container STRING,
>     p_retailprice DOUBLE,
>     p_comment STRING
> ) STORED AS ORC;
> LOAD DATA LOCAL INPATH '/Users/mmccline/hive_ptf/data/files/part_tiny.txt' overwrite into table partorc;
> select 
>   p_mfgr, 
>   p_name, 
>   p_size,
>   rank() 
>     over (partition by p_mfgr order by p_name) as r,
>   dense_rank() 
>     over (partition by p_mfgr order by p_name) as dr,
>   sum(p_retailprice) 
>     over (partition by p_mfgr order by p_name rows between unbounded preceding and current row) as s1
> from noop(on part 
>   partition by p_mfgr
>   order by p_name
>   );
> {noformat}
> The same thing works when the STORED AS ORC clause is removed.
> If you specify set hive.execution.engine=tez, you get these failure stack traces for the ORC table.
> {noformat}
> 14/06/20 15:05:33 [main]: ERROR tez.TezJobMonitor: Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1403230487252_0002_1_02, diagnostics=[Task failed, taskId=task_1403230487252_0002_1_02_000000, diagnostics=[AttemptID:attempt_1403230487252_0002_1_02_000000_0 Info:Error: java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: java.io.IOException: Malformed ORC file hdfs://localhost:9000/user/hive/warehouse/partorc/part_tiny.txt. Invalid postscript.
> 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:188)
> 	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307)
> 	at org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:581)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:394)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> 	at org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:570)
> Caused by: java.lang.RuntimeException: java.io.IOException: java.io.IOException: Malformed ORC file hdfs://localhost:9000/user/hive/warehouse/partorc/part_tiny.txt. Invalid postscript.
> 	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:174)
> 	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:113)
> 	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:79)
> 	at org.apache.tez.mapreduce.input.MRInput.setupOldRecordReader(MRInput.java:250)
> 	at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:400)
> 	at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:379)
> 	at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:110)
> 	at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:79)
> 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:142)
> 	... 6 more
> Caused by: java.io.IOException: java.io.IOException: Malformed ORC file hdfs://localhost:9000/user/hive/warehouse/partorc/part_tiny.txt. Invalid postscript.
> 	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
> 	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:243)
> 	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:171)
> 	... 14 more
> Caused by: java.io.IOException: Malformed ORC file hdfs://localhost:9000/user/hive/warehouse/partorc/part_tiny.txt. Invalid postscript.
> 	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.ensureOrcFooter(ReaderImpl.java:226)
> 	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:336)
> 	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:292)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:201)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1010)
> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:241)
> 	... 15 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)