hive-dev mailing list archives

From "Alex Nastetsky (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7186) Unable to perform join on table
Date Fri, 20 Jun 2014 14:17:24 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038830#comment-14038830 ]

Alex Nastetsky commented on HIVE-7186:
--------------------------------------

I just saw a similar problem with a different stack trace. This time, the join got to
the very end of the job and failed as it finished:
{code}
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.EOFException: Premature EOF: no length prefix available
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:514)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:332)
	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
	at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
	at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
	at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:159)
	at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:548)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:599)
Caused by: java.io.EOFException: Premature EOF: no length prefix available
	at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1492)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:962)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:930)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1031)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:823)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:475)
{code}

> Unable to perform join on table
> -------------------------------
>
>                 Key: HIVE-7186
>                 URL: https://issues.apache.org/jira/browse/HIVE-7186
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>         Environment: Hortonworks Data Platform 2.0.6.0
>            Reporter: Alex Nastetsky
>
> Occasionally, a table will start exhibiting behavior that prevents it from being used in a JOIN.
> When doing a map join, it will just stall at "Starting to launch local task to process map join; ".
> When doing a regular join, it will make progress but then error out with an IndexOutOfBoundsException:
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException
>         at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:365)
>         at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
>         at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
>         at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
>         at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
>         ... 9 more
> Caused by: java.lang.IndexOutOfBoundsException
>         at java.nio.Buffer.checkIndex(Buffer.java:532)
>         at java.nio.ByteBufferAsIntBufferL.put(ByteBufferAsIntBufferL.java:131)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1153)
>         at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:586)
>         at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:372)
>         at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:334)
>         ... 15 more
>         
> Doing simple selects against this table works fine and does not show any apparent problems with the data.
> Assume that the table in question is called tableA and was created by queryA.
> Doing either of the following has helped resolve the issue in the past.
> 1) create table tableB as select * from tableA;
>   Then just use tableB instead in the JOIN.
> 2) regenerate tableA using queryA
>   Then use tableA in the JOIN again. It usually works the second time.
>   
> When doing a "describe formatted" on the tables, the totalSize will be different between the original tableA and tableB, and sometimes (but not always) between the original tableA and the regenerated tableA. The numRows will be the same across all versions of the tables.
> This problem cannot be reproduced consistently, but the issue always happens when we try to use an affected table in a JOIN.
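
The totalSize/numRows comparison described above could be scripted roughly as follows. This is only a sketch, not part of the original report: it assumes the "describe formatted" output for each table has been saved as text and that each parameter appears on a line as a name followed by an integer value. The helper names (table_params, is_suspect) and the sample figures are hypothetical and for illustration only.

```python
def table_params(describe_output, keys=("numRows", "totalSize")):
    """Pull selected integer parameters out of saved `describe formatted` text.

    Assumes each parameter sits on one line as a name followed by its value,
    separated by whitespace.
    """
    params = {}
    for line in describe_output.splitlines():
        fields = line.split()
        # Look for a known key immediately followed by an integer value.
        for i, field in enumerate(fields[:-1]):
            if field in keys and fields[i + 1].isdigit():
                params[field] = int(fields[i + 1])
    return params


def is_suspect(original, copy):
    """True when row counts match but total sizes differ, as in the report."""
    return (original["numRows"] == copy["numRows"]
            and original["totalSize"] != copy["totalSize"])


# Hypothetical sample output; the numbers are made up for illustration.
SAMPLE_TABLE_A = "Table Parameters:\n\tnumRows             \t1000\n\ttotalSize           \t52341\n"
SAMPLE_TABLE_B = "Table Parameters:\n\tnumRows             \t1000\n\ttotalSize           \t51987\n"

if __name__ == "__main__":
    a = table_params(SAMPLE_TABLE_A)
    b = table_params(SAMPLE_TABLE_B)
    print("tableA:", a)
    print("tableB:", b)
    print("same numRows, different totalSize:", is_suspect(a, b))
```

A check like this only flags the symptom (same numRows, different totalSize); it does not say why the sizes differ or confirm that a table will fail in a JOIN.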



--
This message was sent by Atlassian JIRA
(v6.2#6252)
