hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-16761) LLAP IO: SMB joins fail elevator
Date Mon, 19 Jun 2017 21:19:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054749#comment-16054749 ]

Sergey Shelukhin edited comment on HIVE-16761 at 6/19/17 9:18 PM:
------------------------------------------------------------------

After fixing HIVE-16915, the error changes to:
{noformat}
java.lang.RuntimeException: java.io.IOException: java.io.IOException: java.io.IOException:
cannot find dir = hdfs://.../apps/hive/warehouse/customer_accounts_orc_200/000048_0 in pathToPartitionInfo:
[hdfs://.../apps/hive/warehouse/transactions_raw_orc_200/year=2016/quarter=3, hdfs://.../apps/hive/warehouse/transactions_raw_orc_200/year=2016/quarter=4,
hdfs://.../apps/hive/warehouse/transactions_raw_orc_200/year=2017/quarter=2, hdfs://.../apps/hive/warehouse/transactions_raw_orc_200/year=2017/quarter=3]
	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206)
	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:145)
	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
	at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:157)
	at org.apache.tez.mapreduce.lib.MRReaderMapred.<init>(MRReaderMapred.java:76)
	at org.apache.tez.mapreduce.input.MultiMRInput.initFromEvent(MultiMRInput.java:196)
	at org.apache.tez.mapreduce.input.MultiMRInput.handleEvents(MultiMRInput.java:154)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.handleEvent(LogicalIOProcessorRuntimeTask.java:715)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.access$600(LogicalIOProcessorRuntimeTask.java:105)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$1.runInternal(LogicalIOProcessorRuntimeTask.java:792)
	at org.apache.tez.common.RunnableWithNdc.run(RunnableWithNdc.java:35)
	at java.lang.Thread.run(Thread.java:745)
...
Caused by: java.io.IOException: cannot find dir = hdfs://.../apps/hive/warehouse/customer_accounts_orc_200/000048_0
in pathToPartitionInfo: [hdfs://.../apps/hive/warehouse/transactions_raw_orc_200/year=2016/quarter=3,
hdfs://.../apps/hive/warehouse/transactions_raw_orc_200/year=2016/quarter=4, hdfs://.../apps/hive/warehouse/transactions_raw_orc_200/year=2017/quarter=2,
hdfs://.../apps/hive/warehouse/transactions_raw_orc_200/year=2017/quarter=3]
	at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:391)
	at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:357)
	at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.getPartitionValues(VectorizedRowBatchCtx.java:153)
	at org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.<init>(LlapRecordReader.java:139)
	at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat.getRecordReader(LlapInputFormat.java:114)
	... 13 more
{noformat}
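The "Caused by" section shows the lookup failing because the file 000048_0 lives under customer_accounts_orc_200 (table b), while pathToPartitionInfo only contains partition directories of transactions_raw_orc_200 (table a). A minimal Java sketch of a parent-walking path lookup, assuming the lookup tries the file path and then each ancestor directory in turn, shows why such a file can never match. The class PartitionLookup, its methods, the shortened paths (with the hdfs://... prefix dropped) and the file name 000000_0 are hypothetical; this is not the code of HiveFileFormatUtils.getPartitionDescFromPathRecursively, whose real behavior may differ.

```java
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical, simplified model of a path -> partition lookup that walks up
 * parent directories until it finds a registered prefix. Not Hive's code.
 */
public class PartitionLookup {

    /** Returns the parent of a '/'-separated path, or null once the root is reached. */
    static String parent(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? null : path.substring(0, slash);
    }

    /** Tries the file path, then each ancestor, returning the first registered partition. */
    public static Optional<String> findPartition(String filePath, Map<String, String> pathToPartitionInfo) {
        for (String p = filePath; p != null; p = parent(p)) {
            String desc = pathToPartitionInfo.get(p);
            if (desc != null) {
                return Optional.of(desc);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The map holds only partitions of table a, as in the error message.
        Map<String, String> pathToPartitionInfo = Map.of(
            "/apps/hive/warehouse/transactions_raw_orc_200/year=2016/quarter=3", "a: year=2016/quarter=3",
            "/apps/hive/warehouse/transactions_raw_orc_200/year=2017/quarter=2", "a: year=2017/quarter=2");

        // A file of table a resolves through its partition directory.
        System.out.println(findPartition(
            "/apps/hive/warehouse/transactions_raw_orc_200/year=2016/quarter=3/000000_0", pathToPartitionInfo));

        // A file of table b has no registered ancestor, so the lookup comes back empty.
        System.out.println(findPartition(
            "/apps/hive/warehouse/customer_accounts_orc_200/000048_0", pathToPartitionInfo));
    }
}
```

In this model the failure does not depend on the lookup itself: as long as the map only describes table a, any file of table b walks all the way to the root without a hit.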


was (Author: sershe):
After fixing HIVE-16915, the error changes to:
{noformat}
java.lang.RuntimeException: java.io.IOException: java.io.IOException: java.io.IOException:
cannot find dir = hdfs://.../apps/hive/warehouse/customer_accounts_orc_200/000048_0 in pathToPartitionInfo:
[hdfs://.../apps/hive/warehouse/transactions_raw_orc_200/year=2016/quarter=3, hdfs://.../apps/hive/warehouse/transactions_raw_orc_200/year=2016/quarter=4,
hdfs://.../apps/hive/warehouse/transactions_raw_orc_200/year=2017/quarter=2, hdfs://.../apps/hive/warehouse/transactions_raw_orc_200/year=2017/quarter=3]
	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206)
	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:145)
	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
	at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:157)
	at org.apache.tez.mapreduce.lib.MRReaderMapred.<init>(MRReaderMapred.java:76)
	at org.apache.tez.mapreduce.input.MultiMRInput.initFromEvent(MultiMRInput.java:196)
	at org.apache.tez.mapreduce.input.MultiMRInput.handleEvents(MultiMRInput.java:154)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.handleEvent(LogicalIOProcessorRuntimeTask.java:715)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.access$600(LogicalIOProcessorRuntimeTask.java:105)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$1.runInternal(LogicalIOProcessorRuntimeTask.java:792)
	at org.apache.tez.common.RunnableWithNdc.run(RunnableWithNdc.java:35)
	at java.lang.Thread.run(Thread.java:745)
{noformat}

> LLAP IO: SMB joins fail elevator 
> ---------------------------------
>
>                 Key: HIVE-16761
>                 URL: https://issues.apache.org/jira/browse/HIVE-16761
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gopal V
>            Assignee: Sergey Shelukhin
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
> 	at org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153)
> 	at org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78)
> 	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
> 	... 26 more
> Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
> 	at org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334)
> 	at org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602)
> 	at org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149)
> 	... 28 more
> {code}
> {code}
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=500;
> select year,quarter,count(*) from transactions_raw_orc_200 a join customer_accounts_orc_200 b on a.account_id=b.account_id group by year,quarter;
> {code}
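In the original trace, BatchToRowReader.nextString fails because the column vector it receives is a LongColumnVector rather than the BytesColumnVector a string column expects. The sketch below uses hypothetical stand-in classes (not Hive's real ColumnVector API, which stores bytes with start/length arrays rather than Strings) to show how a reader that trusts the declared column type and casts unconditionally produces this kind of ClassCastException. Why the types disagree in this query is not established here; a batch laid out for one table's schema being read with another table's schema is only an assumed possibility.

```java
/** Hypothetical, heavily simplified stand-ins for vectorized columns (not Hive's API). */
public class CastMismatchDemo {

    static abstract class ColumnVector {}

    static final class LongColumnVector extends ColumnVector {
        final long[] vector;
        LongColumnVector(long... values) { this.vector = values; }
    }

    static final class BytesColumnVector extends ColumnVector {
        final String[] vector;
        BytesColumnVector(String... values) { this.vector = values; }
    }

    /** Reads a string value by trusting the declared type and casting the vector. */
    static String nextString(ColumnVector col, int row) {
        BytesColumnVector bytes = (BytesColumnVector) col; // throws ClassCastException for a long column
        return bytes.vector[row];
    }

    /** Returns "ok" if row 0 reads as a string, else the simple name of the exception. */
    static String tryRead(ColumnVector col) {
        try {
            nextString(col, 0);
            return "ok";
        } catch (ClassCastException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Declared string column backed by a bytes vector: the read succeeds.
        System.out.println(tryRead(new BytesColumnVector("acct-1")));
        // Declared string column backed by a long vector: the cast fails.
        System.out.println(tryRead(new LongColumnVector(42L)));
    }
}
```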



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
