hive-dev mailing list archives

From "He Yongqiang (JIRA)" <>
Subject [jira] Commented: (HIVE-1610) Using CombinedHiveInputFormat causes partToPartitionInfo IOException
Date Fri, 03 Sep 2010 00:53:33 GMT


He Yongqiang commented on HIVE-1610:

Sammy, the only change in TestHiveFileFormatUtils is to remove URI scheme checks (1 line change).

You actually added back some lines of code that were removed by HIVE-1510, and that is why the
test case fails.
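To make the scheme-check question concrete, here is a minimal Java sketch of a partToPartitionInfo-style lookup. It is hypothetical: the class `PartLookupSketch`, the `lookup` and `key` methods, the `ignoreScheme` flag and the example paths are invented for illustration and are not Hive's HiveFileFormatUtils code. The sketch resolves a file's directory by walking up its parent directories and shows how comparing URI schemes can turn a match into a "cannot find dir" miss. The paths in the quoted report are truncated, so this does not reproduce the reported failure.

```java
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, NOT Hive's HiveFileFormatUtils code: resolve a file's
 * directory against a partToPartitionInfo-style map by walking up parent
 * directories, optionally ignoring the URI scheme and authority.
 */
public class PartLookupSketch {

    // Builds a lookup key: "scheme://authority/path", or just "/path"
    // when schemes are ignored or the URI has no scheme.
    static String key(URI u, String path, boolean ignoreScheme) {
        if (ignoreScheme || u.getScheme() == null) {
            return path;
        }
        String authority = u.getAuthority() == null ? "" : u.getAuthority();
        return u.getScheme() + "://" + authority + path;
    }

    static String lookup(Map<String, String> partToPartitionInfo, String dir,
                         boolean ignoreScheme) throws IOException {
        // Re-key the map with the same normalization used for the query path.
        Map<String, String> index = new HashMap<>();
        for (Map.Entry<String, String> e : partToPartitionInfo.entrySet()) {
            URI k = URI.create(e.getKey());
            index.put(key(k, k.getPath(), ignoreScheme), e.getValue());
        }
        URI u = URI.create(dir);
        String path = u.getPath();
        // Try the path itself, then each ancestor up to the top-level directory.
        while (path != null && !path.isEmpty()) {
            String info = index.get(key(u, path, ignoreScheme));
            if (info != null) {
                return info;
            }
            int slash = path.lastIndexOf('/');
            path = slash > 0 ? path.substring(0, slash) : null;
        }
        throw new IOException("cannot find dir = " + dir
                + " in partToPartitionInfo: " + partToPartitionInfo.keySet());
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> parts = new HashMap<>();
        parts.put("hdfs://nn:8020/warehouse/t/week=1", "week=1");
        // A scheme-less file path misses when schemes are compared...
        try {
            lookup(parts, "/warehouse/t/week=1/000000_0", false);
        } catch (IOException expected) {
            System.out.println("strict: " + expected.getMessage());
        }
        // ...but resolves to its parent partition when they are ignored.
        System.out.println("lenient: " + lookup(parts, "/warehouse/t/week=1/000000_0", true));
    }
}
```

Whether a lookup should compare schemes is exactly the trade-off under discussion: a strict comparison can reject paths that differ only in how they are qualified, while a lenient one can conflate paths on different file systems.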

> Using CombinedHiveInputFormat causes partToPartitionInfo IOException  
> ----------------------------------------------------------------------
>                 Key: HIVE-1610
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: Bug
>         Environment: Hadoop 0.20.2
>            Reporter: Sammy Yu
>         Attachments: 0002-HIVE-1610.-Added-additional-schema-check-to-doGetPar.patch,
> I have a relatively complicated hive query using CombinedHiveInputFormat:
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.dynamic.partition=true; 
> set hive.exec.max.dynamic.partitions=1000;
> set hive.exec.max.dynamic.partitions.pernode=300;
> set;
> INSERT OVERWRITE TABLE keyword_serp_results_no_dups PARTITION(week)
> select distinct keywords.keyword, keywords.domain, keywords.url, keywords.rank,
>   keywords.universal_rank, keywords.serp_type, keywords.date_indexed,
>   keywords.search_engine_type, keywords.week
> from keyword_serp_results keywords
> JOIN (
>   select domain, keyword, search_engine_type, week, max_date_indexed, min(rank) as best_rank
>   from (
>     select keywords1.domain, keywords1.keyword, keywords1.search_engine_type,
>       keywords1.week, keywords1.rank, dupkeywords1.max_date_indexed
>     from keyword_serp_results keywords1
>     JOIN (
>       select domain, keyword, search_engine_type, week, max(date_indexed) as max_date_indexed
>       from keyword_serp_results
>       group by domain,keyword,search_engine_type,week
>     ) dupkeywords1
>     on keywords1.keyword = dupkeywords1.keyword
>       AND keywords1.domain = dupkeywords1.domain
>       AND keywords1.search_engine_type = dupkeywords1.search_engine_type
>       AND keywords1.week = dupkeywords1.week
>       AND keywords1.date_indexed = dupkeywords1.max_date_indexed
>   ) dupkeywords2
>   group by domain,keyword,search_engine_type,week,max_date_indexed
> ) dupkeywords3
> on keywords.keyword = dupkeywords3.keyword
>   AND keywords.domain = dupkeywords3.domain
>   AND keywords.search_engine_type = dupkeywords3.search_engine_type
>   AND keywords.week = dupkeywords3.week
>   AND keywords.date_indexed = dupkeywords3.max_date_indexed
>   AND keywords.rank = dupkeywords3.best_rank;
> This query used to work fine until I updated to r991183 on trunk and started getting this error:
> cannot find dir = hdfs://
> partToPartitionInfo: [hdfs://,
> hdfs://,
> hdfs://,
> hdfs://,
> hdfs://,
> hdfs://]
> at
> at$CombineHiveInputSplit.<init>(
> at
> at org.apache.hadoop.mapred.JobClient.writeOldSplits(
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(
> at org.apache.hadoop.mapred.JobClient.submitJob(
> at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(
> at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(
> This query works if I don't change the hive.input.format.
> set;
> I've narrowed down this issue to the commit for HIVE-1510. If I take out the changeset
> from r987746, everything works as before.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
