Message-ID: <14906736.4091283475213409.JavaMail.jira@thor>
Date: Thu, 2 Sep 2010 20:53:33 -0400 (EDT)
From: "He Yongqiang (JIRA)"
To: hive-dev@hadoop.apache.org
Subject: [jira] Commented: (HIVE-1610) Using CombinedHiveInputFormat causes partToPartitionInfo IOException
In-Reply-To: <13839893.129871283387995571.JavaMail.jira@thor>

    [ https://issues.apache.org/jira/browse/HIVE-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905751#action_12905751 ]

He Yongqiang commented on HIVE-1610:
------------------------------------

Sammy, the only change in TestHiveFileFormatUtils is to remove URI scheme checks (1 line change).
You actually added some lines of code which were removed by HIVE-1510, and this is the reason the testcase fails.

> Using CombinedHiveInputFormat causes partToPartitionInfo IOException
> ----------------------------------------------------------------------
>
>                 Key: HIVE-1610
>                 URL: https://issues.apache.org/jira/browse/HIVE-1610
>             Project: Hadoop Hive
>          Issue Type: Bug
>         Environment: Hadoop 0.20.2
>            Reporter: Sammy Yu
>         Attachments: 0002-HIVE-1610.-Added-additional-schema-check-to-doGetPar.patch, 0003-HIVE-1610.patch
>
>
> I have a relatively complicated hive query using CombinedHiveInputFormat:
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.dynamic.partition=true;
> set hive.exec.max.dynamic.partitions=1000;
> set hive.exec.max.dynamic.partitions.pernode=300;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> INSERT OVERWRITE TABLE keyword_serp_results_no_dups PARTITION(week) select distinct keywords.keyword, keywords.domain, keywords.url, keywords.rank, keywords.universal_rank, keywords.serp_type, keywords.date_indexed, keywords.search_engine_type, keywords.week from keyword_serp_results keywords JOIN (select domain, keyword, search_engine_type, week, max_date_indexed, min(rank) as best_rank from (select keywords1.domain, keywords1.keyword, keywords1.search_engine_type, keywords1.week, keywords1.rank, dupkeywords1.max_date_indexed from keyword_serp_results keywords1 JOIN (select domain, keyword, search_engine_type, week, max(date_indexed) as max_date_indexed from keyword_serp_results group by domain,keyword,search_engine_type,week) dupkeywords1 on keywords1.keyword = dupkeywords1.keyword AND keywords1.domain = dupkeywords1.domain AND keywords1.search_engine_type = dupkeywords1.search_engine_type AND keywords1.week = dupkeywords1.week AND keywords1.date_indexed = dupkeywords1.max_date_indexed) dupkeywords2 group by domain,keyword,search_engine_type,week,max_date_indexed ) dupkeywords3 on keywords.keyword = dupkeywords3.keyword AND keywords.domain = dupkeywords3.domain AND keywords.search_engine_type = dupkeywords3.search_engine_type AND keywords.week = dupkeywords3.week AND keywords.date_indexed = dupkeywords3.max_date_indexed AND keywords.rank = dupkeywords3.best_rank;
>
> This query used to work fine until I updated to r991183 on trunk and started getting this error:
> java.io.IOException: cannot find dir = hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002/000000_0 in
> partToPartitionInfo: [hdfs://ec2-75-101-174-245.compute-1.amazonaws.com:8020/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002,
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=417/week=201035/day=20100829,
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=418/week=201035/day=20100829,
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=419/week=201035/day=20100829,
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100829,
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100831]
> at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:277)
> at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.<init>(CombineHiveInputFormat.java:100)
> at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:312)
> at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
> at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:610)
> at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:120)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
> This query works if I don't change the hive.input.format.
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> I've narrowed down this issue to the commit for HIVE-1510. If I take out the changeset from r987746, everything works as before.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
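
Illustration of the mismatch in the reported error: the directory being looked up has no port in its URI, while the matching partToPartitionInfo entry carries :8020, so an exact string comparison cannot pair them. The Python sketch below is not Hive's code and does not reproduce getPartitionDescFromPathRecursively; the function name find_partition_dir, the ignore_authority flag, and the namenode.example.com host are all hypothetical. It only shows, under those assumptions, how a parent-directory walk behaves when the full URI is compared versus when only the path component is compared.

```python
from urllib.parse import urlsplit
import posixpath


def find_partition_dir(file_uri, partition_dirs, ignore_authority):
    """Walk up from file_uri until a registered directory matches.

    Hypothetical helper for illustration; not Hive's implementation.
    """
    def key(uri):
        # With ignore_authority, compare on the path alone so that
        # "host" and "host:8020" are treated as the same location.
        return urlsplit(uri).path if ignore_authority else uri

    registered = {key(d): d for d in partition_dirs}
    parts = urlsplit(file_uri)
    path = parts.path
    while path and path != "/":
        if ignore_authority:
            candidate = path
        else:
            candidate = f"{parts.scheme}://{parts.netloc}{path}"
        if candidate in registered:
            return registered[candidate]
        path = posixpath.dirname(path)  # try the parent directory next
    return None


# Assumed example values, shaped like the paths in the error message.
registered = ["hdfs://namenode.example.com:8020/tmp/hive-root/-mr-10002"]
split_file = "hdfs://namenode.example.com/tmp/hive-root/-mr-10002/000000_0"

print(find_partition_dir(split_file, registered, ignore_authority=False))
print(find_partition_dir(split_file, registered, ignore_authority=True))
```

With the full URI compared, the walk finds nothing because no candidate ever contains the port; with only the path compared, the parent directory of the split file matches the registered entry.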