Mailing-List: contact hive-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hive-dev@hadoop.apache.org
Message-ID: <14906736.4091283475213409.JavaMail.jira@thor>
Date: Thu, 2 Sep 2010 20:53:33 -0400 (EDT)
From: "He Yongqiang (JIRA)" <jira@apache.org>
To: hive-dev@hadoop.apache.org
Subject: [jira] Commented: (HIVE-1610) Using CombinedHiveInputFormat causes
 partToPartitionInfo IOException
In-Reply-To: <13839893.129871283387995571.JavaMail.jira@thor>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


    [ https://issues.apache.org/jira/browse/HIVE-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905751#action_12905751 ]

He Yongqiang commented on HIVE-1610:
------------------------------------

Sammy, the only change in TestHiveFileFormatUtils is to remove the URI scheme checks (a one-line change).
You actually added back some lines of code that were removed by HIVE-1510, and that is why the test case fails.

> Using CombinedHiveInputFormat causes partToPartitionInfo IOException
> ----------------------------------------------------------------------
>
>                 Key: HIVE-1610
>                 URL: https://issues.apache.org/jira/browse/HIVE-1610
>             Project: Hadoop Hive
>          Issue Type: Bug
>         Environment: Hadoop 0.20.2
>            Reporter: Sammy Yu
>         Attachments: 0002-HIVE-1610.-Added-additional-schema-check-to-doGetPar.patch, 0003-HIVE-1610.patch
>
>
> I have a relatively complicated Hive query using CombineHiveInputFormat:
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.dynamic.partition=true;
> set hive.exec.max.dynamic.partitions=1000;
> set hive.exec.max.dynamic.partitions.pernode=300;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> INSERT OVERWRITE TABLE keyword_serp_results_no_dups PARTITION(week) select distinct keywords.keyword, keywords.domain, keywords.url, keywords.rank, keywords.universal_rank, keywords.serp_type, keywords.date_indexed, keywords.search_engine_type, keywords.week from keyword_serp_results keywords JOIN (select domain, keyword, search_engine_type, week, max_date_indexed, min(rank) as best_rank from (select keywords1.domain, keywords1.keyword, keywords1.search_engine_type,  keywords1.week, keywords1.rank, dupkeywords1.max_date_indexed from keyword_serp_results keywords1 JOIN (select domain, keyword, search_engine_type, week, max(date_indexed) as max_date_indexed from keyword_serp_results group by domain,keyword,search_engine_type,week) dupkeywords1 on keywords1.keyword = dupkeywords1.keyword AND  keywords1.domain = dupkeywords1.domain AND keywords1.search_engine_type = dupkeywords1.search_engine_type AND keywords1.week = dupkeywords1.week AND keywords1.date_indexed = dupkeywords1.max_date_indexed) dupkeywords2 group by domain,keyword,search_engine_type,week,max_date_indexed ) dupkeywords3 on keywords.keyword = dupkeywords3.keyword AND  keywords.domain = dupkeywords3.domain AND keywords.search_engine_type = dupkeywords3.search_engine_type AND keywords.week = dupkeywords3.week AND keywords.date_indexed = dupkeywords3.max_date_indexed AND keywords.rank = dupkeywords3.best_rank;
>
> This query used to work fine until I updated to r991183 on trunk and started getting this error:
> java.io.IOException: cannot find dir = hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002/000000_0 in
> partToPartitionInfo: [hdfs://ec2-75-101-174-245.compute-1.amazonaws.com:8020/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002,
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=417/week=201035/day=20100829,
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=418/week=201035/day=20100829,
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=419/week=201035/day=20100829,
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100829,
> hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100831]
> at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:277)
> at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.<init>(CombineHiveInputFormat.java:100)
> at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:312)
> at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
> at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:610)
> at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:120)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
> This query works if I don't change the hive.input.format.
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> I've narrowed down this issue to the commit for HIVE-1510.  If I take out the changeset from r987746, everything works as before.
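
One detail that stands out in the error quoted above: the temporary-directory key in partToPartitionInfo carries an explicit port (:8020), while the directory being looked up does not. The sketch below, in Python rather than Hive's Java, illustrates how a parent-directory walk that compares full URIs can miss such a key, and how comparing only the path component finds it. This is not Hive's actual code, and whether the port mismatch is the real root cause is an assumption; the function name find_partition_key, the ignore_authority flag, and the host namenode.example.com are all hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def find_partition_key(dir_uri, part_keys, ignore_authority):
    """Walk up from dir_uri and return the first matching key, or None.

    Hypothetical helper, not Hive's implementation. When ignore_authority
    is True, only the path component of each URI is compared, so scheme,
    host, and port differences are ignored.
    """
    def normalize(uri):
        parsed = urlparse(uri)
        if ignore_authority:
            return parsed.path
        return f"{parsed.scheme}://{parsed.netloc}{parsed.path}"

    # Map each normalized key back to the original key string.
    lookup = {normalize(key): key for key in part_keys}

    parsed = urlparse(dir_uri)
    path = parsed.path
    while path and path != "/":
        if ignore_authority:
            candidate = path
        else:
            candidate = f"{parsed.scheme}://{parsed.netloc}{path}"
        if candidate in lookup:
            return lookup[candidate]
        path = posixpath.dirname(path)  # step up to the parent directory
    return None


# Shapes taken from the error message; the host and job directory are made up.
KEYS = [
    "hdfs://namenode.example.com:8020/tmp/hive-root/job/-mr-10002",
    "hdfs://namenode.example.com/user/root/domain_keywords/account=417/week=201035/day=20100829",
]
DIR = "hdfs://namenode.example.com/tmp/hive-root/job/-mr-10002/000000_0"

strict_match = find_partition_key(DIR, KEYS, ignore_authority=False)
path_only_match = find_partition_key(DIR, KEYS, ignore_authority=True)
```

With these inputs the strict comparison finds no key, because "namenode.example.com" and "namenode.example.com:8020" differ as strings, while the path-only comparison resolves to the -mr-10002 key. Ignoring the authority has a cost: two filesystems with identical paths would become indistinguishable, so a real fix might prefer to normalize the default port instead.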

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.