hive-user mailing list archives

From Chandraprakash Bhagtani <cpbhagt...@gmail.com>
Subject Re: hive.optimize.skewjoin problem
Date Fri, 02 Aug 2013 03:48:33 GMT
Thanks, Venki. This is exactly the issue I am facing.
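
The workaround described in the original message quoted below can be sketched as a Hive CLI session. This is a minimal sketch: the tables and the query are the ones from that message, and whether the same setting helps with other queries affected by HIVE-4693 is an assumption.

```sql
-- Turn off the skew join optimization for the current session only.
set hive.optimize.skewjoin=false;

-- query2 from the original message, which printed no result with skewjoin enabled.
select distinct co.city
from company1 co
inner join customer1 cu on (co.city = cu.city);
```

The setting is applied per session here rather than in hive-site.xml, so other jobs keep the optimization.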



On Fri, Aug 2, 2013 at 4:30 AM, Venki Korukanti
<venki.korukanti@gmail.com> wrote:

> Looks like there is already a JIRA for this:
> https://issues.apache.org/jira/browse/HIVE-4693. It repros in Hive 0.11
> too.
>
>
> On Thu, Aug 1, 2013 at 3:30 AM, Chandraprakash Bhagtani <
> cpbhagtani@gmail.com> wrote:
>
>> I got a clue on this. I was actually running a patched Hive, which was
>> swallowing the exception. When I reverted the patch, I saw the following
>> exception in both cases (query1 and query2):
>>
>> java.io.FileNotFoundException: File hdfs://mycluster/tmp/hive-training/hive_2013-08-01_03-24-07_554_3719658871426124253/-mr-10002/hive_skew_join_bigkeys_0 does not exist.
>>     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:410)
>>     at org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:102)
>>     at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
>>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
>>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1374)
>>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1160)
>>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:973)
>>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:893)
>>     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
>>     at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>>
>>
>> So it seems that the hive_skew_join_bigkeys_0 file is not being created by
>> the previous stage. I couldn't locate the source code where this file is
>> created. With query2, the result is generated even after the exception is
>> printed. BTW, I am running Hive 0.10 (CDH 4.3).
>>
>> Any clue?
>>
>>
>>
>> On Thu, Aug 1, 2013 at 3:11 PM, Chandraprakash Bhagtani <
>> cpbhagtani@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I was facing a weird issue with Hive today. I ran the following two queries:
>>>
>>> query1:   select co.city from company1 co inner join customer1 cu on
>>> (co.city=cu.city);
>>>
>>> query2:  select distinct co.city from company1 co inner join customer1
>>> cu on (co.city=cu.city);
>>>
>>>
>>> The only difference between these queries is the distinct keyword. The
>>> first query prints the result, but the second query printed no result
>>> and showed no error.
>>>
>>> When I disabled the skewjoin optimization by setting
>>> "hive.optimize.skewjoin=false", query2 started printing results too.
>>>
>>> Can anyone explain what the issue with skewjoin is here?
>>>
>>> --
>>> Thanks & Regards,
>>> Chandra Prakash Bhagtani
>>>
>>
>>
>>
>> --
>> Thanks & Regards,
>> Chandra Prakash Bhagtani
>>
>
>


-- 
Thanks & Regards,
Chandra Prakash Bhagtani
