hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chen Song <chen.song...@gmail.com>
Subject Re: hive auto join conversion
Date Thu, 31 Jul 2014 01:19:40 GMT
I tried that and I got the following error.

FAILED: SemanticException [Error 10227]: Not all clauses are supported with
mapjoin hint. Please remove mapjoin hint.

I then tried turning off auto join conversion.

set hive.auto.convert.join=false

But no luck, same error.

Looks like it is a known issue,

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/bk_releasenotes_hdp_2.0/content/ch_relnotes-hdp2.0.0.2-5-2.html

Chen




On Wed, Jul 30, 2014 at 9:10 PM, Navis류승우 <navis.ryu@nexr.com> wrote:

> Could you do it with hive.ignore.mapjoin.hint=false? Mapjoin hint is
> ignored from hive-0.11.0 by default (see
> https://issues.apache.org/jira/browse/HIVE-4042)
>
> Thanks,
> Navis
>
>
> 2014-07-31 10:04 GMT+09:00 Chen Song <chen.song.82@gmail.com>:
>
> I am using cdh5 with hive 0.12. We have some hive jobs migrated from hive
>> 0.10 and they are written like below:
>>
>> select /*+ MAPJOIN(sup) */ c1, c2, sup.c
>> from
>> (
>> select key, c1, c2 from table1
>> union all
>> select key, c1, c2 from table2
>> ) table
>> left outer join
>> sup
>> on (table.c1 = sup.key)
>> distribute by c1
>>
>> In Hive 0.10 (CDH4), Hive translates the left outer join into a map join
>> (map only job), followed by a regular MR job for distribute by.
>>
>> In Hive 0.12 (CDH5), Hive is not able to convert the join into a map
>> join. Instead it launches a common map reduce for the join, followed by
>> another mr for distribute by. However, when I take out the union all
>> operator, Hive seems to be able to create a single MR job, with map join on
>> map phase, and reduce for distribute by.
>>
>> I read a bit on
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
>> and found out that there are some restrictions on map side join starting
>> Hive 0.11. The following are not supported.
>>
>>
>>    - Union Followed by a MapJoin
>>    - Lateral View Followed by a MapJoin
>>    - Reduce Sink (Group By/Join/Sort By/Cluster By/Distribute By)
>>    Followed by MapJoin
>>    - MapJoin Followed by Union
>>    - MapJoin Followed by Join
>>    - MapJoin Followed by MapJoin
>>
>>
>> So if one side of the table (big side) is a union of some tables and the
>> other side is a small table, Hive would not be able to do a map join at
>> all? Is that correct?
>>
>> If correct, what should I do to make the job backward compatible?
>>
>> --
>> Chen Song
>>
>>
>


-- 
Chen Song

Mime
View raw message