hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "gaozhan ding (Jira)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-23667) Incorrect output with option hive.auto.convert.join=fasle
Date Mon, 19 Oct 2020 12:09:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-23667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

gaozhan ding updated HIVE-23667:
--------------------------------
    Description: 
We use hive with version 3.1.0 with tez engine 0.9.1.3

I encountered an error when executing a hive SQL. This SQL is as follows
{code:java}
set mapreduce.job.queuename=root.xxx;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.dynamic.partitions=10000;
set hive.fileformat.check=false;
set mapred.reduce.tasks=50;
set hive.auto.convert.join=true;
use xxx;

select count(*) from   A  join B on  B.b=A.a;{code}
with the output.
{code:java}
+----------+ | _c0 | +----------+ | 4954736 | +----------+
{code}
But when the hive.auto.convert.join option is set to false,the utput is not as expected。

The SQL is as follows
{code:java}
set mapreduce.job.queuename=root.xxx;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.dynamic.partitions=10000;
set hive.fileformat.check=false;  
set mapred.reduce.tasks=50;
set hive.auto.convert.join=false; //changed
use xxx;

select count(*) from A join B on B.b=A.a;{code}
with output:
{code:java}
+------+ | _c0 | +------+ | 0 | +------+
{code}
Beside,both tables participating in the join are partition tables.

Especially,if the option mapred.reduce.tasks=50 was not set,all above the sql output
expected results.

We just upgraded hive from 1.2 to 3.1.0, and we found that these problems only occurred in
the old hive table.

  was:
We use hive with version 3.1.0 with tez engine 0.9.1.3

I encountered an error when executing a hive SQL. This SQL is as follows
{code:java}
set mapreduce.job.queuename=root.xxx;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.dynamic.partitions=10000;
set hive.fileformat.check=false;
set mapred.reduce.tasks=50;
set hive.auto.convert.join=true;
use xxx;

select count(*) from   230_dim_site  join dw_fact_inverter_detail on  dw_fact_inverter_detail.site=230_dim_site.id;{code}
with the output.
{code:java}
+----------+ | _c0 | +----------+ | 4954736 | +----------+
{code}
But when the hive.auto.convert.join option is set to false,the utput is not as expected。

The SQL is as follows
{code:java}
set mapreduce.job.queuename=root.xxx;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.dynamic.partitions=10000;
set hive.fileformat.check=false;  
set mapred.reduce.tasks=50;
set hive.auto.convert.join=false; //changed
use xxx;

select count(*) from   230_dim_site  join dw_fact_inverter_detail on  dw_fact_inverter_detail.site=230_dim_site.id;{code}
with output:
{code:java}
+------+ | _c0 | +------+ | 0 | +------+
{code}
Beside,both tables participating in the join are partition tables.

Especially,if the option mapred.reduce.tasks=50 was not set,all above the sql output
expected results.

We just upgraded hive from 1.2 to 3.1.0, and we found that these problems only occurred in
the old hive table.


> Incorrect output with option hive.auto.convert.join=fasle
> ---------------------------------------------------------
>
>                 Key: HIVE-23667
>                 URL: https://issues.apache.org/jira/browse/HIVE-23667
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.0
>            Reporter: gaozhan ding
>            Priority: Critical
>
> We use hive with version 3.1.0 with tez engine 0.9.1.3
> I encountered an error when executing a hive SQL. This SQL is as follows
> {code:java}
> set mapreduce.job.queuename=root.xxx;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.dynamic.partition=true;
> set hive.exec.max.dynamic.partitions.pernode=10000;
> set hive.exec.max.dynamic.partitions=10000;
> set hive.fileformat.check=false;
> set mapred.reduce.tasks=50;
> set hive.auto.convert.join=true;
> use xxx;
> select count(*) from   A  join B on  B.b=A.a;{code}
> with the output.
> {code:java}
> +----------+ | _c0 | +----------+ | 4954736 | +----------+
> {code}
> But when the hive.auto.convert.join option is set to false,the utput is not as expected。
> The SQL is as follows
> {code:java}
> set mapreduce.job.queuename=root.xxx;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.dynamic.partition=true;
> set hive.exec.max.dynamic.partitions.pernode=10000;
> set hive.exec.max.dynamic.partitions=10000;
> set hive.fileformat.check=false;  
> set mapred.reduce.tasks=50;
> set hive.auto.convert.join=false; //changed
> use xxx;
> select count(*) from A join B on B.b=A.a;{code}
> with output:
> {code:java}
> +------+ | _c0 | +------+ | 0 | +------+
> {code}
> Beside,both tables participating in the join are partition tables.
> Especially,if the option mapred.reduce.tasks=50 was not set,all above the sql output
expected results.
> We just upgraded hive from 1.2 to 3.1.0, and we found that these problems only occurred
in the old hive table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Mime
View raw message