hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-11132) Queries using join and group by produce incorrect output when hive.auto.convert.join=false and hive.optimize.reducededuplication=true
Date Thu, 12 Nov 2015 02:13:13 GMT

     [ https://issues.apache.org/jira/browse/HIVE-11132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-11132:
------------------------------------

Should this issue be backported to branch-1? It looks like a bug.

> Queries using join and group by produce incorrect output when hive.auto.convert.join=false and hive.optimize.reducededuplication=true
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-11132
>                 URL: https://issues.apache.org/jira/browse/HIVE-11132
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer
>    Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0
>            Reporter: Rich Haase
>            Assignee: Ashutosh Chauhan
>             Fix For: 2.0.0
>
>         Attachments: HIVE-11132.2.patch, HIVE-11132.patch
>
>
> Queries using join and group by produce multiple output rows with the same key when hive.auto.convert.join=false and hive.optimize.reducededuplication=true. This interaction between configuration parameters is unexpected; at the very least it should be well documented, and it should likely be considered a bug.
> For example:
> hive> set hive.auto.convert.join = false;
> hive> set hive.optimize.reducededuplication = true;
> hive> SELECT foo.id, count(*) as factor
>     > FROM foo
>     > JOIN bar ON (foo.id = bar.id AND foo.line_id = bar.line_id)
>     > JOIN split ON (foo.id = split.id AND foo.line_id = split.line_id)
>     > JOIN forecast ON (foo.id = forecast.id AND foo.line_id = forecast.line_id)
>     > WHERE foo.order != 'blah' AND foo.id = 'XYZ'
>     > GROUP BY foo.id;
> XYZ         79
> XYZ         74
> XYZ         297
> XYZ         66
> hive> set hive.auto.convert.join = true;
> hive> set hive.optimize.reducededuplication = true;
> hive> SELECT foo.id, count(*) as factor
>     > FROM foo
>     > JOIN bar ON (foo.id = bar.id AND foo.line_id = bar.line_id)
>     > JOIN split ON (foo.id = split.id AND foo.line_id = split.line_id)
>     > JOIN forecast ON (foo.id = forecast.id AND foo.line_id = forecast.line_id)
>     > WHERE foo.order != 'blah' AND foo.id = 'XYZ'
>     > GROUP BY foo.id;
> XYZ         516
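A quick sanity check on the two result sets, written as a small Python sketch: the four rows returned with hive.auto.convert.join=false all share the key XYZ, and their counts (79 + 74 + 297 + 66) sum to exactly the 516 returned with hive.auto.convert.join=true. That pattern is consistent with partial per-key counts that were never merged into a single group, but this is only an observation about the numbers in the report, not a confirmed diagnosis of the optimizer bug. The helper names below are hypothetical and are not part of Hive.

```python
# Rows returned with hive.auto.convert.join=false (copied from the report).
buggy_rows = [("XYZ", 79), ("XYZ", 74), ("XYZ", 297), ("XYZ", 66)]

# Row returned with hive.auto.convert.join=true.
expected_rows = [("XYZ", 516)]


def duplicate_keys(rows):
    """Hypothetical helper: return GROUP BY keys that appear more than once."""
    seen, dupes = set(), set()
    for key, _ in rows:
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return dupes


def merge_partial_counts(rows):
    """Hypothetical helper: sum count(*) values per key, sorted by key."""
    totals = {}
    for key, count in rows:
        totals[key] = totals.get(key, 0) + count
    return sorted(totals.items())


print(duplicate_keys(buggy_rows))                          # {'XYZ'}
print(merge_partial_counts(buggy_rows))                    # [('XYZ', 516)]
print(merge_partial_counts(buggy_rows) == expected_rows)   # True
```

A GROUP BY on a single key should never return that key twice, so `duplicate_keys` returning a non-empty set is by itself enough to flag the first result as wrong.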



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
