hive-issues mailing list archives

From "Rich Haase (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HIVE-11132) Queries using join and group by produce incorrect output when hive.auto.convert.join=false and hive.optimize.reducededuplication=true
Date Tue, 07 Jul 2015 13:23:04 GMT

     [ https://issues.apache.org/jira/browse/HIVE-11132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rich Haase resolved HIVE-11132.
-------------------------------
    Resolution: Won't Fix
      Assignee: Rich Haase

The interaction between these two parameters is undesirable, but rare enough that it's probably
not worth the effort of fixing.  This JIRA can serve as documentation of the problem for anyone
who encounters it in future.

> Queries using join and group by produce incorrect output when hive.auto.convert.join=false and hive.optimize.reducededuplication=true
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-11132
>                 URL: https://issues.apache.org/jira/browse/HIVE-11132
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: Rich Haase
>            Assignee: Rich Haase
>
> Queries using join and group by produce multiple output rows with the same key when
> hive.auto.convert.join=false and hive.optimize.reducededuplication=true.  This interaction
> between configuration parameters is unexpected and should be well documented at the very
> least and should likely be considered a bug.
> e.g. 
> hive> set hive.auto.convert.join = false;
> hive> set hive.optimize.reducededuplication = true;
> hive> SELECT foo.id, count(*) as factor
>     > FROM foo
>     > JOIN bar ON (foo.id = bar.id and foo.line_id = bar.line_id)
>     > JOIN split ON (foo.id = split.id and foo.line_id = split.line_id)
>     > JOIN forecast ON (foo.id = forecast.id AND foo.line_id = forecast.line_id)
>     > WHERE foo.order != 'blah' AND foo.id = 'XYZ'
>     > GROUP BY foo.id;
> XYZ         79
> XYZ         74
> XYZ         297
> XYZ         66
> hive> set hive.auto.convert.join = true;
> hive> set hive.optimize.reducededuplication = true;
> hive> SELECT foo.id, count(*) as factor
>     > FROM foo
>     > JOIN bar ON (foo.id = bar.id and foo.line_id = bar.line_id)
>     > JOIN split ON (foo.id = split.id and foo.line_id = split.line_id)
>     > JOIN forecast ON (foo.id = forecast.id AND foo.line_id = forecast.line_id)
>     > WHERE foo.order != 'blah' AND foo.id = 'XYZ'
>     > GROUP BY foo.id;
> XYZ         516
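
Note that the four counts in the first result add up to the single count in the second
(79 + 74 + 297 + 66 = 516), which suggests the first run returns partial aggregates for the
key rather than wrong totals. A minimal sketch of session settings that avoid the problematic
combination follows. Option 1 is the setting shown in the report above; option 2 is an
assumption inferred from the issue title and is not verified anywhere in this message.

```sql
-- Option 1 (from the report above): allow Hive to convert the joins to map joins.
set hive.auto.convert.join = true;

-- Option 2 (assumption, untested here): keep auto-conversion off but disable
-- reduce deduplication, the other half of the problematic combination.
-- set hive.auto.convert.join = false;
-- set hive.optimize.reducededuplication = false;

-- Then rerun the query from the report and check that each key appears once.
```

Either setting applies only to the current session, so it can be tried on the affected query
without changing cluster-wide configuration.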



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
