Mailing-List: contact issues-help@spark.apache.org; run by ezmlm
Precedence: bulk
Date: Thu, 14 Sep 2017 15:38:00 +0000 (UTC)
From: "Dongjoon Hyun (JIRA)" <jira@apache.org>
To: issues@spark.apache.org
Message-ID: <JIRA.13098160.1503981641000.113584.1505403480127@Atlassian.JIRA>
In-Reply-To: <JIRA.13098160.1503981641000@Atlassian.JIRA>
References: <JIRA.13098160.1503981641000@Atlassian.JIRA> <JIRA.13098160.1503981641098@jira-lw-us.apache.org>
Subject: [jira] [Commented] (SPARK-21858) Make Spark grouping_id()
 compatible with Hive grouping__id
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
archived-at: Thu, 14 Sep 2017 15:38:04 -0000


    [ https://issues.apache.org/jira/browse/SPARK-21858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166467#comment-16166467 ]

Dongjoon Hyun commented on SPARK-21858:
---------------------------------------

Thank you for the conclusion, [~cloud_fan]!

> Make Spark grouping_id() compatible with Hive grouping__id
> ----------------------------------------------------------
>
>                 Key: SPARK-21858
>                 URL: https://issues.apache.org/jira/browse/SPARK-21858
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Yann Byron
>
> If you migrate ETL jobs that use Hive's `grouping__id` to Spark and replace it with Spark's `grouping_id()`, you will find that the two are evaluated differently.
> Here is an example.
> {code:java}
> select A, B, grouping__id/grouping_id() from t group by A, B grouping sets((), (A), (B), (A,B))
> {code}
> Running it on Hive and on Spark separately gives the results below. An attribute that is selected in the grouping set is marked (/); one that is not selected is marked (x).
> ||A B||Binary Expression in Spark||Spark||Hive||Binary Expression in Hive||B A||
> |(x) (x)|11|3|0|00|(x) (x)|
> |(x) (/)|10|2|2|10|(/) (x)|
> |(/) (x)|01|1|1|01|(x) (/)|
> |(/) (/)|00|0|3|11|(/) (/)|
> As shown above, Hive sets (/) to 1 and (x) to 0, and Spark does the opposite.
> Moreover, Hive reverses the order of the `group by` attributes before building the bit string, whereas Spark evaluates them in the order written.
> I suggest modifying the behavior of `grouping_id()` to make it compatible with Hive `grouping__id`.
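
For readers comparing the two columns of the table above, the following is a minimal Python sketch of the bit arithmetic as the table shows it. It is not code from Spark or Hive. The function names (`spark_grouping_id`, `hive_grouping_id`, `spark_to_hive`) are hypothetical, and the Hive rule is inferred only from the table in this issue, so other Hive versions may behave differently.

```python
from typing import Iterable, Sequence


def spark_grouping_id(columns: Sequence[str], grouping_set: Iterable[str]) -> int:
    """Spark column of the table: a bit is 1 when the attribute is NOT
    selected (x), and the first group-by attribute is the most significant bit."""
    selected = set(grouping_set)
    gid = 0
    for col in columns:
        gid = (gid << 1) | (0 if col in selected else 1)
    return gid


def hive_grouping_id(columns: Sequence[str], grouping_set: Iterable[str]) -> int:
    """Hive column of the table (assumption: behavior as reported in this issue):
    a bit is 1 when the attribute IS selected (/), and the first group-by
    attribute is the least significant bit (the attribute order is reversed)."""
    selected = set(grouping_set)
    gid = 0
    for position, col in enumerate(columns):
        if col in selected:
            gid |= 1 << position
    return gid


def spark_to_hive(spark_id: int, num_columns: int) -> int:
    """Hypothetical migration helper: invert every bit, then reverse bit order."""
    mask = (1 << num_columns) - 1
    inverted = ~spark_id & mask
    reversed_bits = 0
    for _ in range(num_columns):
        reversed_bits = (reversed_bits << 1) | (inverted & 1)
        inverted >>= 1
    return reversed_bits


if __name__ == "__main__":
    cols = ["A", "B"]
    # Same grouping sets as the query: (), (B), (A), (A, B)
    for gset in [(), ("B",), ("A",), ("A", "B")]:
        s = spark_grouping_id(cols, gset)
        h = hive_grouping_id(cols, gset)
        print(gset, "spark =", s, "hive =", h, "converted =", spark_to_hive(s, len(cols)))
```

With two attributes the values for the grouping sets (A) and (B) happen to coincide in both systems, so the difference only becomes visible for () and (A,B), or once a third attribute is added.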


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
