From issues-return-174479-archive-asf-public=cust-asf.ponee.io@hive.apache.org  Tue Dec 10 06:09:02 2019
Return-Path: <issues-return-174479-archive-asf-public=cust-asf.ponee.io@hive.apache.org>
X-Original-To: archive-asf-public@cust-asf.ponee.io
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [207.244.88.153])
	by mx-eu-01.ponee.io (Postfix) with SMTP id A4D4D180630
	for <archive-asf-public@cust-asf.ponee.io>; Tue, 10 Dec 2019 07:09:02 +0100 (CET)
Received: (qmail 22171 invoked by uid 500); 10 Dec 2019 06:09:02 -0000
Mailing-List: contact issues-help@hive.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:issues-help@hive.apache.org>
List-Unsubscribe: <mailto:issues-unsubscribe@hive.apache.org>
List-Post: <mailto:issues@hive.apache.org>
List-Id: <issues.hive.apache.org>
Reply-To: dev@hive.apache.org
Delivered-To: mailing list issues@hive.apache.org
Received: (qmail 22154 invoked by uid 99); 10 Dec 2019 06:09:02 -0000
Received: from mailrelay1-us-west.apache.org (HELO mailrelay1-us-west.apache.org) (209.188.14.139)
    by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 10 Dec 2019 06:09:02 +0000
Received: from jira-he-de.apache.org (static.172.67.40.188.clients.your-server.de [188.40.67.172])
	by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id C7AD4E0F04
	for <issues@hive.apache.org>; Tue, 10 Dec 2019 06:09:00 +0000 (UTC)
Received: from jira-he-de.apache.org (localhost.localdomain [127.0.0.1])
	by jira-he-de.apache.org (ASF Mail Server at jira-he-de.apache.org) with ESMTP id 1A91B7802F8
	for <issues@hive.apache.org>; Tue, 10 Dec 2019 06:09:00 +0000 (UTC)
Date: Tue, 10 Dec 2019 06:09:00 +0000 (UTC)
From: "Aditya Shah (Jira)" <jira@apache.org>
To: issues@hive.apache.org
Message-ID: <JIRA.13271247.1574954438000.290712.1575958140041@Atlassian.JIRA>
In-Reply-To: <JIRA.13271247.1574954438000@Atlassian.JIRA>
References: <JIRA.13271247.1574954438000@Atlassian.JIRA> <JIRA.13271247.1574954438629@jira-he-de>
Subject: [jira] [Updated] (HIVE-22561) Data loss on map join for bucketed,
 partitioned table
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394


     [ https://issues.apache.org/jira/browse/HIVE-22561?page=3Dcom.atlassia=
n.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aditya Shah updated HIVE-22561:
-------------------------------
    Attachment: HIVE-22561.branch-3.1.patch
        Status: Patch Available  (was: Open)

Submitting patch with the correct name to run with branch-3.1 profile.

> Data loss on map join for bucketed, partitioned table
> -----------------------------------------------------
>
>                 Key: HIVE-22561
>                 URL: https://issues.apache.org/jira/browse/HIVE-22561
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.2
>            Reporter: Aditya Shah
>            Assignee: Aditya Shah
>            Priority: Blocker
>             Fix For: 3.1.0, 3.0.0
>
>         Attachments: HIVE-22561.branch-3.1.patch, HIVE-22561.patch, Scree=
nshot 2019-11-28 at 8.45.17 PM.png, image-2019-11-28-20-46-25-432.png
>
>
> A map join on a column (which is neither involved in bucketing and partit=
ion) causes data loss.=C2=A0
> Steps to reproduce:
> Env: [hive-dev-box|[https://github.com/kgyrtkirk/hive-dev-box]] hive 3.1.=
2.
> Create tables:
> =C2=A0
> {code:java}
> CREATE TABLE `testj2`(
>   `id` int,=20
>   `bn` string,=20
>   `cn` string,=20
>   `ad` map<string,int>,=20
>   `mi` array<int>)
> PARTITIONED BY (=20
>   `br` string)
> CLUSTERED BY (=20
>   bn)=20
> INTO 2 BUCKETS
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> STORED AS TEXTFILE
> TBLPROPERTIES (
>   'bucketing_version'=3D'2');
> CREATE TABLE `testj1`(
>   `id` int,=20
>   `can` string,=20
>   `cn` string,=20
>   `ad` map<string,int>,=20
>   `av` boolean,=20
>   `mi` array<int>)
> PARTITIONED BY (=20
>   `brand` string)
> CLUSTERED BY (=20
>   can)=20
> INTO 2 BUCKETS
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> STORED AS TEXTFILE
> TBLPROPERTIES (
>   'bucketing_version'=3D'2');
> {code}
> insert some data in both:
> {code:java}
> insert into testj1 values (100, 'mes_1', 'customer_1',  map('city1', 5600=
77), false, array(5, 10), 'brand_1'),
> (101, 'mes_2', 'customer_2',  map('city2', 560078), true, array(10, 20), =
'brand_2'),
> (102, 'mes_3', 'customer_3',  map('city3', 560079), false, array(15, 30),=
 'brand_3'),
> (103, 'mes_4', 'customer_4',  map('city4', 560080), true, array(20, 40), =
'brand_4'),
> (104, 'mes_5', 'customer_5',  map('city5', 560081), false, array(25, 50),=
 'brand_5');
> insert into table testj2 values (100, 'tv_0', 'customer_0', map('city0', =
560076),array(0, 0, 0), 'tv'),
> (101, 'tv_1', 'customer_1', map('city1', 560077),array(20, 25, 30), 'tv')=
,
> (102, 'tv_2', 'customer_2', map('city2', 560078),array(40, 50, 60), 'tv')=
,
> (103, 'tv_3', 'customer_3', map('city3', 560079),array(60, 75, 90), 'tv')=
,
> (104, 'tv_4', 'customer_4', map('city4', 560080),array(80, 100, 120), 'tv=
');
> {code}
> Do a join between them:
> {code:java}
> select t1.id, t1.can, t1.cn, t2.bn,t2.ad, t2.br FROM testj1 t1 JOIN testj=
2 t2 on (t1.id =3D t2.id) order by t1.id;
> {code}
> Observed results:
> !image-2019-11-28-20-46-25-432.png|width=3D524,height=3D100!
> In the plan, I can see a map join. Disabling it gives the correct result.
> =C2=A0
> =C2=A0



--
This message was sent by Atlassian Jira
(v8.3.4#803005)