From issues-return-174479-archive-asf-public=cust-asf.ponee.io@hive.apache.org Tue Dec 10 06:09:02 2019 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id A4D4D180630 for ; Tue, 10 Dec 2019 07:09:02 +0100 (CET) Received: (qmail 22171 invoked by uid 500); 10 Dec 2019 06:09:02 -0000 Mailing-List: contact issues-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hive.apache.org Delivered-To: mailing list issues@hive.apache.org Received: (qmail 22154 invoked by uid 99); 10 Dec 2019 06:09:02 -0000 Received: from mailrelay1-us-west.apache.org (HELO mailrelay1-us-west.apache.org) (209.188.14.139) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 10 Dec 2019 06:09:02 +0000 Received: from jira-he-de.apache.org (static.172.67.40.188.clients.your-server.de [188.40.67.172]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id C7AD4E0F04 for ; Tue, 10 Dec 2019 06:09:00 +0000 (UTC) Received: from jira-he-de.apache.org (localhost.localdomain [127.0.0.1]) by jira-he-de.apache.org (ASF Mail Server at jira-he-de.apache.org) with ESMTP id 1A91B7802F8 for ; Tue, 10 Dec 2019 06:09:00 +0000 (UTC) Date: Tue, 10 Dec 2019 06:09:00 +0000 (UTC) From: "Aditya Shah (Jira)" To: issues@hive.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (HIVE-22561) Data loss on map join for bucketed, partitioned table MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HIVE-22561?page=3Dcom.atlassia= n.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Shah updated HIVE-22561: ------------------------------- Attachment: HIVE-22561.branch-3.1.patch Status: Patch Available (was: Open) Submitting patch with the correct name to run with branch-3.1 profile. > Data loss on map join for bucketed, partitioned table > ----------------------------------------------------- > > Key: HIVE-22561 > URL: https://issues.apache.org/jira/browse/HIVE-22561 > Project: Hive > Issue Type: Bug > Affects Versions: 3.1.2 > Reporter: Aditya Shah > Assignee: Aditya Shah > Priority: Blocker > Fix For: 3.1.0, 3.0.0 > > Attachments: HIVE-22561.branch-3.1.patch, HIVE-22561.patch, Scree= nshot 2019-11-28 at 8.45.17 PM.png, image-2019-11-28-20-46-25-432.png > > > A map join on a column (which is neither involved in bucketing and partit= ion) causes data loss.=C2=A0 > Steps to reproduce: > Env: [hive-dev-box|[https://github.com/kgyrtkirk/hive-dev-box]] hive 3.1.= 2. > Create tables: > =C2=A0 > {code:java} > CREATE TABLE `testj2`( > `id` int,=20 > `bn` string,=20 > `cn` string,=20 > `ad` map,=20 > `mi` array) > PARTITIONED BY (=20 > `br` string) > CLUSTERED BY (=20 > bn)=20 > INTO 2 BUCKETS > ROW FORMAT DELIMITED > FIELDS TERMINATED BY ',' > STORED AS TEXTFILE > TBLPROPERTIES ( > 'bucketing_version'=3D'2'); > CREATE TABLE `testj1`( > `id` int,=20 > `can` string,=20 > `cn` string,=20 > `ad` map,=20 > `av` boolean,=20 > `mi` array) > PARTITIONED BY (=20 > `brand` string) > CLUSTERED BY (=20 > can)=20 > INTO 2 BUCKETS > ROW FORMAT DELIMITED > FIELDS TERMINATED BY ',' > STORED AS TEXTFILE > TBLPROPERTIES ( > 'bucketing_version'=3D'2'); > {code} > insert some data in both: > {code:java} > insert into testj1 values (100, 'mes_1', 'customer_1', map('city1', 5600= 77), false, array(5, 10), 'brand_1'), > (101, 'mes_2', 'customer_2', map('city2', 560078), true, array(10, 20), = 'brand_2'), > (102, 'mes_3', 'customer_3', map('city3', 560079), false, array(15, 30),= 'brand_3'), > (103, 'mes_4', 'customer_4', map('city4', 560080), true, array(20, 40), = 'brand_4'), > (104, 'mes_5', 'customer_5', map('city5', 560081), false, array(25, 50),= 'brand_5'); > insert into table testj2 values (100, 'tv_0', 'customer_0', map('city0', = 560076),array(0, 0, 0), 'tv'), > (101, 'tv_1', 'customer_1', map('city1', 560077),array(20, 25, 30), 'tv')= , > (102, 'tv_2', 'customer_2', map('city2', 560078),array(40, 50, 60), 'tv')= , > (103, 'tv_3', 'customer_3', map('city3', 560079),array(60, 75, 90), 'tv')= , > (104, 'tv_4', 'customer_4', map('city4', 560080),array(80, 100, 120), 'tv= '); > {code} > Do a join between them: > {code:java} > select t1.id, t1.can, t1.cn, t2.bn,t2.ad, t2.br FROM testj1 t1 JOIN testj= 2 t2 on (t1.id =3D t2.id) order by t1.id; > {code} > Observed results: > !image-2019-11-28-20-46-25-432.png|width=3D524,height=3D100! > In the plan, I can see a map join. Disabling it gives the correct result. > =C2=A0 > =C2=A0 -- This message was sent by Atlassian Jira (v8.3.4#803005)