Date: Fri, 4 Nov 2011 22:07:51 +0000 (UTC)
From: "Carl Steinbach (Updated) (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Message-ID: <914819160.1433.1320444471674.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1262190796.262.1319169393661.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Updated] (HIVE-2520) left semi join will duplicate data
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

     [ https://issues.apache.org/jira/browse/HIVE-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-2520:
---------------------------------

    Fix Version/s:     (was: 0.8.0)

> left semi join will duplicate data
> ----------------------------------
>
>                 Key: HIVE-2520
>                 URL: https://issues.apache.org/jira/browse/HIVE-2520
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: binlijin
>            Assignee: binlijin
>            Priority: Critical
>              Labels: patch
>         Attachments: hive-2520.2.patch, hive-2520.patch
>
>
> CREATE TABLE sales (name STRING, id INT)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
> CREATE TABLE things (id INT, name STRING)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
> The 'sales' table has data in one file, sales.txt, with the content:
> Joe 2
> Hank 2
> The 'things' table has data in two files, things.txt and things2.txt.
> The content of things.txt is:
> 2 Tie
> The content of things2.txt is:
> 2 Tie
> SELECT * FROM sales LEFT SEMI JOIN things ON (sales.id = things.id);
> will output:
> Joe 2
> Joe 2
> Hank 2
> Hank 2
> so the result is wrong: each row of 'sales' should appear only once.
> In CommonJoinOperator, left semi join should use "genObject(null, 0, new IntermediateObject(new ArrayList[numAliases], 0), true);" to generate data,
> but it currently uses "genUniqueJoinObject(0, 0);".
> This patch solves the problem.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
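
For readers unfamiliar with semi-join semantics, the report's example can be reproduced with a small toy model. This is only an illustrative sketch in Python, not Hive code: the function names left_semi_join and per_match_join are hypothetical, and nothing here reflects how CommonJoinOperator is actually implemented. It assumes only the table contents given in the report.

```python
# Toy model of the HIVE-2520 example; illustrative only, not Hive code.
sales = [("Joe", 2), ("Hank", 2)]
# things.txt and things2.txt each contribute one identical row.
things = [(2, "Tie"), (2, "Tie")]


def left_semi_join(left, right):
    """Emit each left row at most once if any right row shares its id."""
    right_ids = {rid for rid, _ in right}
    return [row for row in left if row[1] in right_ids]


def per_match_join(left, right):
    """Emit a left row once per matching right row (the duplicated output)."""
    return [row for row in left for rid, _ in right if row[1] == rid]


print(left_semi_join(sales, things))  # expected semi-join result
print(per_match_join(sales, things))  # shape of the output in the report
```

The first call yields Joe and Hank once each, which is what a left semi join should return. The second yields each row twice, matching the output reported in the issue.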