Return-Path: X-Original-To: apmail-hive-dev-archive@www.apache.org Delivered-To: apmail-hive-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id DF93FE3CA for ; Fri, 18 Jan 2013 10:46:14 +0000 (UTC) Received: (qmail 4020 invoked by uid 500); 18 Jan 2013 10:46:14 -0000 Delivered-To: apmail-hive-dev-archive@hive.apache.org Received: (qmail 3794 invoked by uid 500); 18 Jan 2013 10:46:14 -0000 Mailing-List: contact dev-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hive.apache.org Delivered-To: mailing list dev@hive.apache.org Received: (qmail 3754 invoked by uid 500); 18 Jan 2013 10:46:13 -0000 Delivered-To: apmail-hadoop-hive-dev@hadoop.apache.org Received: (qmail 3747 invoked by uid 99); 18 Jan 2013 10:46:13 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 18 Jan 2013 10:46:13 +0000 Date: Fri, 18 Jan 2013 10:46:13 +0000 (UTC) From: "Namit Jain (JIRA)" To: hive-dev@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HIVE-3699) Multiple insert overwrite into multiple tables query stores same results in all tables MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HIVE-3699?page=3Dcom.atlassian.= jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D13557= 115#comment-13557115 ]=20 Namit Jain commented on HIVE-3699: ---------------------------------- +1 We had so many checked in tests with wrong results. =20 > Multiple insert overwrite into multiple tables query stores same results = in all tables > -------------------------------------------------------------------------= ------------- > > Key: HIVE-3699 > URL: https://issues.apache.org/jira/browse/HIVE-3699 > Project: Hive > Issue Type: Bug > Components: Query Processor > Environment: Cloudera 4.1 on Amazon Linux (rebranded Centos 6): h= ive-0.9.0+150-1.cdh4.1.1.p0.4.el6.noarch > Reporter: Alexandre Fouch=C3=A9 > Assignee: Navis > Attachments: HIVE-3699.D7743.1.patch, HIVE-3699.D7743.2.patch, HI= VE-3699.D7743.3.patch, HIVE-3699_hive-0.9.1.patch.txt > > > (Note: This might be related to HIVE-2750) > I am doing a query with multiple INSERT OVERWRITE to multiple tables in o= rder to scan the dataset only 1 time, and i end up having all these tables = with the same content ! It seems the GROUP BY query that returns results is= overwriting all the temp tables. > Weird enough, if i had further GROUP BY queries into additional temp tabl= es, grouped by a different field, then all temp tables, even the ones that = would have been wrong content are all correctly populated. > This is the misbehaving query: > FROM nikon > INSERT OVERWRITE TABLE e1 > SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions > WHERE qs_cs_s_cat=3D'PRINT' GROUP BY qs_cs_s_aid > INSERT OVERWRITE TABLE e2 > SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues > WHERE qs_cs_s_cat=3D'VIEW' GROUP BY qs_cs_s_aid > ; > It launches only one MR job and here are the results. Why does table 'e1'= contains results from table 'e2' ?! Table 'e1' should have been empty (see= individual SELECTs further below) > hive> SELECT * from e1; > OK > NULL 2 > 1627575 25 > 1627576 70 > 1690950 22 > 1690952 42 > 1696705 199 > 1696706 66 > 1696730 229 > 1696759 85 > 1696893 218 > Time taken: 0.229 seconds > hive> SELECT * from e2; > OK > NULL 2 > 1627575 25 > 1627576 70 > 1690950 22 > 1690952 42 > 1696705 199 > 1696706 66 > 1696730 229 > 1696759 85 > 1696893 218 > Time taken: 0.11 seconds > Here is are the result to the indiviual queries (only the second query re= turns a result set): > hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions FRO= M nikon > WHERE qs_cs_s_cat=3D'PRINT' GROUP BY qs_cs_s_aid; > (...) > OK > <- There are no results, this is normal > Time taken: 41.471 seconds > hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues FROM nikon > WHERE qs_cs_s_cat=3D'VIEW' GROUP BY qs_cs_s_aid; > (...) > OK > NULL 2 > 1627575 25 > 1627576 70 > 1690950 22 > 1690952 42 > 1696705 199 > 1696706 66 > 1696730 229 > 1696759 85 > 1696893 218 > Time taken: 39.607 seconds > =20 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrato= rs For more information on JIRA, see: http://www.atlassian.com/software/jira