Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 7179CDCCC for ; Tue, 6 Nov 2012 20:40:47 +0000 (UTC) Received: (qmail 84785 invoked by uid 500); 6 Nov 2012 20:40:46 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 84742 invoked by uid 500); 6 Nov 2012 20:40:46 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 84734 invoked by uid 99); 6 Nov 2012 20:40:46 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 06 Nov 2012 20:40:46 +0000 X-ASF-Spam-Status: No, hits=2.2 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (athena.apache.org: local policy) Received: from [209.85.216.176] (HELO mail-qc0-f176.google.com) (209.85.216.176) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 06 Nov 2012 20:40:38 +0000 Received: by mail-qc0-f176.google.com with SMTP id n41so639501qco.35 for ; Tue, 06 Nov 2012 12:40:17 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=date:from:to:message-id:subject:x-mailer:mime-version:content-type :x-gm-message-state; bh=32EjN96Wichn0EDPZIIEdLLBJRP46XAhkkVYX2s59Ec=; b=m2SiIpIWfOxDNrTMrh0OIp7S+x5ly69Nzq1eMSjHUpA0nMT3Jw6oOvAVkbKwOF41D5 /6ot8LuD/mt+c79k2EtlX3dmOCkLeZvUZRjDMQiNp4PXsBlhRTq5lAwoJcRr2bf4SaLV +RHS7o6wMKFlQkTjtDfK3X0JyEDXYlIZGWU4OJqvEIfCRgfDHH+CqTV1Cu23M6ojbDiq gm8sWO+pgwLyeHeTlYYWDbTlxYyud9DtzWl2FdntAMIehXTvR6nxjUbuTgEn4xzpRUDB uMuSDH4d8Y6QhYlhlfSoP4CtnhmKDo203npnb7LJHnjFGqjIgKcJsBijVAyh5329PGHl iMBA== Received: by 10.49.87.163 with SMTP id az3mr4004724qeb.8.1352234416901; Tue, 06 Nov 2012 12:40:16 -0800 (PST) Received: from [192.168.0.13] (sat78-4-82-243-34-156.fbx.proxad.net. [82.243.34.156]) by mx.google.com with ESMTPS id t14sm11145544qef.3.2012.11.06.12.40.14 (version=TLSv1/SSLv3 cipher=OTHER); Tue, 06 Nov 2012 12:40:16 -0800 (PST) Date: Tue, 6 Nov 2012 21:40:12 +0100 From: Alexandre Fouche To: user@hive.apache.org Message-ID: Subject: Multiple insert overwrite into multiple tables query stores same results in all tables X-Mailer: sparrow 1.6.4 (build 1176) MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="509975ac_7c83e458_134da" X-Gm-Message-State: ALoCoQlNO6KgHKGfkW6um4IhdV0eBzssNl4WfdVolmN6ogpSoSTuE8CAQYjKUOtJBBpFRhtcqGmW X-Virus-Checked: Checked by ClamAV on apache.org --509975ac_7c83e458_134da Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Content-Disposition: inline Hi, I am doing a query with multiple INSERT OVERWRITE to multiple tables, in order to scan the dataset only 1 time, and i end up having all these tables with the same content ! It seems the GROUP BY query that returns results is overwriting all the tables. This is the misbehaving query: FROM nikon INSERT OVERWRITE TABLE e1 SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid INSERT OVERWRITE TABLE e2 SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid ; It launches only one MR job and here are the results. Why does table 'e1' contains results from table 'e2' ?! Table 'e1' should have been empty (see individual SELECTs further below) hive> SELECT * from e1; OK NULL 2 1627575 25 1627576 70 1690950 22 1690952 42 1696705 199 1696706 66 1696730 229 1696759 85 1696893 218 Time taken: 0.229 seconds hive> SELECT * from e2; OK NULL 2 1627575 25 1627576 70 1690950 22 1690952 42 1696705 199 1696706 66 1696730 229 1696759 85 1696893 218 Time taken: 0.11 seconds Here is are the result to the indiviual queries (only the second query returns a result set): hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions FROM nikon WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid; (...) OK <- There are no results, this is normal Time taken: 41.471 seconds hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues FROM nikon WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid; (...) OK NULL 2 1627575 25 1627576 70 1690950 22 1690952 42 1696705 199 1696706 66 1696730 229 1696759 85 1696893 218 Time taken: 39.607 seconds --509975ac_7c83e458_134da Content-Type: text/html; charset="utf-8" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline
Hi,

I am doing a query= with multiple INSERT OVERWRITE to multiple tables, in order to scan the = dataset only 1 time, and i end up having all these tables with the same c= ontent =21 It seems the GROUP BY query that returns results is overwritin= g all the tables.

This is the misbehaving query:=

    =46ROM nikon
  &nb= sp; INSERT OVERWRITE TABLE e1
    SELECT qs=5Fcs=5Fs=5F= aid AS Emplacements, COUNT(*) AS Impressions WHERE qs=5Fcs=5Fs=5Fcat=3D'P= RINT' GROUP BY qs=5Fcs=5Fs=5Faid
    INSERT OVERWRITE= TABLE e2
    SELECT qs=5Fcs=5Fs=5Faid AS Emplacement= s, COUNT(*) AS Vues WHERE qs=5Fcs=5Fs=5Fcat=3D'VIEW' GROUP BY qs=5Fcs=5Fs= =5Faid
    ;

It launches onl= y one MR job and here are the results. Why does table 'e1' contains resul= ts from table 'e2' =3F=21 Table 'e1' should have been empty (see individu= al SELECTs further below)

    hive>= SELECT * from e1;
    OK
    NUL= L    2
    1627575 25
  &nbs= p; 1627576 70
    1690950 22
    = 1690952 42
    1696705 199
    16= 96706 66
    1696730 229
    1696= 759 85
    1696893 218
    Time t= aken: 0.229 seconds

    hive> SELEC= T * from e2;
    OK
    NULL &nbs= p;  2
    1627575 25
    162= 7576 70
    1690950 22
    169095= 2 42
    1696705 199
    1696706 = 66
    1696730 229
    1696759 85=
    1696893 218
    Time taken: = 0.11 seconds


Here is are the resu= lt to the indiviual queries (only the second query returns a result set):=

    hive> SELECT qs=5Fcs=5Fs=5Faid= AS Emplacements, COUNT(*) AS Impressions =46ROM nikon WHERE qs=5Fcs=5Fs=5F= cat=3D'PRINT' GROUP BY qs=5Fcs=5Fs=5Faid;
    (...)
    OK
          &l= t;- There are no results, this is normal
    Time tak= en: 41.471 seconds

    hive> SELECT= qs=5Fcs=5Fs=5Faid AS Emplacements, COUNT(*) AS Vues =46ROM nikon WHERE q= s=5Fcs=5Fs=5Fcat=3D'VIEW' GROUP BY qs=5Fcs=5Fs=5Faid;
  &n= bsp; (...)
    OK
    NULL  = 2
    1627575 25
    1627576 70
    1690950 22
    1690952 42
    1696705 199
    1696706 66
<= div>    1696730 229
    1696759 85
    1696893 218
    Time taken: 39.607 se= conds

--509975ac_7c83e458_134da--