Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id A820BEBA1 for ; Sat, 26 Jan 2013 15:28:18 +0000 (UTC) Received: (qmail 76452 invoked by uid 500); 26 Jan 2013 15:28:17 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 76331 invoked by uid 500); 26 Jan 2013 15:28:16 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 76316 invoked by uid 99); 26 Jan 2013 15:28:16 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 26 Jan 2013 15:28:16 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of john@omernik.com designates 209.85.220.176 as permitted sender) Received: from [209.85.220.176] (HELO mail-vc0-f176.google.com) (209.85.220.176) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 26 Jan 2013 15:28:11 +0000 Received: by mail-vc0-f176.google.com with SMTP id fy27so972674vcb.7 for ; Sat, 26 Jan 2013 07:27:50 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-received:mime-version:in-reply-to:references:from:date:message-id :subject:to:content-type:x-gm-message-state; bh=1ogv13bBpIff0+MyFZm9iO5zc3CGVqY4NzBSr+kjphw=; b=aYujWIhWrLGmMuqv3AjRXBpRwuKOd4yQjPAQmRfagPUcDDi7AtGPCLSJEngY7Zkm/c YCqa15efb7XdURgfZ1qTjziL8QVui8Hi0KjEITLgyF8iMEXBDz2V12e8wQIwMK76KRpX uUq9C0fds1LJCxMUEVdSNW59O1wdq1feud/ilzqN68QztVO7BWn+TCwmj5IeZrtaUAwC 2XUBKgRj8R0ahBN6+DQvctLabVHwhWXg/U5Fz1UTgVrR2D9S8AUM0/8E455/rMQO6lhD rR6Dys+op4GGEMU9009GxUZIDnZjSvmf1I1YHXJdWSBrkaT0AM/xQ9v80+YF/1gi4TOk aN0g== X-Received: by 10.220.151.9 with SMTP id a9mr9695615vcw.7.1359214070810; Sat, 26 Jan 2013 07:27:50 -0800 (PST) MIME-Version: 1.0 Received: by 10.58.7.198 with HTTP; Sat, 26 Jan 2013 07:27:30 -0800 (PST) In-Reply-To: References: From: John Omernik Date: Sat, 26 Jan 2013 09:27:30 -0600 Message-ID: Subject: Re: Is this a known Bug: Multi Inserts from partitioned source ignore Where Clauses To: user@hive.apache.org Content-Type: multipart/alternative; boundary=f46d043085c8dd495c04d432af47 X-Gm-Message-State: ALoCoQl7mXaGYOBmClJfU3lovLJNm+Gt1nGdcwp7R3fc1i8c2v3XTtcn8Tui4R0GReKAkdCqpYqR X-Virus-Checked: Checked by ClamAV on apache.org --f46d043085c8dd495c04d432af47 Content-Type: text/plain; charset=ISO-8859-1 I am not a code expert, this looks very much like the bug I posted, but my bug is not using INSERT OVERWRITE (just INSERT INTO) and I am not doing any group by (probably not an issue) Just to be clear, this is probably the same issue as mine, but if someone with more knowledge of the underlying structures were to see the OVERWRITE vs INTO they may see something different. On Sat, Jan 26, 2013 at 9:20 AM, Philip Tromans wrote: > This is a known (recently fixed) bug: > > https://issues.apache.org/jira/browse/HIVE-3699 > > Phil. > > > On 26 January 2013 15:17, John Omernik wrote: > >> I ran into an interesting bug. Basically, if your FROM() source is >> a partitioned table and you use a where clause that prunes, all of the >> INSERT HERE SELECT * WHERE x=y ignores each specified where clause. This >> does not occur if the source partition is not specified, but if the source >> as where partition = 'x' then the where on each individual insert is >> ignored... >> >> I've included some files here >> >> testdata.tsv - Tab delimited data to prove the issue >> create_tables.hive - Creates a database and tables as well as loads the >> data from the TSV >> >> Test Cases: >> I created these test case files in a way that there are three types of >> insert in each case: 1. Load all data from initial statement, 2. Load >> partial data (use a limiting clause such as where day >= '2013-01-05', and >> 3 Load NO data from the initial statement (where 1 = 0) >> >> These tests are all run on hive 0.9 >> >> multi-flat-flat.hive - The source table and the dest tables are not >> partitioned, the where clauses work as expected: >> >> 19 Rows loaded to multi_bug_flat >> 0 Rows loaded to multi_bug_flat3 >> 15 Rows loaded to multi_bug_flat2 >> >> multi-part-part.hive - The source table and the dest tables are >> partitioned. The where clauses are not honored. >> >> 9 Rows loaded to multi_bug_part3 >> 9 Rows loaded to multi_bug_part2 >> 9 Rows loaded to multi_bug_part >> >> multi-flat-part.hive - The source table is flat, the dest table is >> partitioned - The where clauses work as expected: >> >> 0 Rows loaded to multi_bug_part3 >> 15 Rows loaded to multi_bug_part2 >> 19 Rows loaded to multi_bug_part >> >> multi-part-flat.hive - The source table is partitioned, the dest table is >> flat - The where clauses are not honored: >> >> 9 Rows loaded to multi_bug_flat >> 9 Rows loaded to multi_bug_flat3 >> 9 Rows loaded to multi_bug_flat2 >> >> multi-part-specified.hive - The source and dest are partitioned, but >> there is no partition pruning statement in the from () this works as >> expected >> >> 0 Rows loaded to multi_bug_part3 >> 15 Rows loaded to multi_bug_part2 >> 19 Rows loaded to multi_bug_part >> >> >> Thoughts? >> > > --f46d043085c8dd495c04d432af47 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable I am not a code expert, this looks very much like the bug I posted, but my = bug is not using INSERT OVERWRITE (just INSERT INTO) and I am not doing any= group by (probably not an issue)

Just to be clear, this= is probably the same issue as mine, but if someone with more knowledge of = the underlying structures were to see the OVERWRITE vs INTO they may see so= mething different.=A0

On Sat, Jan 26, 2013 at 9:20 AM, Philip Trom= ans <philip.j.tromans@gmail.com> wrote:
This is a known (recently fixed) bug:

<= a href=3D"https://issues.apache.org/jira/browse/HIVE-3699" target=3D"_blank= ">https://issues.apache.org/jira/browse/HIVE-3699

Phil.

On 26 January 2013 15:17, John Omernik <jo= hn@omernik.com> wrote:
I ran into an interesting bug. Basically, if your FROM() source is a=A0part= itioned=A0table and you use a where clause that prunes, all of the INSERT H= ERE SELECT * WHERE x=3Dy ignores each specified where clause. =A0This does = not occur if the source partition is not specified, but if the source as wh= ere partition =3D 'x' then the where on each individual insert is i= gnored...=A0

I've included some files here

t= estdata.tsv - Tab delimited data to prove the issue
create_tables= .hive - Creates a database and tables as well as loads the data from the TS= V

Test Cases:
I created these test case files i= n a way that there are three types of insert in each case: 1. Load all data= from initial statement, 2. Load partial data (use a limiting clause such a= s where day >=3D '2013-01-05', and 3 Load NO data from the initi= al statement (where 1 =3D 0)

These tests are all run on hive 0.9

multi-flat-flat.hive - The source table and the dest tables are not p= artitioned, the where clauses work as expected:

19 Rows loaded to multi_bug_flat
0 Rows loaded to multi_bug_= flat3
15 Rows loaded to multi_bug_flat2

multi-part-part.hive - The source table and the dest tables are parti= tioned. The where clauses are not honored.=A0

9 Rows loaded to multi_bug_part3
9 Rows = loaded to multi_bug_part2
9 Rows loaded to multi_bug_part

multi-flat-part.hive - The source table is flat, th= e dest table is partitioned - The where clauses work as expected:

0 Rows loaded to multi_bug_part3
15 Rows= loaded to multi_bug_part2
19 Rows loaded to multi_bug_part
=

multi-part-flat.hive - The source table is partit= ioned, the dest table is flat - The where clauses are not honored:

9 Rows loaded to multi_bug_flat
9 Rows l= oaded to multi_bug_flat3
9 Rows loaded to multi_bug_flat2

multi-part-specified.hive - The source and dest are= =A0partitioned, but there is no partition pruning statement in the from () = =A0this works as expected

0 Rows loaded to multi_bug_part3
15 Rows= loaded to multi_bug_part2
19 Rows loaded to multi_bug_part
=


Thoughts?


--f46d043085c8dd495c04d432af47--