Date: Thu, 22 Jan 2015 23:52:30 +0800
Subject: Re: orc ppd bug report
From: Ch Wan
To: user@hive.apache.org, pjayachandran@hortonworks.com

Hi.

We currently use Hive 0.13.1 and encountered this bug. I tried the test case on trunk and got the correct result, but with some more complex SQL the result is still incorrect.

I found that the patch HIVE-8707 makes the simple test case work, but it doesn't fix this bug at all.

2015-01-07 13:27 GMT+08:00 wzc <wzc1989@gmail.com>:

> I tried on Hive trunk last time and it didn't work. I'll try again.
> Thank you for your help.
>
> On Wed, Jan 7, 2015 at 02:29 Prasanth Jayachandran <pjayachandran@hortonworks.com> wrote:
>
>> Hi
>>
>> Which version of Hive are you using? I tried your test case on Hive trunk
>> and it seems to work fine. In both cases, with PPD enabled and disabled,
>> I am getting 3 as the result.
>>
>> - Prasanth
>>
>>
>> On Sun, Jan 4, 2015 at 3:04 PM, wzc <wzc1989@gmail.com> wrote:
>>
>>> Recently we found a bug with ORC PPD. Here is the test case:
>>>
>>> use test;
>>> create table if not exists test_orc_src (a int, b int, c int)
>>> stored as orc;
>>> create table if not exists test_orc_src2 (a int, b int, d int)
>>> stored as orc;
>>> insert overwrite table test_orc_src select 1,2,3 from dim.city
>>> limit 1;
>>> insert overwrite table test_orc_src2 select 1,2,4 from dim.city
>>> limit 1;
>>> set hive.auto.convert.join = false;
>>> select
>>>   tb.c
>>> from test.test_orc_src tb
>>> join test.test_orc_src2 tm
>>> on tb.a = tm.a
>>> where tb.b = 2
>>>
>>> The correct answer for the above query is 3, but it returns an empty
>>> result. We found that ORC PPD uses the READ_COLUMN_NAMES_CONF_STR
>>> property to get the required column list, and that list is not
>>> constructed correctly when one table's storage path is a prefix of
>>> another table's path. This bug is related to HIVE-1903: in
>>> HiveInputFormat#pushProjectionsAndFilters, a prefix match is used to
>>> get all aliases associated with the given path, which I think is not
>>> very suitable. I don't know why a prefix match is done here instead of
>>> an exact match.
>>> Any help is appreciated.
>>
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity
>> to which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader
>> of this message is not the intended recipient, you are hereby notified that
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender immediately
>> and delete it from your system. Thank You.
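
The path-prefix problem described in the original report can be illustrated with a small, self-contained sketch. This is not Hive's code: the class `PathAliasMatch`, the method names, the `/warehouse/test.db/...` paths and the split file name `000000_0` are all hypothetical, chosen only to mirror the table names `test_orc_src` and `test_orc_src2` and the aliases `tb` and `tm` from the test case. It contrasts a plain string-prefix match with a match that respects the directory boundary, which is one possible alternative and not necessarily what any Hive patch does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PathAliasMatch {
    // Hypothetical table-directory -> alias mapping (an assumption for this sketch,
    // not Hive's actual data structure).
    static Map<String, String> pathToAlias() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("/warehouse/test.db/test_orc_src", "tb");
        m.put("/warehouse/test.db/test_orc_src2", "tm");
        return m;
    }

    // Plain string-prefix match: every registered directory whose string is a
    // prefix of the split path is treated as a match.
    static List<String> aliasesByPrefix(Map<String, String> m, String splitPath) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : m.entrySet()) {
            if (splitPath.startsWith(e.getKey())) {
                out.add(e.getValue());
            }
        }
        return out;
    }

    // Boundary-aware match: the split path must equal the directory or continue
    // with a '/' right after it, so "test_orc_src" no longer matches "test_orc_src2".
    static List<String> aliasesByDirectory(Map<String, String> m, String splitPath) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : m.entrySet()) {
            String dir = e.getKey();
            if (splitPath.equals(dir) || splitPath.startsWith(dir + "/")) {
                out.add(e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> m = pathToAlias();
        // Hypothetical data file belonging to test_orc_src2 only.
        String split = "/warehouse/test.db/test_orc_src2/000000_0";
        System.out.println(aliasesByPrefix(m, split));    // prints [tb, tm]
        System.out.println(aliasesByDirectory(m, split)); // prints [tm]
    }
}
```

In this sketch the prefix match attributes a file of `test_orc_src2` to both aliases, which is the kind of mix-up that could lead to a wrong required-column list for one of the tables, as the report suggests.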