Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id A7BAE200BFB for ; Wed, 11 Jan 2017 08:21:42 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id A60BF160B50; Wed, 11 Jan 2017 07:21:42 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 59CC5160B2E for ; Wed, 11 Jan 2017 08:21:41 +0100 (CET) Received: (qmail 50010 invoked by uid 500); 11 Jan 2017 07:21:39 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 50000 invoked by uid 99); 11 Jan 2017 07:21:39 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 11 Jan 2017 07:21:39 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 6FA35180028 for ; Wed, 11 Jan 2017 07:21:39 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -13.82 X-Spam-Level: X-Spam-Status: No, score=-13.82 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001, USER_IN_DEF_WHITELIST=-15, WEIRD_PORT=0.001] autolearn=disabled Authentication-Results: spamd3-us-west.apache.org (amavisd-new); dkim=pass (1024-bit key) header.d=yahoo-inc.com header.b=BJG7seaE; dkim=pass (1024-bit key) header.d=yahoo-inc.com header.b=prSBOVoz Received: from mx1-lw-us.apache.org ([10.40.0.8]) by 
localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id tT2NIiuh0dhd for ; Wed, 11 Jan 2017 07:21:35 +0000 (UTC) Received: from mrout1-b.corp.bf1.yahoo.com (mrout1-b.corp.bf1.yahoo.com [98.139.253.104]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 6143D5F3F5 for ; Wed, 11 Jan 2017 07:21:35 +0000 (UTC) Received: from omp1003.mail.ne1.yahoo.com (omp1003.mail.ne1.yahoo.com [98.138.87.3]) by mrout1-b.corp.bf1.yahoo.com (8.15.2/8.15.2/y.out) with ESMTPS id v0B7LV31001487 (version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Wed, 11 Jan 2017 07:21:32 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=yahoo-inc.com; s=cobra; t=1484119292; bh=b7JIwhYElLl7O+okhR7+mqRDRk4+bVrFtV8m88UwWTM=; h=Date:From:Reply-To:To:In-Reply-To:References:Subject; b=BJG7seaEPj7pK6fGyrI95X7iHSEqxgEe67rkMXVPrFKz6GtQNjjFBr+AGb3RhqCiD w8kpwhkwGxmNofpIsg2/D4QdxkfrU9dYmjda64mxFyjXmyyR+/Lqa+UIN967kOp9+8 O7sKpGZarCeZyOSkx3Qk/svCYeqIoJjGjhfdIL6M= Received: (qmail 42930 invoked by uid 1000); 11 Jan 2017 07:21:31 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo-inc.com; s=ginc1024; t=1484119291; bh=xbdus1XrvultKtcnAp04CF8ANU+RSasByeLYMrzpuWg=; h=Date:From:Reply-To:To:Message-ID:In-Reply-To:References:Subject:MIME-Version:Content-Type; b=prSBOVozyC03DTZ9FSWMPZdnt1tb9JbbkPdDPyXlm4fBUPIBK4bXCFoFoKaiJyyrb6n+ZPXZwWut82aJcmtoFZlHwJZKAQc5xSennfERBgwIG6QCgRmHHorMRLZEY609E5Jjtlaa+4WER9aZMlNltuJv39SrjBYOgdmc6cwldkc= X-YMail-OSG: .6sHV0AVM1nCF7ZTwgM8K6bAgGtFZfvquYErt5se2S1i8olCGRq5VO8aXSE8OZM 2RUNJZ.RkYBABh.SyTY478hbj4r0evFMcSqrhwxDKDCmAWZ6icH29M_KOJD1XD8BFXVbF9.KelEB UWp97UYGa8iqejJnFXLLNHZzuimXZdxB4yExyQziy8TNDkiILYtu_Y7RCM5EnN..K_pd39vPNfGa nzb2nDUGcXlSshKhHy5PNJIt_6t6BEm.ZxYkw1yKggZCgBOYxlLTrKE1eBAivGbza Received: from jws200120.mail.ne1.yahoo.com by sendmailws132.mail.ne1.yahoo.com; Wed, 11 Jan 2017 07:21:30 +0000; 1484119290.963 Date: Wed, 11 Jan 2017 07:21:30 +0000 (UTC) From: Chris Drome Reply-To: 
Chris Drome To: "user@hive.apache.org" , "user@tez.apache.org" Message-ID: <1027506428.866728.1484119290741@mail.yahoo.com> In-Reply-To: References: Subject: Re: tez + union stmt MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_866727_1491894028.1484119290737" archived-at: Wed, 11 Jan 2017 07:21:42 -0000

------=_Part_866727_1491894028.1484119290737
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Elliot,

Mithun already created the following ticket to track the issue:

https://issues.apache.org/jira/browse/HIVE-15575

chris

On Tuesday, January 10, 2017 11:05 PM, Elliot West wrote:

Thanks Rohini,

This is good to know. Could you perhaps raise an issue in the Hive JIRA?

Thanks,

Elliot.

On Tue, 10 Jan 2017 at 22:55, Rohini Palaniswamy wrote:

The implementation in hive does look wrong. The concept of VertexGroups was added in Tez specifically for the case of union to support writing to same directory from different vertices. Sub-directories should not be required as a workaround.

Regards,
Rohini

On Sun, Dec 25, 2016 at 10:58 AM, Stephen Sprague wrote:

Thanks Elliot. Nice christmas present. Those settings in that stackoverflow link look to me to be exactly what i need to set for MR jobs to pick that data up that Tez created.

Cheers,
Stephen.

On Sun, Dec 25, 2016 at 2:45 AM, Elliot West wrote:

I believe that tez will generate subfolders for unioned data. As far as I know, this is the expected behaviour and there is no alternative. Presumably this is to prevent multiple tasks from attempting to write the same file?

We've experienced issues when switching from mr to tez; downstream jobs weren't expecting subfolders and had trouble reading previously accessible datasets.

Apparently there are workarounds within Hive: http://stackoverflow.com/questions/39511585/hive-create-table-not-insert-data

Merry Christmas,

Elliot.

On Sun, 25 Dec 2016 at 03:11, Rajesh Balamohan wrote:

Are there any exceptions in hive.log?. Is tmp_pv_v4* table part of the select query?

Assuming you are creating the table in staging.db, it would have created the table location as staging.db/foo (as you have not specified the location).

Adding user@hive.apache.org as this is hive related.

~Rajesh.B

On Sun, Dec 25, 2016 at 12:08 AM, Stephen Sprague wrote:

all,
i'm running tez with the sql pattern:

    * create table foo as select * from (select... UNION select... UNION select...)

in the logs the final step is this:

    * Moving data to directory hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4 from hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/.hive-staging_hive_2016-12-24_10-05-40_048_4896412314807355668-899/-ext-10002

when querying the table i got zero rows returned which made me curious. so i queried the hdfs location and see this:

  $ hdfs dfs -ls hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4
  Found 3 items
  drwxrwxrwx   - dwr supergroup          0 2016-12-24 10:05 hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4/1
  drwxrwxrwx   - dwr supergroup          0 2016-12-24 10:06 hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4/2
  drwxrwxrwx   - dwr supergroup          0 2016-12-24 10:06 hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4/3

and yes the data files are under these three dirs.

so i ask... i'm not used to seeing sub-directories under the tablename unless the table is partitioned. is this legit? might there be some config settings i need to set to see this data via sql?

thanks,
Stephen.

------=_Part_866727_1491894028.1484119290737
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Elliot,

Mithun already created the following ticket to track the issue:

https://issues.apache.org/jira/browse/HIVE-15575

chris


On Tuesday, January 10, 2017 11:05 PM, Elliot West <teabot@gmail.com> wrote:


Thanks Rohini,

This is good to know. Could you perhaps raise an issue in the Hive JIRA?

Thanks,

Elliot.

On Tue, 10 Jan 2017 at 22:55, Rohini Palaniswamy <rohini.aditya@gmail.com> wrote:
The implementation in hive does look wrong. The concept of VertexGroups was added in Tez specifically for the case of union to support writing to same directory from different vertices. Sub-directories should not be required as a workaround.

Regards,
Rohini


On Sun, Dec 25, 2016 at 10:58 AM, Stephen Sprague <spragues@gmail.com> wrote:
Thanks Elliot. Nice christmas present. Those settings in that stackoverflow link look to me to be exactly what i need to set for MR jobs to pick that data up that Tez created.

Cheers,
Stephen.
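
The thread does not name the settings from the linked answer. As a minimal sketch, assuming they are the two properties commonly used to make Hive and MapReduce read nested directories (`hive.mapred.supports.subdirectories` and `mapred.input.dir.recursive`), the helper below prefixes a query with the matching `SET` statements. The helper name and the example query are hypothetical, not taken from the thread.

```python
# Assumption: these two properties are the workaround being discussed.
# Both are existing Hive/Hadoop properties, but the thread never names them.
SUBDIR_SETTINGS = {
    "hive.mapred.supports.subdirectories": "true",
    "mapred.input.dir.recursive": "true",
}


def with_subdirectory_settings(query: str) -> str:
    """Prefix a HiveQL query with SET statements enabling recursive input."""
    prelude = "\n".join(f"SET {key}={value};" for key, value in SUBDIR_SETTINGS.items())
    # Normalise the query so it ends with exactly one semicolon.
    body = query.strip().rstrip(";")
    return f"{prelude}\n{body};"


# Hypothetical query against the table from the original message.
script = with_subdirectory_settings("SELECT COUNT(*) FROM staging.tmp_pv_v4c__loc_4")
print(script)
```

The resulting script could be passed to the Hive CLI or Beeline. Whether it makes the rows visible depends on the Hive version and execution engine, so treat it as something to try, not a confirmed fix.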

On Sun, Dec 25, 2016 at 2:45 AM, Elliot West <teabot@gmail.com> wrote:
I believe that tez will generate subfolders for unioned data. As far as I know, this is the expected behaviour and there is no alternative. Presumably this is to prevent multiple tasks from attempting to write the same file?

We've experienced issues when switching from mr to tez; downstream jobs weren't expecting subfolders and had trouble reading previously accessible datasets.

Apparently there are workarounds within Hive: http://stackoverflow.com/questions/39511585/hive-create-table-not-insert-data

Merry Christmas,

Elliot.

On Sun, 25 Dec 2016 at 03:11, Rajesh Balamohan <rbalamohan@apache.org> wrote:
Are there any exceptions in hive.log?. Is tmp_pv_v4* table part of the select query?

Assuming you are creating the table in staging.db, it would have created the table location as staging.db/foo (as you have not specified the location).

Adding user@hive.apache.org as this is hive related.


~Rajesh.B

On Sun, Dec 25, 2016 at 12:08 AM, Stephen Sprague <spragues@gmail.com> wrote:
all,
i'm running tez with the sql pattern:

    * create table foo as select * from (select... UNION select... UNION select...)

in the logs the final step is this:

    * Moving data to directory hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4 from hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/.hive-staging_hive_2016-12-24_10-05-40_048_4896412314807355668-899/-ext-10002


when querying the table i got zero rows returned which made me curious. so i queried the hdfs location and see this:

  $ hdfs dfs -ls hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4

  Found 3 items
  drwxrwxrwx   - dwr supergroup          0 2016-12-24 10:05 hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4/1
  drwxrwxrwx   - dwr supergroup          0 2016-12-24 10:06 hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4/2
  drwxrwxrwx   - dwr supergroup          0 2016-12-24 10:06 hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4/3

and yes the data files are under these three dirs.

so i ask... i'm not used to seeing sub-directories under the tablename unless the table is partitioned. is this legit? might there be some config settings i need to set to see this data via sql?

thanks,
Stephen.
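
The manual check described in the message above (listing the table location and finding directories where data files were expected) can be sketched as a small script. This is an illustration only: the function name is hypothetical, it assumes the usual eight-column `hdfs dfs -ls` output shown in the message, and it does not handle paths containing spaces.

```python
from typing import List


def find_subdirectories(ls_output: str) -> List[str]:
    """Return the paths of directory entries in `hdfs dfs -ls` output."""
    subdirs = []
    for line in ls_output.splitlines():
        fields = line.split()
        # An entry has 8 fields: permissions, replication, owner, group,
        # size, date, time, path. This skips "Found N items" and blank lines.
        if len(fields) < 8:
            continue
        # Directory entries have a permission string starting with "d".
        if fields[0].startswith("d"):
            subdirs.append(fields[-1])
    return subdirs


# Listing reproduced from the message above.
TABLE = "hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4"
listing = f"""Found 3 items
drwxrwxrwx   - dwr supergroup          0 2016-12-24 10:05 {TABLE}/1
drwxrwxrwx   - dwr supergroup          0 2016-12-24 10:06 {TABLE}/2
drwxrwxrwx   - dwr supergroup          0 2016-12-24 10:06 {TABLE}/3
"""

subdirs = find_subdirectories(listing)
if subdirs:
    print(f"{len(subdirs)} sub-directories under an unpartitioned table location:")
    for path in subdirs:
        print("  " + path)
```

A non-empty result for an unpartitioned table is the symptom discussed in this thread: the data files sit one level below the table location, where a non-recursive reader will not find them.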

------=_Part_866727_1491894028.1484119290737--