Date: Mon, 28 Apr 2014 02:28:05 -0400
From: Lefty Leverenz
To: user@hive.apache.org
Subject: Re: Skewed Tables

Prasanth,

Hive's user docs are wiki-only at this point, so there's no version control.
We just add notes about which release introduced or changed something. For
an example, see the beginning of the Skewed Tables section. Sometimes the
version information isn't called out like that, though; it's just part of
the text. And in the CREATE TABLE syntax it's a comment alongside a clause
such as TBLPROPERTIES.

The procedure for getting wiki access is described in About This Wiki:

> How to get permission to edit
>
> - Create a Confluence account
> - Sign up for the user mailing list by sending a message to
>   user-subscribe@hive.apache.org
> - Send a message to user@hive.apache.org requesting write access

Ashutosh has been granting wiki edit privileges lately (Carl Steinbach used
to do it). I don't know how it's done or I'd gladly give you access.

I hope you'll be able to take care of this doc because you understand skewed
tables and I only know what I've read in the wiki, so I think you'll do a
better job. But of course I'll review it and tinker with it a bit.

-- Lefty

On Mon, Apr 28, 2014 at 1:40 AM, Prasanth Jayachandran <pjayachandran@hortonworks.com> wrote:

> @Mayur.. I don't think the initial design considered CTAS for skewed
> tables. So it might not be supported at all.
>
> @Lefty.. I am not sure where/how the docs are maintained. Is it version
> controlled? Or is it only maintained in the Confluence wiki? If it is the
> latter, can you please provide me access to edit the wiki? Or alternatively,
> if you can update the docs by adding "stored as directories" to the
> examples, that would be great. Also, please update the docs with "CTAS not
> supported for list bucketing".
>
> Thanks
> Prasanth Jayachandran
>
> On Apr 26, 2014, at 8:03 AM, Mayur Gupta wrote:
>
> Hey Prasanth,
>
> The CTAS for skewed table doesn't work, is it a bug?
>
> create table t1(r1 string, r2 string) skewed by (r2) on ('a') stored as directories select r1, r2 from t2;
>
>
> On Thu, Apr 24, 2014 at 3:03 PM, Mayur Gupta wrote:
>
>> Thanks a lot Prasanth for the reply. I would have never figured that out,
>> as the documentation at the Hive Wiki DDL page and design page doesn't
>> list this.
>>
>> One additional point: it seems the skewed table doesn't work when the
>> table is created as CTAS. The below statement doesn't create separate
>> files. Is it a bug or is it by intent?
>>
>> create table t1(r1 string, r2 string) skewed by (r2) on ('a') stored as directories select r1, r2 from t2;
>>
>>
>> On Thu, Apr 24, 2014 at 6:12 AM, Prasanth Jayachandran <pjayachandran@hortonworks.com> wrote:
>>
>>> Hi Mayur,
>>>
>>> The reason you see a single file is that you have not enabled storing
>>> skewed columns/values as directories.
>>> You can do the following to enable storing the skewed columns and values
>>> as directories:
>>>
>>> set hive.mapred.supports.subdirectories=true;
>>> set mapred.input.dir.recursive=true;
>>> create table t1(r1 string, r2 string) skewed by (r2) on ('a') stored as directories;
>>>
>>> This will enable you to store the skewed columns as directories, as below:
>>>
>>> /user/hive/warehouse/t1/r2=a/000000_0 (skewed values go here)
>>> /user/hive/warehouse/t1/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/000000_0 (all other values go here)
>>>
>>> With respect to your desc extended question where
>>> skewedColValueLocationMaps is empty, it's a bug in the implementation. I
>>> just verified that it shows empty for unpartitioned tables, but it shows
>>> correctly for partitioned tables.
>>> I have created a bug for unpartitioned tables here, which you can track
>>> for progress on this issue:
>>> https://issues.apache.org/jira/browse/HIVE-6968
>>>
>>>
>>> Thanks
>>> Prasanth Jayachandran
>>>
>>> On Apr 23, 2014, at 6:52 AM, Mayur Gupta wrote:
>>>
>>> Below is my skewedInfo
>>>
>>> skewedInfo:SkewedInfo(skewedColNames:[r2], skewedColValues:[[a]], skewedColValueLocationMaps:{})
>>>
>>> Any idea why skewedColValueLocationMaps is empty?
>>>
>>>
>>> On Mon, Apr 21, 2014 at 11:19 AM, Mayur Gupta wrote:
>>>
>>>> Hey There,
>>>>
>>>> I was trying to use skewed tables, but I am facing the issue that it is
>>>> not creating separate files for the skewed data. Even with a simple
>>>> example I am having the same issue. The Hive version is 0.11.
>>>>
>>>> create table t(col1 string, col2 string);
>>>> load data local inpath '/home/hadoop/a.txt' into table t;
>>>>
>>>> create table t1(r1 string, r2 string) skewed by (r2) on ('a');
>>>> insert into table t1 select * from t;
>>>>
>>>> The contents of a.txt are:
>>>> 1 ^Aa
>>>> 2^A b
>>>> 3 ^Ac
>>>> 4 ^Aa
>>>> 5 ^Ab
>>>> 6 ^Aa
>>>>
>>>> I see only a single file.
>>>>
>>>> /user/hive/warehouse/t1/000000_0
>>>>
>>>> Any pointers on what I am doing wrong?
>>>>
>>>
>>> CONFIDENTIALITY NOTICE
>>> NOTICE: This message is intended for the use of the individual or entity
>>> to which it is addressed and may contain information that is confidential,
>>> privileged and exempt from disclosure under applicable law. If the reader
>>> of this message is not the intended recipient, you are hereby notified that
>>> any printing, copying, dissemination, distribution, disclosure or
>>> forwarding of this communication is strictly prohibited. If you have
>>> received this communication in error, please contact the sender immediately
>>> and delete it from your system. Thank You.
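The directory layout Prasanth describes for a table created with "stored as directories" (rows whose skewed column r2 equals 'a' go under t1/r2=a/, all other rows under t1/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/) can be illustrated with a small Python simulation. This is a sketch under stated assumptions, not Hive's implementation: the helper names list_bucketing_dir and layout are hypothetical, ^A (\x01) is assumed to be the field delimiter, the stray spaces in the a.txt sample are assumed to be display artifacts, and only the directory choice for an unpartitioned, single-column skew is modeled (no file names, no partitions).

```python
# Hypothetical simulation of the list-bucketing layout described in this
# thread; this is NOT Hive code and ignores partitions and multi-column skew.
from collections import defaultdict

FIELD_DELIM = "\x01"  # assumed: Hive's default field delimiter (^A)
DEFAULT_DIR = "HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME"


def list_bucketing_dir(table_dir, skewed_col, skewed_values, value):
    """Return the directory a row lands in, given its skewed-column value."""
    if value in skewed_values:
        # Each skewed value gets its own <column>=<value> subdirectory.
        return f"{table_dir}/{skewed_col}={value}"
    # Every non-skewed value shares the single default subdirectory.
    return f"{table_dir}/{DEFAULT_DIR}"


def layout(lines, table_dir, skewed_col, skewed_values):
    """Group ^A-delimited (r1, r2) lines by target directory, keeping r1."""
    dirs = defaultdict(list)
    for line in lines:
        r1, r2 = line.split(FIELD_DELIM)
        target = list_bucketing_dir(table_dir, skewed_col, skewed_values, r2)
        dirs[target].append(r1)
    return dict(dirs)


if __name__ == "__main__":
    # Mayur's a.txt sample, assuming the spaces around ^A were not real data.
    rows = ["1\x01a", "2\x01b", "3\x01c", "4\x01a", "5\x01b", "6\x01a"]
    for directory, keys in layout(rows, "/user/hive/warehouse/t1", "r2", {"a"}).items():
        print(directory, keys)
```

Run as a script, it prints r1 keys ['1', '4', '6'] under /user/hive/warehouse/t1/r2=a and ['2', '3', '5'] under the default directory. Without "stored as directories" (Mayur's original DDL), Hive wrote every row to the single file /user/hive/warehouse/t1/000000_0 instead.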