hive-user mailing list archives

From Lefty Leverenz <leftylever...@gmail.com>
Subject Re: Skewed Tables
Date Mon, 28 Apr 2014 06:28:05 GMT
Prasanth, Hive's user docs are wiki-only at this point, so there's no
version control.  We just add notes about which release introduced or
changed something.  For an example, see the beginning of the Skewed
Tables <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-SkewedTables>
section.  Sometimes the version information isn't called out like that,
though; it's just part of the text.  And in the CREATE TABLE
syntax <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable>
it's a comment alongside a clause such as TBLPROPERTIES.

The procedure for getting wiki access is described in About This
Wiki <https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki>:

> How to get permission to edit
>
>    - Create a Confluence account
>    - Sign up for the user mailing list by sending a message to
>    user-subscribe@hive.apache.org
>    - Send a message to user@hive.apache.org requesting write access
>
>
Ashutosh has been granting wiki edit privileges lately (Carl Steinbach used
to do it).  I don't know how it's done or I'd gladly give you access.

I hope you'll be able to take care of this doc because you understand
skewed tables and I only know what I've read in the wiki, so I think you'll
do a better job.  But of course I'll review it and tinker with it a bit.


-- Lefty


On Mon, Apr 28, 2014 at 1:40 AM, Prasanth Jayachandran <
pjayachandran@hortonworks.com> wrote:

> @Mayur: I don’t think the initial design considered CTAS for skewed
> tables, so it might not be supported at all.
>
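> If CTAS is indeed unsupported here, a likely workaround is to split it
> into a plain CREATE TABLE with the skew clause followed by a separate
> INSERT ... SELECT, the same create-then-insert sequence used in the
> original example. A minimal, untested sketch, assuming the source table
> t2 has columns r1 and r2 as in the CTAS statement below:
>
> ```sql
> -- Let Hive write and read list-bucketing subdirectories
> set hive.mapred.supports.subdirectories=true;
> set mapred.input.dir.recursive=true;
>
> -- Step 1: create the skewed table without AS SELECT
> create table t1(r1 string, r2 string)
>   skewed by (r2) on ('a') stored as directories;
>
> -- Step 2: populate it in a separate statement
> insert overwrite table t1 select r1, r2 from t2;
> ```
>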
> @Lefty: I am not sure where or how the docs are maintained. Are they version
> controlled, or only maintained in the Confluence wiki? If it is the latter,
> can you please give me access to edit the wiki? Alternatively, if you
> could update the docs by adding “stored as directories” to the examples,
> that would be great. Please also note in the docs that CTAS is not
> supported for list bucketing.
>
> Thanks
> Prasanth Jayachandran
>
> On Apr 26, 2014, at 8:03 AM, Mayur Gupta <mayur.gupta81@gmail.com> wrote:
>
> Hey Prasanth,
>
> CTAS for skewed tables doesn't work. Is it a bug?
>
> create table t1(r1 string, r2 string) skewed by (r2) on ('a') stored as
> directories select r1, r2 from t2;
>
>
> On Thu, Apr 24, 2014 at 3:03 PM, Mayur Gupta <mayur.gupta81@gmail.com> wrote:
>
>> Thanks a lot for the reply, Prasanth. I would never have figured that out,
>> as neither the Hive Wiki DDL page <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-SkewedTables>
>> nor the design page <https://cwiki.apache.org/confluence/display/Hive/ListBucketing>
>> mentions this.
>>
>> One additional point: it seems skewed tables don't work when the
>> table is created with CTAS. The statement below doesn't create separate
>> files. Is it a bug, or is it by design?
>>
>> create table t1(r1 string, r2 string) skewed by (r2) on ('a') stored as
>> directories select r1, r2 from t2;
>>
>>
>> On Thu, Apr 24, 2014 at 6:12 AM, Prasanth Jayachandran <
>> pjayachandran@hortonworks.com> wrote:
>>
>>> Hi Mayur,
>>>
>>> The reason you see a single file is that you have not enabled storing
>>> skewed columns/values as directories.
>>> You can do the following to enable storing the skewed columns and values
>>> as directories:
>>>
>>> set hive.mapred.supports.subdirectories=true;
>>> set mapred.input.dir.recursive=true;
>>> create table t1(r1 string, r2 string) skewed by (r2) on ('a') stored as
>>> directories;
>>>
>>> This stores the skewed values in separate directories, as below:
>>>
>>> /user/hive/warehouse/t1/r2=a/000000_0 (skewed values go here)
>>> /user/hive/warehouse/t1/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/000000_0
>>> (all other values go here)
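>>>
>>> With the a.txt data from your first message, where rows 1, 4 and 6 have
>>> r2 = 'a', the insert from your example should then split the rows
>>> roughly as shown in the comments below. This is a sketch of the expected
>>> outcome, not captured output; the exact file names may differ.
>>>
>>> ```sql
>>> -- Same insert as in your example (t has columns col1, col2)
>>> insert into table t1 select * from t;
>>>
>>> -- Expected layout:
>>> --   /user/hive/warehouse/t1/r2=a/                                 rows 1, 4, 6
>>> --   /user/hive/warehouse/t1/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/  rows 2, 3, 5
>>>
>>> -- With mapred.input.dir.recursive=true, this should still return rows 1, 4, 6
>>> select r1, r2 from t1 where r2 = 'a';
>>> ```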
>>>
>>> Regarding your desc extended question, where skewedColValueLocationMaps
>>> is empty: it's a bug in the implementation. I just verified that it
>>> shows empty for unpartitioned tables, but it shows correctly for
>>> partitioned tables.
>>> I have filed a bug for unpartitioned tables, which you can track for
>>> progress on this issue:
>>> https://issues.apache.org/jira/browse/HIVE-6968
>>>
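>>> Until HIVE-6968 is fixed, the layout of an unpartitioned skewed table
>>> can presumably be checked on the file system instead of through desc
>>> extended. A sketch using the Hive CLI's dfs command and the default
>>> warehouse path from above:
>>>
>>> ```sql
>>> -- Should list r2=a and HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME as subdirectories
>>> dfs -ls /user/hive/warehouse/t1;
>>> ```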
>>>
>>> Thanks
>>> Prasanth Jayachandran
>>>
>>> On Apr 23, 2014, at 6:52 AM, Mayur Gupta <mayur.gupta81@gmail.com>
>>> wrote:
>>>
>>> Below is my skewedInfo:
>>>
>>> skewedInfo:SkewedInfo(skewedColNames:[r2], skewedColValues:[[a]],
>>> skewedColValueLocationMaps:{})
>>>
>>> Any idea why skewedColValueLocationMaps is empty?
>>>
>>>
>>> On Mon, Apr 21, 2014 at 11:19 AM, Mayur Gupta <mayur.gupta81@gmail.com> wrote:
>>>
>>>> Hey There,
>>>>
>>>> I was trying to use skewed tables, but they are not creating separate
>>>> files for the skewed data. Even with a simple example I have the same
>>>> issue. The Hive version is 0.11.
>>>>
>>>> create table t(col1 string, col2 string);
>>>> load data local inpath '/home/hadoop/a.txt' into table t;
>>>>
>>>> create table t1(r1 string, r2 string) skewed by (r2) on ('a');
>>>> insert into table t1 select * from t;
>>>>
>>>> The contents of a.txt are (^A is the Ctrl-A character, Hive's default
>>>> field delimiter):
>>>> 1^Aa
>>>> 2^Ab
>>>> 3^Ac
>>>> 4^Aa
>>>> 5^Ab
>>>> 6^Aa
>>>>
>>>> I see only a single file:
>>>>
>>>> /user/hive/warehouse/t1/000000_0
>>>>
>>>> Any pointers on what I am doing wrong?
>>>>
>>>
>>>
>>>
>>> CONFIDENTIALITY NOTICE
>>> NOTICE: This message is intended for the use of the individual or entity
>>> to which it is addressed and may contain information that is confidential,
>>> privileged and exempt from disclosure under applicable law. If the reader
>>> of this message is not the intended recipient, you are hereby notified that
>>> any printing, copying, dissemination, distribution, disclosure or
>>> forwarding of this communication is strictly prohibited. If you have
>>> received this communication in error, please contact the sender immediately
>>> and delete it from your system. Thank You.
>>
>>
>>
>
