hive-user mailing list archives

From Mayur Gupta <mayur.gupt...@gmail.com>
Subject Re: Skewed Tables
Date Sat, 26 Apr 2014 15:03:29 GMT
Hey Prasanth,

CTAS for a skewed table doesn't work. Is it a bug?

create table t1(r1 string, r2 string) skewed by (r2) on ('a') stored as
directories select r1, r2 from t2;


On Thu, Apr 24, 2014 at 3:03 PM, Mayur Gupta <mayur.gupta81@gmail.com> wrote:

> Thanks a lot, Prasanth, for the reply. I would never have figured that out,
> as the documentation on the Hive Wiki DDL page
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-SkewedTables>
> and the design page <https://cwiki.apache.org/confluence/display/Hive/ListBucketing>
> doesn't list this.
>
> One additional point: it seems the skewed table doesn't work when the table
> is created with CTAS. The statement below doesn't create separate files. Is
> it a bug, or is it by intent?
>
> create table t1(r1 string, r2 string) skewed by (r2) on ('a') stored as
> directories select r1, r2 from t2;
>
>
> On Thu, Apr 24, 2014 at 6:12 AM, Prasanth Jayachandran <
> pjayachandran@hortonworks.com> wrote:
>
>> Hi Mayur,
>>
>> The reason you see a single file is that you have not enabled storing
>> skewed columns/values as directories.
>> You can do the following to enable storing the skewed columns and values
>> as directories:
>>
>> set hive.mapred.supports.subdirectories=true;
>> set mapred.input.dir.recursive=true;
>> create table t1(r1 string, r2 string) skewed by (r2) on ('a') stored as
>> directories;
>>
>> This will store the skewed values in separate directories, as shown below:
>>
>> /user/hive/warehouse/t1/r2=a/000000_0 (skewed values go here)
>> /user/hive/warehouse/t1/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/000000_0
>> (all other values go here)
>>
>> Regarding your desc extended question, where
>> skewedColValueLocationMaps is empty: it's a bug in the implementation. I just
>> verified that it shows empty for unpartitioned tables, but it shows
>> correctly for partitioned tables.
>> I have filed a bug for unpartitioned tables, which you can track
>> for progress on this issue:
>> https://issues.apache.org/jira/browse/HIVE-6968
>>
>>
>> Thanks
>> Prasanth Jayachandran
>>
>> On Apr 23, 2014, at 6:52 AM, Mayur Gupta <mayur.gupta81@gmail.com> wrote:
>>
>> Below is my skewedInfo
>>
>> skewedInfo:SkewedInfo(skewedColNames:[r2], skewedColValues:[[a]],
>> skewedColValueLocationMaps:{})
>>
>> Any idea why skewedColValueLocationMaps is empty?
>>
>>
>> On Mon, Apr 21, 2014 at 11:19 AM, Mayur Gupta <mayur.gupta81@gmail.com> wrote:
>>
>>> Hey There,
>>>
>>> I was trying to use skewed tables, but Hive is not creating separate
>>> files for the skewed data. Even with a simple example
>>> I have the same issue. The Hive version is 0.11.
>>>
>>> create table t(col1 string, col2 string);
>>> load data local inpath '/home/hadoop/a.txt' into table t;
>>>
>>> create table t1(r1 string, r2 string) skewed by (r2) on ('a');
>>> insert into table t1 select * from t;
>>>
>>> The contents of a.txt are:
>>> 1 ^Aa
>>> 2^A b
>>> 3 ^Ac
>>> 4 ^Aa
>>> 5 ^Ab
>>> 6 ^Aa
>>>
>>> I see only a single file.
>>>
>>> /user/hive/warehouse/t1/000000_0
>>>
>>> Any pointers on what I am doing wrong?
>>>
>>
>>
>>
>
>
>
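
A minimal sketch of the full non-CTAS sequence discussed in this thread, combining the settings from Prasanth's reply with a separate insert. It assumes the source table t2 has two string columns, as in the CTAS attempt above; using insert overwrite rather than insert into is a choice made here for repeatability, not something the thread prescribes. The Hive DDL manual lists list-bucketing tables among the CTAS restrictions, which may explain why the one-statement form does not produce separate directories.

```sql
-- Settings from Prasanth's reply: let Hive write and read subdirectories.
set hive.mapred.supports.subdirectories=true;
set mapred.input.dir.recursive=true;

-- Create the skewed (list-bucketing) table first, without CTAS.
create table t1 (r1 string, r2 string)
skewed by (r2) on ('a')
stored as directories;

-- Populate it in a separate statement (assumes t2 exists with matching columns).
insert overwrite table t1
select r1, r2 from t2;
```

If this behaves as described in the reply, rows with r2 = 'a' should land under the r2=a directory and all other rows under HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME.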
