hive-user mailing list archives

From Prasanth Jayachandran <pjayachand...@hortonworks.com>
Subject Re: Skewed Tables
Date Mon, 28 Apr 2014 20:59:44 GMT
Lefty, I have updated the Hive wiki in a few places to say that "STORED AS DIRECTORIES" should be used
for the list bucketing feature. Two different optimizations use the "SKEWED BY"
keyword: the skewed join optimization and the list bucketing optimization. I think
we need to mention this somewhere so that users are aware of the difference between the
two. "STORED AS DIRECTORIES" is used by only one of them, i.e. list bucketing.

The design docs for both are:
https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization
https://cwiki.apache.org/confluence/display/Hive/ListBucketing
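
As a rough sketch of the difference, the same SKEWED BY clause appears in both cases, and only the list bucketing table adds STORED AS DIRECTORIES. The table names t1_skew and t1_lb are hypothetical, chosen only for illustration; the columns, the skewed value, and the two SET properties are the ones quoted further down this thread.

```sql
-- Skew metadata only: the skewed value 'a' is recorded for column r2,
-- which the skewed join optimization can use. No per-value directories
-- are created, so the data lands in ordinary files under the table directory.
-- (t1_skew is a hypothetical table name.)
CREATE TABLE t1_skew (r1 STRING, r2 STRING)
SKEWED BY (r2) ON ('a');

-- List bucketing: the same SKEWED BY clause plus STORED AS DIRECTORIES.
-- The two properties below are the ones given later in this thread.
-- (t1_lb is a hypothetical table name.)
SET hive.mapred.supports.subdirectories=true;
SET mapred.input.dir.recursive=true;

CREATE TABLE t1_lb (r1 STRING, r2 STRING)
SKEWED BY (r2) ON ('a')
STORED AS DIRECTORIES;
```

With the second form, rows where r2 is 'a' should end up in their own r2=a subdirectory once data is inserted, and all other rows in the default list bucketing directory.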

Thanks
Prasanth Jayachandran

On Apr 27, 2014, at 11:28 PM, Lefty Leverenz <leftyleverenz@gmail.com> wrote:

> Prasanth, Hive's user docs are wiki-only at this point so there's no version control.
 We just add notes about which release introduced or changed something.  For an example see
the beginning of the Skewed Tables section.  Sometimes the version information isn't called
out like that, though, it's just part of the text.  And in the CREATE TABLE syntax it's a
comment alongside a clause such as TBLPROPERTIES.
> 
> The procedure for getting wiki access is described in About This Wiki:
> How to get permission to edit
> Create a Confluence account
> Sign up for the user mailing list by sending a message to user-subscribe@hive.apache.org
> Send a message to user@hive.apache.org requesting write access
> 
> Ashutosh has been granting wiki edit privileges lately (Carl Steinbach used to do it).
 I don't know how it's done or I'd gladly give you access.
> 
> I hope you'll be able to take care of this doc because you understand skewed tables and
I only know what I've read in the wiki, so I think you'll do a better job.  But of course
I'll review it and tinker with it a bit.
> 
> 
> -- Lefty
> 
> 
> On Mon, Apr 28, 2014 at 1:40 AM, Prasanth Jayachandran <pjayachandran@hortonworks.com>
wrote:
> @Mayur: I don't think the initial design considered CTAS for skewed tables, so it
might not be supported at all.
> 
> @Lefty: I am not sure where or how the docs are maintained. Are they version controlled, or
are they maintained only in the Confluence wiki? If it is the latter, can you please give me access
to edit the wiki? Alternatively, it would be great if you could update the docs by adding "stored as directories"
to the examples, and by adding a note saying "CTAS not supported for list
bucketing".
> 
> Thanks
> Prasanth Jayachandran
> 
> On Apr 26, 2014, at 8:03 AM, Mayur Gupta <mayur.gupta81@gmail.com> wrote:
> 
>> Hey Prasanth,
>> 
>> CTAS for a skewed table doesn't work. Is it a bug?
>> 
>> create table t1(r1 string, r2 string) skewed by (r2) on ('a') stored as directories
select r1, r2 from t2;
>> 
>> 
>> On Thu, Apr 24, 2014 at 3:03 PM, Mayur Gupta <mayur.gupta81@gmail.com> wrote:
>> Thanks a lot, Prasanth, for the reply. I would never have figured that out, as the documentation
on the Hive wiki DDL page and the design page doesn't mention this.
>> 
>> One additional point: it seems the skewed table doesn't work when the table is created
with CTAS. The statement below doesn't create separate files. Is it a bug or is it by design?
>> 
>> create table t1(r1 string, r2 string) skewed by (r2) on ('a') stored as directories
select r1, r2 from t2;
>> 
>> 
>> On Thu, Apr 24, 2014 at 6:12 AM, Prasanth Jayachandran <pjayachandran@hortonworks.com>
wrote:
>> Hi Mayur,
>> 
>> The reason you see a single file is that you have not enabled storing skewed columns/values
as directories.
>> You can do the following to enable storing the skewed columns and values as directories:
>> 
>> set hive.mapred.supports.subdirectories=true;
>> set mapred.input.dir.recursive=true;
>> create table t1(r1 string, r2 string) skewed by (r2) on ('a') stored as directories;
>> 
>> This will store the skewed values in separate directories, as shown below:
>> 
>> /user/hive/warehouse/t1/r2=a/000000_0 (skewed values go here)
>> /user/hive/warehouse/t1/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/000000_0 (all other
values go here)
>> 
>> With respect to your desc extended question, where skewedColValueLocationMaps is empty:
it's a bug in the implementation. I just verified that it shows up empty for unpartitioned tables
but is displayed correctly for partitioned tables.
>> I have filed a bug for unpartitioned tables, which you can use to track progress
on this issue: https://issues.apache.org/jira/browse/HIVE-6968
>> 
>> 
>> Thanks
>> Prasanth Jayachandran
>> 
>> On Apr 23, 2014, at 6:52 AM, Mayur Gupta <mayur.gupta81@gmail.com> wrote:
>> 
>>> Below is my skewedInfo
>>> 
>>> skewedInfo:SkewedInfo(skewedColNames:[r2], skewedColValues:[[a]], skewedColValueLocationMaps:{})
>>> 
>>> Any idea why skewedColValueLocationMaps is empty?
>>> 
>>> 
>>> On Mon, Apr 21, 2014 at 11:19 AM, Mayur Gupta <mayur.gupta81@gmail.com>
wrote:
>>> Hey There,
>>> 
>>> I was trying to use skewed tables, but I am facing the issue that Hive is not creating
separate files for the skewed data. Even with a simple example I have the same issue.
The Hive version is 0.11.
>>> 
>>> create table t(col1 string, col2 string);
>>> load data local inpath '/home/hadoop/a.txt' into table t;
>>> 
>>> create table t1(r1 string, r2 string) skewed by (r2) on ('a');
>>> insert into table t1 select * from t;
>>> 
>>> The contents of a.txt are :
>>> 1 ^Aa
>>> 2^A b
>>> 3 ^Ac
>>> 4 ^Aa
>>> 5 ^Ab
>>> 6 ^Aa
>>> 
>>> I see only a single file:
>>> 
>>> /user/hive/warehouse/t1/000000_0
>>> 
>>> Any pointers on what I am doing wrong?
>>> 
>> 
>> 
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity to which
it is addressed and may contain information that is confidential, privileged and exempt from
disclosure under applicable law. If the reader of this message is not the intended recipient,
you are hereby notified that any printing, copying, dissemination, distribution, disclosure
or forwarding of this communication is strictly prohibited. If you have received this communication
in error, please contact the sender immediately and delete it from your system. Thank You.
>> 
>> 
> 
> 
> 


