hive-user mailing list archives

From Lefty Leverenz <leftylever...@gmail.com>
Subject Re: Skewed Tables
Date Sat, 26 Apr 2014 09:10:22 GMT
I can point to possible locations but I'm not sure where this belongs.  For
starters, STORED AS DIRECTORIES needs to be added to the storage format
section
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormat,StorageFormat,andSerDe>
in the DDL doc and several config params need to be added to the
Configuration Properties
<https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties>
doc.  (I'll take care of the config params.)

As Mayur pointed out, we have the DDL doc and a design doc.  There's
another design doc too, so take your pick among these locations:

   - DDL doc
      - Create Table -- Row Format, Storage Format, and SerDe
        <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormat,StorageFormat,andSerDe>
      - Create Table -- Skewed Tables
        <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-SkewedTables>
        -- *might be the best place*
      - CTAS
        <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableAsSelect(CTAS)>
        -- if skewed table doesn't work, say so here and in Skewed Tables
      - Alter Table Storage Properties
        <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTableStorageProperties>
        -- add STORED AS DIRECTORIES here or in a separate section for skewed
        tables
      - Alter Table or Partition
        <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterEitherTableorPartition>
        -- *can this be done at the partition level?*
   - List Bucketing (design doc)
   - Hive Enhancements:  Create Table
     <https://cwiki.apache.org/confluence/display/Hive/ListBucketing#ListBucketing-CreateTable>
     and Alter Table
     <https://cwiki.apache.org/confluence/display/Hive/ListBucketing#ListBucketing-AlterTable>
      - *could have new configuration section in or after Hive Enhancements*
   - Skewed Join Optimization
     <https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization>
     (design doc)
      - *doesn't seem to belong here*
   - *Configuration Properties*
     <https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties>
      - *definitely doesn't belong here, but we need the parameters*

Wherever you put it, I'll add links from some other locations.

By the way, is STORED AS DIRECTORIES used for anything other than skewed
tables?

Thanks.

-- Lefty


On Fri, Apr 25, 2014 at 6:23 PM, Prasanth Jayachandran <
pjayachandran@hortonworks.com> wrote:

> Lefty,
>
> I can add this information. Can you please point me to the location to add
> this? Perhaps you can help review it.
>
> Thanks
> Prasanth Jayachandran
>
> On Apr 24, 2014, at 1:13 PM, Lefty Leverenz <leftyleverenz@gmail.com>
> wrote:
>
> I'm looking at the docs and thinking of ways to include this information.
>  But Prasanth, if you want to do it yourself that would be great.
>
> -- Lefty
>
>
> On Thu, Apr 24, 2014 at 5:33 AM, Mayur Gupta <mayur.gupta81@gmail.com> wrote:
>
>> Thanks a lot Prasanth for the reply. I would never have figured that out,
>> as the documentation on the Hive Wiki DDL page
>> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-SkewedTables>
>> and design page <https://cwiki.apache.org/confluence/display/Hive/ListBucketing>
>> doesn't list this.
>>
>> One additional point: it seems the skewed table doesn't work when the
>> table is created with CTAS. The statement below doesn't create separate
>> files. Is this a bug or is it intended?
>>
>> create table t1(r1 string, r2 string) skewed by (r2) on ('a') stored as
>> directories select r1, r2 from t2;
>>
>>
>> On Thu, Apr 24, 2014 at 6:12 AM, Prasanth Jayachandran <
>> pjayachandran@hortonworks.com> wrote:
>>
>>> Hi Mayur,
>>>
>>> The reason you see a single file is that you have not enabled storing
>>> skewed columns/values as directories.
>>> You can do the following to enable storing the skewed columns and values
>>> as directories:
>>>
>>> set hive.mapred.supports.subdirectories=true;
>>> set mapred.input.dir.recursive=true;
>>> create table t1(r1 string, r2 string) skewed by (r2) on ('a') stored as
>>> directories;
>>>
>>> This will store the skewed values and all other values in separate
>>> directories, as shown below:
>>>
>>> /user/hive/warehouse/t1/r2=a/000000_0 (skewed values go here)
>>> /user/hive/warehouse/t1/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/000000_0
>>> (all other values go here)
>>>
>>> With respect to your desc extended question, where
>>> skewedColValueLocationMaps is empty: it's a bug in the implementation. I just
>>> verified that it shows as empty for unpartitioned tables but is shown
>>> correctly for partitioned tables.
>>> I have created a bug for unpartitioned tables, which you can track
>>> for progress on this issue:
>>> https://issues.apache.org/jira/browse/HIVE-6968
>>>
>>>
>>> Thanks
>>> Prasanth Jayachandran
>>>
>>> On Apr 23, 2014, at 6:52 AM, Mayur Gupta <mayur.gupta81@gmail.com>
>>> wrote:
>>>
>>> Below is my skewedInfo
>>>
>>> skewedInfo:SkewedInfo(skewedColNames:[r2], skewedColValues:[[a]],
>>> skewedColValueLocationMaps:{})
>>>
>>> Any idea why skewedColValueLocationMaps is empty?
>>>
>>>
>>> On Mon, Apr 21, 2014 at 11:19 AM, Mayur Gupta <mayur.gupta81@gmail.com> wrote:
>>>
>>>> Hey There,
>>>>
>>>> I was trying to use skewed tables, but Hive is not creating separate
>>>> files for the skewed data. Even a simple example shows the same issue.
>>>> The Hive version is 0.11.
>>>>
>>>> create table t(col1 string, col2 string);
>>>> load  data local inpath '/home/hadoop/a.txt' into table t;
>>>>
>>>> create table t1(r1 string, r2 string) skewed by (r2) on ('a');
>>>> insert into table t1 select * from t;
>>>>
>>>> The contents of a.txt are :
>>>> 1 ^Aa
>>>> 2^A b
>>>> 3 ^Ac
>>>> 4 ^Aa
>>>> 5 ^Ab
>>>> 6 ^Aa
>>>>
>>>> I see only a single file.
>>>>
>>>> /user/hive/warehouse/t1/000000_0
>>>>
>>>> Any pointers on what I am doing wrong?
>>>>
>>>
>>>
>>>
>>> CONFIDENTIALITY NOTICE
>>> NOTICE: This message is intended for the use of the individual or entity
>>> to which it is addressed and may contain information that is confidential,
>>> privileged and exempt from disclosure under applicable law. If the reader
>>> of this message is not the intended recipient, you are hereby notified that
>>> any printing, copying, dissemination, distribution, disclosure or
>>> forwarding of this communication is strictly prohibited. If you have
>>> received this communication in error, please contact the sender immediately
>>> and delete it from your system. Thank You.
>>
>>
>>
>
>
>
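
For reference, here is a minimal sketch that gathers the steps from this thread into one script. The two settings and the create table statement are taken from Prasanth's reply, and the source table t comes from Mayur's first message. The insert overwrite form is an assumption (the thread does not say which insert was used once directories were enabled), and the commented paths are the layout Prasanth describes, not output captured from a run.

```sql
-- Settings from Prasanth's reply; set them before writing to the skewed table.
set hive.mapred.supports.subdirectories=true;
set mapred.input.dir.recursive=true;

-- Skewed table that keeps the listed value ('a') in its own directory.
create table t1(r1 string, r2 string)
  skewed by (r2) on ('a')
  stored as directories;

-- Assumption: populate t1 from the source table t with insert overwrite.
insert overwrite table t1 select col1, col2 from t;

-- Layout described in the thread (not verified here):
--   /user/hive/warehouse/t1/r2=a/000000_0                                  (skewed values)
--   /user/hive/warehouse/t1/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/000000_0  (all other values)
```

The sketch creates the table first and loads it in a separate statement because, as Mayur reports above, combining skewed by ... stored as directories with CTAS did not produce separate files.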
