hive-user mailing list archives

From Lefty Leverenz <leftylever...@gmail.com>
Subject Re: Skewed vs ListBucketing
Date Wed, 02 Jul 2014 05:25:49 GMT
Does anyone have time to answer this?  It would be good to clarify things
in the wiki.

HIVE-3649 <https://issues.apache.org/jira/browse/HIVE-3649> added the list
bucketing feature in release 0.10.0.  The description says:

> We need to differ normal skewed table from list bucketing table. we use an
> optional parameter "store as DIRECTORIES"
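For illustration, here is a minimal sketch of where that optional parameter sits in the DDL. As far as I know, Hive spells the clause STORED AS DIRECTORIES and places it after SKEWED BY (...) ON (...). The helper skewed_table_ddl and the table/column names are hypothetical, and the sketch assumes a single skewed column with string values. Only the HiveQL text it produces matters:

```python
def skewed_table_ddl(table, columns, skewed_col, skewed_values, as_directories):
    """Build a Hive CREATE TABLE statement with a SKEWED BY clause.

    Hypothetical helper. Per HIVE-3649, appending STORED AS DIRECTORIES is what
    distinguishes a list bucketing table from a plain skewed table.
    """
    # Column list, e.g. "key STRING, value INT"
    cols = ", ".join(f"{name} {typ}" for name, typ in columns)
    # Skewed values quoted as string literals, e.g. "'a', 'b'"
    values = ", ".join(f"'{v}'" for v in skewed_values)
    ddl = f"CREATE TABLE {table} ({cols}) SKEWED BY ({skewed_col}) ON ({values})"
    if as_directories:
        # The optional clause that turns on list bucketing (sub-directories)
        ddl += " STORED AS DIRECTORIES"
    return ddl + ";"


cols = [("key", "STRING"), ("value", "INT")]
print(skewed_table_ddl("t", cols, "key", ["a", "b"], as_directories=False))
print(skewed_table_ddl("t", cols, "key", ["a", "b"], as_directories=True))
```

Both statements declare the same skew. They differ only in the trailing clause.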


So I think your understanding is correct, but let's hear from the experts.

-- Lefty


On Fri, Jun 27, 2014 at 1:25 PM, Steven Willis <swillis@compete.com> wrote:

> I'm having trouble understanding the difference between a skewed table and
> a list bucketed table:
>
> https://cwiki.apache.org/confluence/display/Hive/ListBucketing
>
> Is the only difference that ListBucketing stores the data as directories
> and a "plain" skewed table stores them as files? I think that's what the
> wiki page is saying, but it's very confusing. For one, the title of the
> page is ListBucketing and in many places it seems to use the phrase "List
> Bucketing" as the general feature of partitioning a table by skewed columns
> (whether in directories or files).
>
> There's a section "Skewed Table vs. List Bucketing Table" (
> https://cwiki.apache.org/confluence/display/Hive/ListBucketing#ListBucketing-ListBucketing)
> that I would expect to spell out the differences between the two, but it says:
>
>  - Skewed Table is a table which has skewed information.
>  - List Bucketing Table is a skewed table. In addition, it tells Hive to
> use the list bucketing feature on the skewed table: create sub-directories
> for skewed values.
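A rough sketch of what "create sub-directories for skewed values" implies for row placement, assuming one skewed column. This is not Hive's implementation. The helpers list_bucketing_dir and group_rows are hypothetical, and the catch-all directory name HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME for non-skewed values is an assumption about Hive's naming:

```python
# Assumed name of the directory that collects rows whose value is not listed as skewed
DEFAULT_DIR = "HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME"


def list_bucketing_dir(row, skewed_col, skewed_values):
    """Return the sub-directory a row would land in under list bucketing (sketch)."""
    value = row[skewed_col]
    if value in skewed_values:
        # Each declared skewed value gets its own key=value sub-directory
        return f"{skewed_col}={value}"
    # Every other value shares a single default sub-directory
    return DEFAULT_DIR


def group_rows(rows, skewed_col, skewed_values):
    """Group rows by the sub-directory they would be written to."""
    layout = {}
    for row in rows:
        subdir = list_bucketing_dir(row, skewed_col, skewed_values)
        layout.setdefault(subdir, []).append(row)
    return layout


rows = [{"key": "a", "v": 1}, {"key": "x", "v": 2}, {"key": "a", "v": 3}]
for subdir, members in group_rows(rows, "key", {"a", "b"}).items():
    print(subdir, members)
```

Under this reading, a skewed table without STORED AS DIRECTORIES would simply not split its data this way.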
>
> That makes it seem like "the list bucketing feature" is just using
> sub-directories for the data. If that's the case, why is the whole article
> titled ListBucketing, and why is the section describing the basic idea
> (that apparently both skewed tables and list bucketed tables have in
> common) titled just "List Bucketing" (
> https://cwiki.apache.org/confluence/display/Hive/ListBucketing#ListBucketing-ListBucketing
> ).
>
> The article also says, "Mainly due to its sub-directory nature, list
> bucketing can't coexist with some features." So does that mean only list
> bucketing (the subdirectory feature that skewed tables can have as an
> option) is incompatible with the features mentioned, or that any skewed
> table is incompatible with them?
>
> -Steve
>
