hive-user mailing list archives

From Steven Willis <>
Subject Skewed vs ListBucketing
Date Fri, 27 Jun 2014 17:25:48 GMT
I'm having trouble understanding the difference between a skewed table and a list bucketed
table.

Is the only difference that ListBucketing stores the data as directories and a "plain" skewed
table stores them as files? I think that's what the wiki page is saying, but it's very confusing.
For one, the title of the page is ListBucketing, and in many places it seems to use the phrase
"List Bucketing" for the general feature of partitioning a table by skewed columns (whether
in directories or files).

There's a section "Skewed Table vs. List Bucketing Table"
that I would assume would spell out the differences between the two, but it says:

 - Skewed Table is a table which has skewed information.
 - List Bucketing Table is a skewed table. In addition, it tells Hive to use the list bucketing
feature on the skewed table: create sub-directories for skewed values.

That makes it seem like "the list bucketing feature" is just using sub-directories for the
data. If that's the case, why is the whole article titled ListBucketing, and why is the section
describing the basic idea (that apparently both skewed tables and list bucketed tables have
in common) titled just "List Bucketing"?
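
To make my reading concrete, here is a sketch of what I take the two cases to be. The table
and column names are invented for illustration, and I'm assuming the SKEWED BY ... ON ...
[STORED AS DIRECTORIES] syntax from the DDL documentation. This is only how I understand it,
not something I've confirmed:

```sql
-- Hypothetical "plain" skewed table: as I read the wiki, this only
-- declares which values of `key` are skewed.
CREATE TABLE skewed_example (key STRING, value STRING)
  SKEWED BY (key) ON ('1', '5', '6');

-- Hypothetical list bucketing table: the same skew declaration, plus
-- STORED AS DIRECTORIES to ask Hive for a sub-directory per skewed value.
CREATE TABLE list_bucket_example (key STRING, value STRING)
  SKEWED BY (key) ON ('1', '5', '6')
  STORED AS DIRECTORIES;
```

If that is right, the only difference in the DDL is the STORED AS DIRECTORIES clause.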

The article also says, "Mainly due to its sub-directory nature, list bucketing can't coexist
with some features." So does that mean just list bucketing (the subdirectory feature that
skewed tables can have as an option) is incompatible with the features mentioned, or does
it mean that any skewed table is incompatible with said features?

