hive-user mailing list archives

From Sergey Shelukhin <>
Subject Re: does anyone care about list bucketing stored as directories?
Date Sat, 07 Oct 2017 00:05:52 GMT
Looks like nobody does… I’ll file a ticket to remove it shortly.

From: Sergey Shelukhin <<>>
Date: Tuesday, October 3, 2017 at 12:59
To: "<>" <<>>,
"<>" <<>>
Subject: does anyone care about list bucketing stored as directories?

1) There seem to be some bugs and limitations in LB (list bucketing), e.g. incorrect cleanup,
and nobody appears to so much as watch the JIRAs ;) Does anyone actually use this stuff? Should
we nuke it in 3.0? And by 3.0 I mean I’ll remove it from master in a few weeks :)

2) I actually wonder: keeping the same SQL syntax, wouldn’t it be much easier to add logic
to partitioning that writes skew values into their own partitions and non-skew values into a new
type of default partition? It wouldn’t affect nearly as many low-level codepaths in obscure and
non-obvious ways; instead, all the logic would stay in the metastore and split generation, and it
would integrate with Hive features like PPD (predicate pushdown) automatically.
That is especially true if we are OK with the same limitations. E.g. if you add a new skew value
right now, I’m not sure what happens to the rows with that value already sitting in the non-skew
directories, but I don’t expect anything reasonable...
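A minimal, in-memory sketch of the partition-based alternative in point 2, assuming a single skewed column and a fixed set of skew values. This is not Hive code: the names partition_for, write_rows, prune and the default-partition label NON_SKEW_PARTITION are hypothetical, and "partitions" are modeled as dictionary keys rather than metastore entries or directories. It shows rows being routed at write time and a scan for `col = value` reading exactly one partition, plus the limitation mentioned above when a skew value is added after data was written.

```python
from collections import defaultdict

# Hypothetical label for the "new type of default partition" that holds
# every row whose skew-column value is NOT in the declared skew list.
NON_SKEW_PARTITION = "__NON_SKEWED__"


def partition_for(value, skew_values):
    """Partition key a row with this skew-column value would be written to."""
    return str(value) if value in skew_values else NON_SKEW_PARTITION


def write_rows(rows, skew_column, skew_values):
    """Group rows by target partition, as a writer might before flushing files."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_for(row[skew_column], skew_values)].append(row)
    return dict(partitions)


def prune(partitions, value, skew_values):
    """Partition keys a scan must read for the predicate `skew_column = value`.

    Only correct if skew_values is unchanged since the data was written.
    """
    key = partition_for(value, skew_values)
    return [key] if key in partitions else []


if __name__ == "__main__":
    rows = [{"k": 1, "v": "a"}, {"k": 2, "v": "b"}, {"k": 1, "v": "c"}, {"k": 3, "v": "d"}]
    parts = write_rows(rows, "k", {1})
    print(prune(parts, 1, {1}))     # ['1']: only the skew partition is read
    print(prune(parts, 3, {1}))     # ['__NON_SKEWED__']
    # Declaring 3 as a skew value after the fact: the existing k=3 row is
    # still in the non-skew partition, so this naive pruner misses it.
    print(prune(parts, 3, {1, 3}))  # []
```

The last call illustrates why the same limitation applies to this design: either existing rows must be rewritten when the skew list changes, or pruning must also read the default partition for newly added skew values.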
