kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Dictionary encoding
Date Mon, 06 Aug 2018 18:25:45 GMT
Hi Saeid,

It's not based on the number of distinct values, but rather on the combined
size of the values. I believe the default limit is 256 KB, so assuming your
strings are pretty short, a few thousand distinct values can likely be
dict-encoded. Note that dictionaries are built per rowset (a small chunk
of data), so even if your overall cardinality is much larger, you're likely
to still benefit if you have some spatial locality, i.e. rows with nearby
primary keys share fewer distinct values.
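
To make the size-based rule concrete, here is a rough Python sketch. It is not
Kudu's actual implementation: the function names are hypothetical, the 256 KB
budget is just the default I believe applies, and it ignores any per-entry
overhead in the real dictionary format. It only shows why a column whose total
cardinality is too large can still fit once it is split into rowsets.

```python
# Illustrative only: NOT Kudu's code. Names and the budget are assumptions.
DICT_BUDGET_BYTES = 256 * 1024  # assumed default dictionary size limit


def fits_in_dictionary(values, budget_bytes=DICT_BUDGET_BYTES):
    """Return True if the combined UTF-8 size of the distinct values is within budget."""
    distinct = set(values)
    total = sum(len(v.encode("utf-8")) for v in distinct)
    return total <= budget_bytes


def rowsets_that_fit(rows, rowset_size, budget_bytes=DICT_BUDGET_BYTES):
    """Split rows (assumed sorted by primary key) into fixed-size chunks
    standing in for rowsets, and check each chunk against the budget."""
    return [
        fits_in_dictionary(rows[i:i + rowset_size], budget_bytes)
        for i in range(0, len(rows), rowset_size)
    ]


# 100,000 distinct 20-byte strings: about 2 MB in total, well over the budget.
values = [f"city-{i:015d}" for i in range(100_000)]
whole_column_fits = fits_in_dictionary(values)
# Chunks of 10,000 rows hold about 200 KB of distinct values each.
per_rowset_fits = rowsets_that_fit(values, 10_000)
```

In this example the column as a whole exceeds the budget, but each 10,000-row
chunk stays under it, which is the locality effect described above.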


On Sat, Aug 4, 2018 at 8:10 AM, Saeid Sattari <saeid.sattari@gmail.com> wrote:

> Hi Kudu community,
> Does anybody know the maximum number of distinct values in a String column
> for which Kudu will still set its encoding to Dictionary? Many thanks
> :)
> br,

Todd Lipcon
Software Engineer, Cloudera
