kudu-user mailing list archives

From Saeid Sattari <saeid.satt...@gmail.com>
Subject Re: Dictionary encoding
Date Mon, 06 Aug 2018 19:56:42 GMT
Hi Todd,

Thank you for the good description :)

Regards,
Saeid

On Mon, 6 Aug 2018, 21:26 Todd Lipcon, <todd@cloudera.com> wrote:

> Hi Saeid,
>
> It's not based on the number of distinct values, but rather on the
> combined size of the distinct values. I believe the default is 256 KB, so
> assuming your strings are pretty short, a few thousand distinct values will
> likely fit in the dictionary. Note that dictionaries are calculated per
> rowset (a small chunk of data), so even if your overall cardinality is much
> larger, if you have some spatial locality such that rows with nearby primary
> keys have fewer distinct values, then you're likely to get benefit here.
>
> -Todd
>
> On Sat, Aug 4, 2018 at 8:10 AM, Saeid Sattari <saeid.sattari@gmail.com>
> wrote:
>
>> Hi Kudu community,
>>
>> Does anybody know the maximum number of distinct values a String column
>> can have for Kudu to set its encoding to Dictionary? Many thanks
>> :)
>>
>> br,
>>
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
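The rule Todd describes (a per-rowset dictionary bounded by the total bytes of its distinct values, not by a distinct-value count) can be illustrated with a small Python model. This is a simplified sketch, not Kudu's C++ implementation: the names `DICT_BUDGET_BYTES`, `encode_chunk`, and `encode_column` are hypothetical, the 256 KB budget is only Todd's recollection of the default, and rowsets are modeled as fixed-size row chunks, which real Kudu rowsets are not.

```python
from dataclasses import dataclass, field

# Hypothetical budget: Todd recalls a default of roughly 256 KB. Kudu's actual
# limit and exactly how it is measured may differ.
DICT_BUDGET_BYTES = 256 * 1024


@dataclass
class EncodedChunk:
    encoding: str                                  # "dict" or "plain"
    dictionary: list = field(default_factory=list)  # distinct values, first-seen order
    codes: list = field(default_factory=list)       # per-row dictionary indices


def encode_chunk(values, budget=DICT_BUDGET_BYTES):
    """Dictionary-encode one chunk (a stand-in for a rowset), falling back to
    plain encoding once the combined size of the distinct values exceeds the budget."""
    index = {}
    dictionary = []
    dict_bytes = 0
    codes = []
    for v in values:
        code = index.get(v)
        if code is None:
            # Only new distinct values grow the dictionary.
            dict_bytes += len(v.encode("utf-8"))
            if dict_bytes > budget:
                # The limit is on total bytes, not on the number of distinct values.
                return EncodedChunk("plain")
            code = len(dictionary)
            index[v] = code
            dictionary.append(v)
        codes.append(code)
    return EncodedChunk("dict", dictionary, codes)


def encode_column(values, rows_per_chunk, budget=DICT_BUDGET_BYTES):
    """Split a column into fixed-size chunks; each chunk gets its own dictionary."""
    return [encode_chunk(values[i:i + rows_per_chunk], budget)
            for i in range(0, len(values), rows_per_chunk)]


if __name__ == "__main__":
    # 20,000 distinct 20-byte keys, each repeated 5 times in key order
    # (strong spatial locality). 20,000 x 20 bytes is ~400 KB, over the budget.
    keys = [f"user-{i:015d}" for i in range(20_000)]
    rows = [k for k in keys for _ in range(5)]
    print("whole column:", encode_chunk(rows).encoding)
    print("per chunk:", {c.encoding for c in encode_column(rows, rows_per_chunk=10_000)})
```

In this model, encoding the whole column as one chunk falls back to plain, because the distinct keys total about 400 KB. Split into 10,000-row chunks, each chunk holds only 2,000 distinct keys (about 40 KB) and stays dictionary-encoded. That mirrors Todd's point that high overall cardinality can still benefit when nearby primary keys share values. Conversely, a handful of very long strings can exceed the budget, which is why no fixed "maximum distinct values" answer exists.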
