kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Column Compression and Encoding
Date Tue, 08 May 2018 16:09:55 GMT
Hi Saeid,

We've tried to make the default compression/encoding a reasonable tradeoff
of performance for most common workloads. A couple quick tips I've found
from my experiments:

- high-cardinality strings won't be automatically compressed by
dictionaries. So, if you have a high-cardinality string column whose values
are long and likely to contain repeated substrings (eg a set of URLs), then
enabling LZ4 compression is a good idea.
- if you have strings with a lot of common prefixes, you might consider
PREFIX_ENCODING
- for integer types, choose the smallest size that fits your intended
range, eg don't use int64 for storing a customer's age. On disk the two will
compress to about the same size, but in memory the larger type will use a
lot more space.
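
To make the integer and prefix tips above concrete, here is a rough Python
sketch. It does not use any Kudu client API: the helper names
(smallest_int_type, raw_bytes, shared_prefix_chars) are hypothetical and
exist only for this illustration. The first helper picks the narrowest
signed integer width for a value range, the second computes the uncompressed
size of a fixed-width column, and the third estimates how many characters
consecutive sorted strings share as a prefix, which is roughly what a prefix
encoding can avoid storing again.

```python
import os

# Hypothetical helpers for illustration only; not part of any Kudu API.
# Signed integer widths, smallest first.
INT_TYPES = [("int8", 8), ("int16", 16), ("int32", 32), ("int64", 64)]


def smallest_int_type(lo, hi):
    """Return (type_name, bytes_per_value) for the narrowest signed type covering [lo, hi]."""
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    for name, bits in INT_TYPES:
        if -(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1:
            return name, bits // 8
    raise ValueError("range does not fit in int64")


def raw_bytes(num_rows, bytes_per_value):
    """Uncompressed size of one fixed-width column, ignoring per-row overhead."""
    return num_rows * bytes_per_value


def shared_prefix_chars(values):
    """Sum of the common-prefix lengths between consecutive values after sorting."""
    ordered = sorted(values)
    return sum(
        len(os.path.commonprefix([prev, cur]))
        for prev, cur in zip(ordered, ordered[1:])
    )


if __name__ == "__main__":
    name, width = smallest_int_type(0, 120)  # a plausible range for an age
    rows = 1_000_000
    print(name, raw_bytes(rows, width), "bytes vs", raw_bytes(rows, 8), "bytes as int64")
    urls = ["http://a.com/x", "http://a.com/y", "http://b.org/z"]
    print("shared prefix characters:", shared_prefix_chars(urls))
```

A large shared-prefix total relative to the total string length suggests a
column where PREFIX_ENCODING is worth trying; actual savings depend on how
Kudu lays out the data, so measure on a real table before committing.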

Perhaps others can jump in with further recommendations based on experience.

-Todd

On Mon, May 7, 2018 at 1:45 AM, Saeid Sattari <saeid.sattari@gmail.com>
wrote:

> Hi all,
>
> Folks who have used the column compression and encoding in Kudu tables:
> can you share your experiences with the performance?  What type of fields
> are worse/better (IO bottleneck vs query return time,..) to compress. We
> can collect a knowledge base regarding these subjects that users can use in
> the future. Thanks.
>
> Regards,
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera
