kudu-user mailing list archives

From Pavel Martynov <mr.xk...@gmail.com>
Subject Number of buckets
Date Mon, 19 Jun 2017 13:12:19 GMT

I can't find any general recommendations for choosing the number of buckets in
single-level hash partitioning.

All that I found:
* "For large tables, prefer to use roughly 10 partitions per server in the
BTW, why 10? It looks like a magic number to me :).
* Some recommendations:

My use case: accumulating up to 500 GB to 1 TB of data per day and running
some aggregations over it with Spark at the end of each day.
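A minimal sketch of what the quoted "roughly 10 partitions per server" rule of thumb implies for this volume, assuming single-level hash partitioning (so partitions equal buckets) and an even spread of rows across buckets. The helper names and the 5-server cluster size are hypothetical; the factor of 10 comes only from the quoted recommendation, not from any measured result.

```python
# Sketch only: applies the quoted "~10 partitions per server" heuristic.
# Function names and the server count are hypothetical illustrations.


def buckets_for_cluster(num_tablet_servers: int, partitions_per_server: int = 10) -> int:
    """Hash bucket count suggested by the per-server rule of thumb."""
    if num_tablet_servers < 1 or partitions_per_server < 1:
        raise ValueError("counts must be positive")
    return num_tablet_servers * partitions_per_server


def data_per_bucket_gb(daily_volume_gb: float, num_buckets: int) -> float:
    """Average daily data per bucket, assuming the hash spreads rows evenly."""
    if num_buckets < 1:
        raise ValueError("num_buckets must be positive")
    return daily_volume_gb / num_buckets


if __name__ == "__main__":
    servers = 5  # hypothetical cluster size
    buckets = buckets_for_cluster(servers)
    for volume in (500, 1000):  # the 500 GB to 1 TB per day range above
        print(f"{volume} GB/day over {buckets} buckets: "
              f"{data_per_bucket_gb(volume, buckets):.1f} GB per bucket")
```

With 5 servers this gives 50 buckets, or about 10 to 20 GB per bucket per day. The arithmetic only restates the heuristic; it says nothing about whether disks or CPU cores should change the factor.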

What should the number of buckets depend on: the number of servers, the
number of disks (I use HDDs without any RAID), or the number of CPU cores?

Any suggestions?

with best regards, Pavel Martynov
