kudu-user mailing list archives

From Pavel Martynov <mr.xk...@gmail.com>
Subject Number of buckets
Date Mon, 19 Jun 2017 13:12:19 GMT
Hi!

I can't find any general recommendations for choosing the number of buckets
in single-level hash partitioning.

What I have found so far:
* "For large tables, prefer to use roughly 10 partitions per server in the
cluster".
https://impala.incubator.apache.org/docs/build/html/topics/impala_kudu.html#kudu_partitioning__kudu_hash_partitioning.
BTW, why 10? It looks like a magic number to me :).
* Some scaling recommendations:
https://kudu.apache.org/docs/known_issues.html#_scale

My use case: accumulate up to 500 GB-1 TB of data per day and run some
aggregations over it with Spark at the end of the day.
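To make the quoted "roughly 10 partitions per server" heuristic concrete for this use case, here is a rough sizing sketch. It assumes that heuristic, a hypothetical 5-server cluster, and that the hash key spreads rows evenly across buckets. The function names are made up for illustration and are not part of any Kudu or Impala API. Sizes are pre-replication.

```python
# Rough sizing sketch for single-level hash partitioning.
# Assumptions (hypothetical, for illustration only): the "roughly 10
# partitions per server" heuristic, an even spread of rows across buckets,
# and a made-up cluster size. Sizes are pre-replication.

def suggest_buckets(num_servers: int, partitions_per_server: int = 10) -> int:
    """Number of hash buckets following the per-server heuristic."""
    if num_servers < 1 or partitions_per_server < 1:
        raise ValueError("num_servers and partitions_per_server must be >= 1")
    return num_servers * partitions_per_server


def gb_per_bucket(data_gb: float, buckets: int) -> float:
    """Average data per bucket, assuming the hash key distributes rows evenly."""
    return data_gb / buckets


if __name__ == "__main__":
    servers = 5  # hypothetical cluster size
    buckets = suggest_buckets(servers)
    for daily_gb in (500, 1000):
        size = gb_per_bucket(daily_gb, buckets)
        print(f"{daily_gb} GB/day over {buckets} buckets ~ {size:.1f} GB each")
```

With 5 servers this gives 50 buckets, i.e. about 10-20 GB per bucket per day. It does not answer whether disks or CPU cores should also factor in, which is the open question.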

What should the number of buckets depend on: the number of servers, the
number of disks (I use HDDs without any RAID), or the number of CPU cores?

Any suggestions?

-- 
with best regards, Pavel Martynov
