kudu-user mailing list archives

From Boris Tyukin <bo...@boristyukin.com>
Subject Re: clarification on Partitioning Guidelines and CPU cores
Date Thu, 18 Oct 2018 02:24:32 GMT
Interesting, I did not realize that. Thanks for the tip!

On Wed, Oct 17, 2018 at 9:05 PM Adar Lieber-Dembo <adar@cloudera.com> wrote:

> The limit of 60 tablets per table per node applies only at table creation
> time. You can create a table that maxes out the number of tablets, then add
> more range partitions afterwards.
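The creation-time behavior described above can be modeled roughly as follows. This is a minimal sketch that follows the thread's description only: the 60-per-node figure is taken from the thread, the cap is assumed to be checked only when the table is created, and the helper names (`tablet_count`, `fits_at_creation`, `after_adding_ranges`) and the 24-node cluster are hypothetical. It is not Kudu's actual implementation or API.

```python
# Simplified model of the creation-time tablet cap described in this thread.
# Assumption: it mirrors the thread's description (60 tablets per table per
# node, checked only at CREATE time), not Kudu's actual implementation.

CREATE_LIMIT_PER_NODE = 60  # figure quoted in the thread


def tablet_count(hash_buckets: int, range_partitions: int) -> int:
    """With hash + range partitioning, each range is split into hash_buckets tablets."""
    return hash_buckets * range_partitions


def fits_at_creation(tablets: int, nodes: int) -> bool:
    """True if the initial table stays within the creation-time cap."""
    return tablets <= CREATE_LIMIT_PER_NODE * nodes


def after_adding_ranges(tablets: int, hash_buckets: int, new_ranges: int) -> int:
    """Ranges added after creation add tablets without re-checking the cap (in this model)."""
    return tablets + hash_buckets * new_ranges


if __name__ == "__main__":
    nodes = 24  # hypothetical cluster size
    initial = tablet_count(hash_buckets=48, range_partitions=30)
    print(initial, fits_at_creation(initial, nodes))  # 1440 True
    print(fits_at_creation(tablet_count(48, 31), nodes))  # False (1488 > 1440)
    print(after_adding_ranges(initial, hash_buckets=48, new_ranges=14))  # 2112
```

In this model the table starts exactly at the creation cap and then grows past it once range partitions are added later.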
> On Wed, Oct 17, 2018 at 6:00 PM Boris Tyukin <boris@boristyukin.com>
> wrote:
>> Thanks for replying, Adar. I did some math, and in our case we are hitting
>> another Kudu limit: 60 tablets per table per node. We use high-density
>> nodes with two 24-core CPUs, so we have 88 hyperthreaded cores per node,
>> or 88 * 24 nodes = 2112 cores in total. But I cannot create more than
>> 60 * 24 = 1440 tablets per table. It looks like the tablets for our largest
>> table will be around 8-10 GB each. Should I be worried, since the
>> recommendation is to keep tablets at about 1 GB?
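The arithmetic in the message above can be reproduced with a few helpers. This is a back-of-the-envelope sketch: the node and core counts come from the message, while the table size is a hypothetical input chosen only for illustration (the thread does not state it), and the 1 GB target is the figure mentioned in the message.

```python
import math

# Back-of-the-envelope numbers from the message above. The table size is a
# hypothetical input; the thread does not state it.


def cluster_cores(nodes: int, cores_per_node: int) -> int:
    return nodes * cores_per_node


def creation_cap(nodes: int, per_node: int = 60) -> int:
    return nodes * per_node


def avg_tablet_gb(table_gb: float, tablets: int) -> float:
    return table_gb / tablets


def tablets_for_target(table_gb: float, target_gb: float = 1.0) -> int:
    # Round up so the average tablet size does not exceed the target.
    return math.ceil(table_gb / target_gb)


if __name__ == "__main__":
    nodes, cores_per_node = 24, 88
    print(cluster_cores(nodes, cores_per_node))  # 2112
    print(creation_cap(nodes))  # 1440
    table_gb = 12_960  # hypothetical table size
    print(avg_tablet_gb(table_gb, creation_cap(nodes)))  # 9.0
    print(tablets_for_target(table_gb))  # 12960
```

With a table of that hypothetical size, capping at 1440 tablets gives roughly 9 GB per tablet, inside the 8-10 GB range described above.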
>> On Wed, Oct 17, 2018 at 8:06 PM Adar Lieber-Dembo <adar@cloudera.com>
>> wrote:
>>> Hi Boris,
>>> > Also, when they say "tablets", I assume this is before replication? So
>>> > in reality, it is number of nodes x CPU cores / replication factor? If
>>> > that is the case, it is not looking good...
>>> No, I think this is post-replication. The underlying assumption is
>>> that you want to maximize parallelism for large tables, and since
>>> Impala only uses one read thread per tablet, that means ensuring the
>>> number of tablets is close to or equal to the overall number of cores.
>>> However, during a scan Impala will choose one of the tablet's replicas
>>> to read from, so you don't need to "reserve" a core for the other
>>> replicas.
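The reasoning in the reply above (one Impala read thread per tablet, reading a single chosen replica) can be contrasted with the divide-by-replication-factor reading it rules out. This is an illustrative sketch: the function names are hypothetical, the 24-node, 88-core figures come from earlier in the thread, and a replication factor of 3 is an assumed example value.

```python
# Contrast the two readings of "one tablet per core" discussed in this thread.
# Function names are hypothetical; replication factor 3 is an example value.


def suggested_tablets(nodes: int, cores_per_node: int) -> int:
    """Reading from the reply: aim for roughly one tablet per core."""
    return nodes * cores_per_node


def misread_tablets(nodes: int, cores_per_node: int, replication_factor: int) -> int:
    """Reading the reply rules out: dividing the core count by the replication factor."""
    return nodes * cores_per_node // replication_factor


def scan_read_threads(tablets: int) -> int:
    """One read thread per tablet; each thread reads only one replica."""
    return tablets


def stored_replicas(tablets: int, replication_factor: int) -> int:
    """Replicas on disk, most of which sit idle during any single scan."""
    return tablets * replication_factor


if __name__ == "__main__":
    nodes, cores, rf = 24, 88, 3
    tablets = suggested_tablets(nodes, cores)
    print(tablets, misread_tablets(nodes, cores, rf))  # 2112 704
    print(scan_read_threads(tablets), stored_replicas(tablets, rf))  # 2112 6336
```

The point the sketch makes is that scan parallelism tracks the tablet count, not the replica count, so the extra replicas do not need cores of their own during a scan.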
>>> >> Can someone clarify whether the recommendation below means physical or
>>> >> hyper-threaded CPU cores? It's quite a big difference...
>>> I think this refers to hyper-threaded CPU cores (i.e. a CPU unit
>>> capable of executing an OS thread). But I'd be curious to hear if your
>>> workload is substantially more or less performant either way.
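To see how much the physical-vs-hyperthreaded choice changes the target, the two counts can be computed side by side. This sketch assumes 2 hardware threads per physical core (typical of Intel Hyper-Threading); the socket and core counts are hypothetical and do not describe any cluster in this thread. Python's standard `os.cpu_count()` reports the number of logical CPUs on the current machine, or `None` if it cannot be determined.

```python
import os

# Counting cores both ways. Assumes 2 hardware threads per physical core
# (typical of Intel Hyper-Threading); socket and core counts are hypothetical.


def physical_cores(sockets: int, cores_per_socket: int) -> int:
    return sockets * cores_per_socket


def logical_cores(sockets: int, cores_per_socket: int, threads_per_core: int = 2) -> int:
    return physical_cores(sockets, cores_per_socket) * threads_per_core


if __name__ == "__main__":
    nodes = 24
    print(nodes * physical_cores(2, 16))  # 768
    print(nodes * logical_cores(2, 16))   # 1536
    # os.cpu_count() reports logical CPUs on this machine (None if unknown).
    print(os.cpu_count())
```

Under the reading in the reply above (hyper-threaded cores), the tablet target would be the larger, logical count.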
