kudu-user mailing list archives

From Andrew Wong <aw...@cloudera.com>
Subject Re: Help me understand Kudu scalability limitations
Date Wed, 29 Nov 2017 18:06:49 GMT
Hi Boris,

The recommendations listed indicate what has been tested. Going beyond that
is uncharted territory, although that isn't to say it can't be done!

This sort of planning depends on what your schemas look like. Without them,
it's hard to gauge how many tablets each of your tables needs, which in turn
determines how many tablets the cluster would have to hold in total.
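
As a rough illustration of that planning, here is a minimal Python sketch that turns the limits from the known-issues page (quoted below) into tablet budgets. It assumes a replication factor of 3, and the sizing rule (ROWS_PER_TABLET, one tablet per that many rows) is a hypothetical placeholder, not a Kudu recommendation: real tablet counts depend on the schema and partitioning. The workload reuses the table counts from the quoted message, treating the "rest" as 180 tables of at most 50M rows so the total is 300.

```python
import math

# Limits from the Kudu known-issues page (recommendations, not hard caps).
NUM_TABLET_SERVERS = 30
MAX_TABLETS_PER_SERVER = 2000          # post-replication
MAX_TABLETS_PER_TABLE_PER_SERVER = 60  # post-replication, at table creation
REPLICATION_FACTOR = 3                 # assumption: replicas per tablet


def cluster_tablet_budget(servers, per_server, rf):
    """Logical (pre-replication) tablets the whole cluster can hold."""
    return servers * per_server // rf


def table_tablet_cap(servers, per_table_per_server, rf):
    """Logical tablets a single table may be created with."""
    return servers * per_table_per_server // rf


def planned_tablets(row_counts, rows_per_tablet):
    """Hypothetical sizing rule: one tablet per rows_per_tablet rows,
    with at least one tablet per table."""
    return sum(max(1, math.ceil(rows / rows_per_tablet)) for rows in row_counts)


# Workload shape from the quoted message, with assumed counts:
# 20 tables of 1B rows, 100 tables of 200M rows, 180 tables of 50M rows.
WORKLOAD = [10**9] * 20 + [2 * 10**8] * 100 + [5 * 10**7] * 180
ROWS_PER_TABLET = 5 * 10**7  # hypothetical target, not a Kudu recommendation

if __name__ == "__main__":
    budget = cluster_tablet_budget(
        NUM_TABLET_SERVERS, MAX_TABLETS_PER_SERVER, REPLICATION_FACTOR)
    cap = table_tablet_cap(
        NUM_TABLET_SERVERS, MAX_TABLETS_PER_TABLE_PER_SERVER, REPLICATION_FACTOR)
    needed = planned_tablets(WORKLOAD, ROWS_PER_TABLET)
    print(f"cluster budget: {budget}, per-table cap: {cap}, planned: {needed}")
```

With the assumed replication factor of 3, the cluster-wide budget works out to 20,000 logical tablets, the same figure as in the quoted message, and a single table can start with at most 600. Under the hypothetical sizing rule, this workload would need 980 tablets.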

In terms of space, it seems like the number of nodes would provide ample
space (30 nodes * 8TB per node = 240TB, well above 80-100TB), unless I'm
missing something. That said, given the number of HDDs per node (12 * 8TB =
96TB raw), it sounds like a lot of disk would go unused. If you meant that
you have 3 nodes, that's a different story. Would you mind clarifying?
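
The space arithmetic above can be sketched in a few lines of Python. The 8TB-per-server figure is the recommendation quoted below and the drive counts come from the quoted message; the replication factor and compression ratio passed to footprint_tb are assumptions for the reader to fill in, since the recommendation is measured post-replication and post-compression.

```python
# Recommended ceiling from the Kudu known-issues page:
# 8TB per tablet server, measured post-replication and post-compression.
RECOMMENDED_TB_PER_SERVER = 8
DRIVES_PER_NODE = 12   # from the quoted message
DRIVE_TB = 8           # from the quoted message


def recommended_capacity_tb(nodes):
    """Cluster-wide post-replication, post-compression ceiling."""
    return nodes * RECOMMENDED_TB_PER_SERVER


def raw_capacity_tb(nodes):
    """Raw disk capacity across all nodes."""
    return nodes * DRIVES_PER_NODE * DRIVE_TB


def footprint_tb(source_tb, replication_factor, compression_ratio):
    """On-disk footprint to compare against the recommended ceiling.
    replication_factor and compression_ratio are assumptions."""
    return source_tb * replication_factor / compression_ratio


if __name__ == "__main__":
    print("recommended ceiling (30 nodes):", recommended_capacity_tb(30), "TB")
    print("raw disk (30 nodes):", raw_capacity_tb(30), "TB")
    print("recommended ceiling (3 nodes):", recommended_capacity_tb(3), "TB")
    for ratio in (1.0, 2.0):
        print(f"100TB at RF=3, {ratio}:1 compression ->",
              footprint_tb(100, 3, ratio), "TB")
```

Note that recommended_capacity_tb(3) returns 24, matching the 24TB figure in the quoted message, while 30 nodes give 240TB against 2880TB of raw disk. Whether an 80-100TB estimate fits then hinges on how it was measured: at an assumed 3 replicas with no compression, 100TB becomes a 300TB footprint, and with an assumed 2:1 compression ratio, 150TB.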


On Tue, Nov 28, 2017 at 7:25 AM, Boris Tyukin <boris@boristyukin.com> wrote:

> Hi guys,
> I was really excited about Kudu until I saw this:
> https://kudu.apache.org/docs/known_issues.html
>    - Recommended maximum amount of stored data, post-replication and
>      post-compression, per tablet server is 8TB.
>    - Recommended maximum number of tablets per tablet server is 2000,
>      post-replication.
>    - Maximum number of tablets per table for each tablet server is 60,
>      post-replication, at table-creation time.
> These numbers are very concerning to me because the project I am working
> on will have 300+ tables: 20 tables have over 1B rows, 50-100 tables have
> 200M rows on average, and the rest are below 50M rows. I want to see if I
> can build a near-real-time data lake, ingesting data from our source
> RDBMSs.
> My cluster is a 30-node cluster with 12 spinning HDDs per node (each drive
> is 8TB), and each node is a 2-CPU, 22-core beast with 512GB of DDR4 RAM.
> Do these limitations still apply in my case? It looks like I can only
> have 24TB worth of data in Kudu, which is way below what I need. My modest
> estimate is 80-100TB.
> I am also concerned that I can only have 20,000 tablets after replication;
> as I mentioned above, I am going to have a bunch of tables with lots of rows.
> I do not have the option of picking a different hardware configuration for
> our cluster.
> thanks

Andrew Wong
