kudu-user mailing list archives

From "jacky.he@gmail.com" <jacky...@gmail.com>
Subject Re: Re: abnormal high disk I/O rate when upsert into kudu table?
Date Wed, 17 Aug 2016 01:05:52 GMT
Thanks Todd.

The Kudu cluster runs on CentOS 7.2, and each tablet node has 40 cores. The test table is about
140GB after 3x replication and is hash partitioned; I have tried both 24 and 120 hash buckets.

I ran one test:
1. Stop all ingestion to the cluster.
2. Randomly upsert 3000 rows once. Each upsert is either a new row or an update to an existing
row (it updates the whole row, not just one or more columns).
3. On the CDH monitoring dashboard, the cluster's disk I/O rises from ~300Mb/s to ~1.5Gb/s
and drops back to ~300Mb/s 30 minutes or more later.

I checked the INFO logs on some of the tablet nodes. They are always doing compactions, each
compacting anywhere from 1 to hundreds of thousands of rows.

My questions:
1. Is the maintenance manager rewriting the whole table? Will a single upsert of 3000 rows trigger
a rewrite of the whole table?
2. Does the background I/O affect scan performance?
3. About the number of hash partition buckets: I partitioned the table into 24 or 120 buckets.
What is the difference in upsert and scan performance, and what are the best practices?
4. What is the recommended setting for tablet server memory hard limit?
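
For reference on question 3, a rough sketch of what the two bucket counts mean for this cluster. The figures (12 tablet servers, 3 replicas, 140GB total) come from this thread. The sketch assumes the table uses hash partitioning only, so each bucket is one tablet, and that replicas and data are spread evenly across servers, which may not hold in practice. The function names are made up for illustration.

```python
# Figures taken from this thread; even distribution is an assumption.
TABLET_SERVERS = 12
REPLICATION_FACTOR = 3
TABLE_SIZE_GB = 140  # total size after 3x replication


def replicas_per_server(hash_buckets,
                        replication_factor=REPLICATION_FACTOR,
                        tablet_servers=TABLET_SERVERS):
    """Average number of tablet replicas each tablet server hosts."""
    return hash_buckets * replication_factor / tablet_servers


def gb_per_replica(hash_buckets,
                   table_size_gb=TABLE_SIZE_GB,
                   replication_factor=REPLICATION_FACTOR):
    """Average size of one tablet replica, assuming evenly hashed rows."""
    return table_size_gb / (hash_buckets * replication_factor)


for buckets in (24, 120):
    print(f"{buckets} buckets: "
          f"{replicas_per_server(buckets):.0f} replicas per server, "
          f"{gb_per_replica(buckets):.2f} GB per replica")
```

With 24 buckets each server holds about 6 tablet replicas of roughly 1.9GB each; with 120 buckets it holds about 30 replicas of roughly 0.4GB each. This only describes the layout and says nothing by itself about which setting performs better.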


From: Todd Lipcon
Date: 2016-08-17 01:58
To: user
Subject: Re: abnormal high disk I/O rate when upsert into kudu table?
Hi Jacky,

Answers inline below

On Tue, Aug 16, 2016 at 8:13 AM, jacky.he@gmail.com <jacky.he@gmail.com> wrote:
Dear Kudu Developers, 

I am a new tester of Kudu. Our Kudu cluster has 3+12 nodes: 3 separate master nodes and 12
tablet nodes.
Each node has 128GB of memory, 1 SSD for the WAL, and 6 1TB SAS disks for data.

We are using CDH 5.7.0 with the impala-kudu 2.7.0 and kudu 0.9.1 parcels, and we set a 16GB memory hard
limit for each tablet node.

Sounds like a good cluster setup. Thanks for providing the details. 

One of our test tables has about 80-100 columns and 1 key column. With the Java client, we can insert/upsert
into the Kudu table at about 100,000 rows/s.
The Kudu table has 300m rows, and about 300,000 rows are updated per day. We also use the Java client
upsert API to update the rows.

We found that the Kudu cluster may show an abnormally high disk I/O rate, about 1.5-2.0Gb/s,
even when we update only 1,000~10,000 rows/s.
I would like to know: with our row update frequency, is this high disk I/O rate normal
or not?

Are your upserts randomly spread across the range of rows in the table? If so, then when the
updates flush, they'll trigger compactions of the updates and inserted rows into the existing
data. This will cause, over time, a rewrite of the whole table, in order to incorporate the

This background I/O is run by the "maintenance manager". You can visit http://tablet-server:8050/maintenance-manager
to see a dashboard of currently running maintenance operations such as compactions.
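
A minimal sketch of pulling that dashboard page from a script, for anyone who wants to check it on several tablet servers. It uses only the Python standard library. The host name `tablet-server` and port 8050 are taken from the URL above; the function names and the timeout are illustrative assumptions, and the page is returned as raw text without any parsing.

```python
from urllib.request import urlopen


def maintenance_manager_url(host, port=8050):
    """Build the maintenance manager dashboard URL for one tablet server."""
    return f"http://{host}:{port}/maintenance-manager"


def fetch_dashboard(host, port=8050, timeout=5):
    """Return the dashboard page as text (requires network access to the server)."""
    with urlopen(maintenance_manager_url(host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Example use (replace the host with a real tablet server name):
#   page = fetch_dashboard("tablet-server")
#   print(page[:500])
```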

The maintenance manager runs a preset number of threads, so the amount of background I/O you're
experiencing won't increase if you increase the number of upserts.

I'm curious, is the background I/O causing an issue, or just unexpected?

Todd Lipcon
Software Engineer, Cloudera