kudu-user mailing list archives

From William Berkeley <wdberke...@gmail.com>
Subject Re: Why RowSet size is much smaller than flush_threshold_mb
Date Fri, 15 Jun 2018 15:26:48 GMT
The op seen in the logs is a rowset compaction, which takes existing
diskrowsets and rewrites them. It's not a flush, which writes data in
memory to disk, so I don't think the flush_threshold_mb is relevant. Rowset
compaction is done to reduce the amount of overlap of rowsets in primary
key space, i.e. reduce the number of rowsets that might need to be checked
to enforce the primary key constraint or find a row. Seeing a lot of rowset
compaction indicates that rows are being written in a somewhat random order
w.r.t. the primary key order. Kudu will perform much better as writes scale
when rows are inserted roughly in increasing key order per tablet.
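
To make the overlap point concrete, here is a rough sketch in Python. It is a toy model, not Kudu's actual data structures: each rowset is reduced to a hypothetical (min_key, max_key) pair of integers, and the key ranges are made up for illustration.

```python
# Toy model (an assumption, not Kudu code): a rowset is a (min_key, max_key) pair.
def rowsets_to_check(rowsets, key):
    """Return the rowsets whose key range could contain `key`."""
    return [(lo, hi) for lo, hi in rowsets if lo <= key <= hi]

# Rows written in random key order: every rowset spans most of the key space.
random_order = [(3, 98), (1, 95), (7, 99), (2, 90)]

# Rows written in roughly increasing key order: rowsets cover disjoint ranges.
increasing_order = [(0, 24), (25, 49), (50, 74), (75, 99)]

print(len(rowsets_to_check(random_order, 42)))      # 4 rowsets to consult
print(len(rowsets_to_check(increasing_order, 42)))  # 1 rowset to consult
```

With heavily overlapping rowsets every lookup or insert may have to consult all of them, which is the overlap that compaction works to reduce.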

Also, because you are using the log block manager (the default and only one
suitable for production deployments), there isn't a 1-1 relationship
between cfiles or diskrowsets and files on the filesystem. Many cfiles and
diskrowsets will be put together in a container file.

Config parameters that might be relevant here:
--maintenance_manager_num_threads
--fs_data_dirs (how many)
--fs_wal_dir (is it shared on a device with the data dir?)

The metrics from the compact row sets op indicate the time is spent in
fdatasync and in reading (likely reading the original rowsets). The overall
compaction time is kinda long but not crazy long. What's the performance
you are seeing and what is the performance you would like to see?
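
For reference, a quick back-of-the-envelope breakdown in Python, using only the numbers from the quoted log (12.628s real time plus three of the reported metrics). It assumes the *_us metrics are microseconds of wall time that can be compared directly with the op's real time; if some of that time was accumulated on other threads the shares are only approximate.

```python
import json

# Values copied from the quoted CompactRowSetsOp log lines.
real_seconds = 12.628
metrics = json.loads(
    '{"fdatasync": 315, "fdatasync_us": 9617174,'
    ' "lbm_read_time_us": 1288971, "lbm_write_time_us": 122520}'
)

def share_of_real_time(metric_us, wall_seconds):
    """Fraction of the op's real time covered by a microsecond metric."""
    return metric_us / (wall_seconds * 1_000_000)

shares = {
    name: share_of_real_time(metrics[name], real_seconds)
    for name in ("fdatasync_us", "lbm_read_time_us", "lbm_write_time_us")
}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")

# Average cost of one fdatasync call, in milliseconds.
avg_fdatasync_ms = metrics["fdatasync_us"] / metrics["fdatasync"] / 1000
print(f"avg fdatasync: {avg_fdatasync_ms:.1f} ms")
```

Under that assumption fdatasync accounts for roughly three quarters of the real time, at about 30 ms per call, and block reads for about a tenth.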

-Will

On Fri, Jun 15, 2018 at 7:52 AM, Quanlong Huang <huang_quanlong@126.com>
wrote:

> Hi all,
>
> I'm running kudu 1.6.0-cdh5.14.2. When looking into the logs of tablet
> server, I find most of the compactions are compacting small files (~40MB
> for each). For example:
>
> I0615 07:22:42.637351 30614 tablet.cc:1661] T
> 6bdefb8c27764a0597dcf98ee1b450ba P 70f3e54fe0f3490cbf0371a6830a33a7:
> Compaction: stage 1 complete, picked 4 rowsets to compact
> I0615 07:22:42.637385 30614 compaction.cc:903] Selected 4 rowsets to
> compact:
> I0615 07:22:42.637393 30614 compaction.cc:906] RowSet(343)(current size
> on disk: ~40666600 bytes)
> I0615 07:22:42.637401 30614 compaction.cc:906] RowSet(1563)(current size
> on disk: ~34720852 bytes)
> I0615 07:22:42.637408 30614 compaction.cc:906] RowSet(1645)(current size
> on disk: ~29914833 bytes)
> I0615 07:22:42.637415 30614 compaction.cc:906] RowSet(1870)(current size
> on disk: ~29007249 bytes)
> I0615 07:22:42.637428 30614 tablet.cc:1447] T
> 6bdefb8c27764a0597dcf98ee1b450ba P 70f3e54fe0f3490cbf0371a6830a33a7:
> Compaction: entering phase 1 (flushing snapshot). Phase 1 snapshot:
> MvccSnapshot[committed={T|T < 6263071556616208384 or (T in
> {6263071556616208384})}]
> I0615 07:22:42.641582 30614 multi_column_writer.cc:103] Opened CFile
> writers for 124 column(s)
> I0615 07:22:43.875396 30614 multi_column_writer.cc:103] Opened CFile
> writers for 124 column(s)
> I0615 07:22:44.418421 30614 multi_column_writer.cc:103] Opened CFile
> writers for 124 column(s)
> I0615 07:22:45.114389 30614 multi_column_writer.cc:103] Opened CFile
> writers for 124 column(s)
> I0615 07:22:54.762563 30614 tablet.cc:1532] T
> 6bdefb8c27764a0597dcf98ee1b450ba P 70f3e54fe0f3490cbf0371a6830a33a7:
> Compaction: entering phase 2 (starting to duplicate updates in new rowsets)
> I0615 07:22:54.773572 30614 tablet.cc:1587] T
> 6bdefb8c27764a0597dcf98ee1b450ba P 70f3e54fe0f3490cbf0371a6830a33a7:
> Compaction Phase 2: carrying over any updates which arrived during Phase 1
> I0615 07:22:54.773599 30614 tablet.cc:1589] T
> 6bdefb8c27764a0597dcf98ee1b450ba P 70f3e54fe0f3490cbf0371a6830a33a7:
> Phase 2 snapshot: MvccSnapshot[committed={T|T < 6263071556616208384 or (T
> in {6263071556616208384})}]
> I0615 07:22:55.189757 30614 tablet.cc:1631] T
> 6bdefb8c27764a0597dcf98ee1b450ba P 70f3e54fe0f3490cbf0371a6830a33a7:
> Compaction successful on 82987 rows (123387929 bytes)
> I0615 07:22:55.191426 30614 maintenance_manager.cc:491] Time spent
> running CompactRowSetsOp(6bdefb8c27764a0597dcf98ee1b450ba): real 12.628s user
> 1.460s sys 0.410s
> I0615 07:22:55.191484 30614 maintenance_manager.cc:497] P
> 70f3e54fe0f3490cbf0371a6830a33a7: CompactRowSetsOp(6bdefb8c27764a0597dcf98ee1b450ba) metrics:
> {"cfile_cache_hit":812,"cfile_cache_hit_bytes":16840376,
> "cfile_cache_miss":2730,"cfile_cache_miss_bytes":251298442,
> "cfile_init":496,"data dirs.queue_time_us":6646,
> "data dirs.run_cpu_time_us":2188,"data dirs.run_wall_time_us":101717,
> "fdatasync":315,"fdatasync_us":9617174,"lbm_read_time_us":1288971,
> "lbm_reads_1-10_ms":32,"lbm_reads_10-100_ms":41,"lbm_reads_lt_1ms":4641,
> "lbm_write_time_us":122520,"lbm_writes_lt_1ms":2799,"mutex_wait_us":25,
> "spinlock_wait_cycles":155264,"tcmalloc_contention_cycles":768,
> "thread_start_us":677,"threads_started":14,"wal-append.queue_time_us":300}
>
> The flush_threshold_mb is set to the default value (1024). Wouldn't the
> flushed file size be ~1GB?
>
> I think increasing the initial RowSet size can reduce compactions and then
> reduce the impact of other ongoing operations. It may also improve the
> flush performance. Is that right? If so, how can I increase the RowSet size?
>
> I'd be grateful if someone could clarify these points for me!
>
> Thanks,
> Quanlong
>
