incubator-cassandra-user mailing list archives

From Zhu Han <>
Subject Re: split large sstable
Date Mon, 21 Nov 2011 15:15:05 GMT
Best regards,
Zhu Han (韩竹)

坚果铺子 <>, the simplest and easiest-to-use cloud storage
Sync files, share photos, back up your documents!

On Mon, Nov 21, 2011 at 11:07 PM, Dan Hendry <> wrote:

> Pretty sure your argument about indirect blocks making large files
> inefficient only pertains to ext2/3 and not ext4. It seems ext4 replaces
> the 'indirect block' approach with extents.

> I was not aware of this difference in the file systems, and it seems to be
> a compelling reason to choose ext4 (over ext3) for Cassandra - at least
> when using size-tiered compaction.

An alternative is XFS, which is also extent-based.
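
To get a feel for the size of the effect, here is a rough back-of-the-envelope
sketch in Python. The 4 KiB block size, 12 direct pointers and the 4-byte block
pointers are the classic ext2/3 defaults; treat the numbers as assumptions, not
measurements:

    # How many levels of indirection an ext2/3-style inode needs to
    # address the last block of a file of a given size.
    BLOCK = 4096                # bytes per block (assumed default)
    DIRECT = 12                 # direct block pointers in the inode
    PTRS = BLOCK // 4           # 4-byte block pointers per indirect block

    def indirect_levels(size_bytes):
        """Return 0-3: indirection levels needed for the file's last block."""
        blocks = (size_bytes + BLOCK - 1) // BLOCK
        if blocks <= DIRECT:
            return 0
        if blocks <= DIRECT + PTRS:
            return 1            # single indirect
        if blocks <= DIRECT + PTRS + PTRS ** 2:
            return 2            # double indirect
        return 3                # triple indirect

    for size in (32 * 1024, 100 * 2 ** 20, 10 * 2 ** 30):
        print(size, "bytes ->", indirect_levels(size), "indirect level(s)")

A 10 GB sstable needs triple indirection, so an uncached random read can cost
several extra seeks just to locate the data block, whereas an extent-based fs
(ext4, XFS) describes the same contiguous file with a handful of
(start, length) extents.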

> Dan
> -----Original Message-----
> From: Radim Kolar []
> Sent: November-19-11 19:42
> To:
> Subject: Re: split large sstable
> On 17.11.2011 17:42, Dan Hendry wrote:
> > What do you mean by ' better file offset caching'? Presumably you mean
> > 'better page cache hit rate'?
> The fs metadata used to find blocks in smaller files is cached better.
> Large files use indirect blocks, so you need more reads to find the
> correct block during a seek syscall. For example, if a large file uses
> 3 indirect levels, you need three disk seeks to find the correct block.
> ocks-in-unix-file-systems/
> Metadata caching in the OS is far worse than file caching - one "find /"
> will effectively nullify the metadata cache.
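
The "find /" point is easy to model with a toy LRU cache; the capacity and
inode counts below are invented purely for illustration:

    from collections import OrderedDict

    # Toy LRU "metadata cache": a scan over many inodes (think `find /`)
    # evicts the hot entries, so subsequent reads miss again.
    class LRU:
        def __init__(self, capacity):
            self.capacity = capacity
            self.entries = OrderedDict()
            self.hits = self.misses = 0

        def access(self, key):
            if key in self.entries:
                self.entries.move_to_end(key)
                self.hits += 1
            else:
                self.misses += 1
                self.entries[key] = True
                if len(self.entries) > self.capacity:
                    self.entries.popitem(last=False)  # evict LRU entry

    cache = LRU(capacity=1000)
    hot = range(100)                     # inodes of the sstables read often
    for _ in range(10):
        for inode in hot:
            cache.access(inode)
    print("hot hit rate:", cache.hits / (cache.hits + cache.misses))

    for inode in range(10_000, 60_000):  # the scan touches every inode once
        cache.access(inode)

    cache.hits = cache.misses = 0
    for inode in hot:
        cache.access(inode)
    print("hot hit rate after scan:", cache.hits / (cache.hits + cache.misses))

The hot entries go from a ~90% hit rate to 0% right after the scan, which is
the behaviour Radim describes.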
> If Cassandra could use raw storage, it would eliminate fs overhead, and it
> could be over 100% faster on reads because fragmentation would be the
> exception - there would be no need to design the fs like FAT or UFS, where
> the designers expect files to be stored in non-contiguous areas on disk.
> Implementing something log-based would be enough.
> Cleaning would not be needed much, because compaction cleans it naturally.
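
For what it's worth, the log-structured idea can be sketched in a few lines;
this is a minimal sketch only, with a plain file standing in for the raw
device and a made-up path:

    import os

    # Minimal sketch: treat the storage as an append-only log, so writes
    # are always sequential (fragmentation becomes the exception) and a
    # record is addressed by its (offset, length) pair.
    class AppendLog:
        def __init__(self, path):
            self.f = open(path, "ab+")

        def append(self, payload: bytes):
            """Sequential write; returns (offset, length) as the address."""
            off = self.f.seek(0, os.SEEK_END)
            self.f.write(payload)
            return off, len(payload)

        def read(self, off, length):
            self.f.seek(off)
            return self.f.read(length)

    log = AppendLog("/tmp/demo.log")     # hypothetical path
    addr = log.append(b"row data")
    print(log.read(*addr))

Space would be reclaimed wholesale when compaction retires a whole log
segment, which matches the observation that compaction cleans it naturally.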
> > Perhaps what you are actually seeing is row fragmentation across your
> > SSTables? Easy to check with nodetool cfhistograms (SSTables column).
> I have a 1.5% hit rate on 2 sstables and 3% on 3 sstables. That's pretty
> low with the min. compaction threshold set to 5; I will probably set it to 6.
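
Those percentages are just the SSTables histogram normalized; something like
the following, where the counts are invented but shaped like the SSTables
column of nodetool cfhistograms:

    # Hypothetical 'SSTables per read' counts; the numbers are invented.
    sstables_per_read = {1: 95_500, 2: 1_500, 3: 3_000}
    total = sum(sstables_per_read.values())
    for n, count in sorted(sstables_per_read.items()):
        print(f"{count / total:6.1%} of reads touched {n} sstable(s)")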
> I would really like to see tests with user-defined sizes and file counts
> for tiered compaction, because it works best if you do not leave the
> largest file alone in a bucket. If your data in Cassandra is not growing,
> it can be fine-tuned better. I haven't experimented with it, but maybe a
> max sstable size defined per CF would be enough. Let's say I have 5 GB of
> data per CF - the ideal setting would be a max sstable size slightly less
> than 1 GB. Cassandra would then not keep old data stuck in one 4 GB
> compacted sstable, waiting for other 4 GB sstables to be created before
> compaction removes the old data.
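
A toy model of the bucketing makes the "stuck 4 GB sstable" visible.
Bucketing by log2 of size is my simplification of Cassandra's
within-50%-of-the-bucket-average rule, and the sizes are invented:

    from math import log2

    # Size-tiered compaction only fires when `min_threshold` sstables land
    # in the same size bucket, so one big sstable idles until enough
    # similar-sized peers appear.
    def buckets(sizes_gb):
        grouped = {}
        for s in sizes_gb:
            grouped.setdefault(int(log2(s * 1024)), []).append(s)
        return grouped

    sizes = [4.0, 0.9, 0.9, 0.8, 0.7]  # one old 4 GB sstable + fresh ~1 GB ones
    for key, group in sorted(buckets(sizes).items()):
        state = "compacts" if len(group) >= 4 else "waits"
        print(f"bucket {key}: {group} -> {state}")

The ~1 GB sstables reach the threshold and compact; the lone 4 GB sstable
waits. Capping the max sstable size below 1 GB would keep the old data in the
same bucket as the new sstables, so it keeps participating in compactions.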
> > To answer your question, I know of no tools to split SSTables. If you
> > want to switch compaction strategies, levelled compaction (1.0.x)
> > creates many smaller sstables instead of fewer, bigger ones.
> I don't use levelled compaction; it compacts too often. It might get
> better if it could be tuned how many and how large the files at each
> level are. But I will try switching to levelled compaction and back
> again; it might do what I want.
