cassandra-user mailing list archives

From Edward Capriolo
Subject Re: Read Latency Degradation
Date Sat, 18 Dec 2010 15:20:44 GMT
On Sat, Dec 18, 2010 at 5:27 AM, Peter Schuller wrote:
> And I forgot:
> (6) It is fully expected that sstable counts spike during large
> compactions that take a lot of time simply because smaller compactions
> never get a chance to run. (There was just recently JIRA traffic that
> added support for parallel compaction, but I'm not sure whether it
> fully addresses this particular issue or not.) If you have a lot of rows
> that are written incrementally and thus span multiple sstables, and
> your data size is truly large and written to fairly quickly, that
> means you will have a lot of data in sstables spread out over smaller
> ones that won't get compacted for extended periods once larger
> multi-hundreds-of-gig sstables are being compacted. However, that
> said, if you are just continually increasing your sstable count
> (rather than there just being spikes) that indicates compaction is not
> keeping up with write traffic.
> --
> / Peter Schuller

+1 on each of Peter's points except one.

For example, if the hot set is very small and slowly changing, you may
be able to have 100 TB per node and take the traffic without any

Also this page

On ext2/ext3 the maximum file size is 2TB, even on a 64 bit kernel. On
ext4 that goes up to 16TB. Since Cassandra can use almost half your
disk space on a single file, if you are raiding large disks together
you may want to use XFS instead, particularly if you are using a
32-bit kernel. XFS file size limits are 16TB max on a 32 bit kernel,
and basically unlimited on 64 bit.
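
(As a rough illustration of the file-size arithmetic in that page, here is a
minimal Python sketch. The limits are the ones quoted above; the 0.5 fraction
is an assumption standing in for "almost half your disk space", and the
function names are made up for this example.)

```python
# Maximum file sizes in TB, as quoted in the text above.
FS_MAX_FILE_TB = {
    "ext3": 2,
    "ext4": 16,
    "xfs (32-bit kernel)": 16,
}


def largest_sstable_tb(volume_tb, fraction=0.5):
    """Upper-bound estimate of the largest single sstable on a volume.

    `fraction` is an assumption: the text says "almost half", so 0.5 is
    used as a pessimistic bound.
    """
    return volume_tb * fraction


def filesystems_too_small(volume_tb, fraction=0.5):
    """Return the filesystems whose file size limit the estimate exceeds."""
    needed = largest_sstable_tb(volume_tb, fraction)
    return sorted(name for name, limit in FS_MAX_FILE_TB.items() if needed > limit)


if __name__ == "__main__":
    for volume in (3, 6, 40):
        print(volume, "TB volume:", filesystems_too_small(volume))
```

Under these assumptions a 6 TB volume already rules out ext3, since a single
sstable could approach 3 TB.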

Both of these statements imply there should be no challenges in using
disks this large. But there are challenges, namely the ones mentioned
in this thread:
1) Bloom filters currently stop being effective
2) If you have small columns, the start-up time for a node (0.6.0)
would be mind-boggling, since it has to sample those indexes
3) The compaction scenarios that take a long time and cause sstables
to build up, thus lowering read performance
4) Node joins/moves/repairs take a long time (due to compaction taking
a long time)
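
(To illustrate point 1 in general terms: the sketch below uses the standard
Bloom filter false-positive approximation, p = (1 - e^(-k*n/m))^k. It assumes
the filter's bit array cannot grow past some fixed size while the number of
keys keeps growing. That cap, and the numbers used, are assumptions for
illustration and are not taken from Cassandra's source.)

```python
import math


def bloom_false_positive_rate(num_keys, num_bits, num_hashes):
    """Standard approximation of a Bloom filter's false-positive rate.

    num_keys   -- n, keys inserted into the filter
    num_bits   -- m, size of the bit array
    num_hashes -- k, hash functions applied per key
    """
    if num_keys == 0:
        return 0.0
    return (1.0 - math.exp(-num_hashes * num_keys / num_bits)) ** num_hashes


if __name__ == "__main__":
    # Hypothetical cap on the bit array; not a Cassandra constant.
    capped_bits = 2 ** 31
    for keys in (10 ** 8, 10 ** 9, 10 ** 10):
        rate = bloom_false_positive_rate(keys, capped_bits, num_hashes=5)
        print(f"{keys:>14,d} keys -> false-positive rate {rate:.4f}")
```

With a fixed number of bits, the rate climbs toward 1 as keys are added, at
which point the filter no longer saves any disk reads.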

We should be careful not to mislead people. Talking about a 16TB XFS
setup, or 100TB/node without any difficulties, seems very, very far
from the common use case.
