cassandra-user mailing list archives

From Sheng Chen <>
Subject Re: Endless minor compactions after heavy inserts
Date Sun, 03 Apr 2011 17:46:24 GMT
I think if I can keep each sstable file at a reasonable size, the hot
data/index files may be able to fit into memory, at least some of the time.

In my use case, I want to use Cassandra to store a large amount of log data.
There will be multiple nodes, and each node has 10*2TB disks to hold as much
data as possible, ideally 20TB (about 100 billion rows) in one node.
Reads will be far less frequent than writes. A read latency within
1 second is acceptable.

Is it possible? Do you have advice on this design?
Thank you.


2011/4/3 aaron morton <>

> With only one data file your reads would use the least amount of IO to find
> the data.
> Most people have multiple nodes and probably fewer disks, so each node may
> have a TB or two of data. How much capacity do your 10 disks give? Will you
> be running multiple nodes in production?
> Aaron
> On 2 Apr 2011, at 12:45, Sheng Chen wrote:
> Thank you very much.
> The major compaction will merge everything into one big file, which would
> be very large.
> Is there any way to control the number or size of files created by major
> compaction?
> Or, is there a recommended number or size of files for cassandra to handle?
> Thanks. I see that the trigger of my minor compactions is OperationsInMillions.
> It is a total number of operations, which I had thought was per second.
> Cheers,
> Sheng
> 2011/4/1 aaron morton <>
>> If you are doing some sort of bulk load you can disable minor compactions
>> by setting the min_compaction_threshold and max_compaction_threshold to 0.
>> Then once your insert is complete run a major compaction via nodetool before
>> turning the minor compaction back on.
>> You can also reduce the compaction thread's priority, see
>> compaction_thread_priority in the yaml file.
>> The memtable will be flushed when either the MB or ops throughput threshold
>> is reached. If you are seeing a lot of memtables smaller than the MB
>> threshold then the ops threshold has probably been triggered. Look for a log
>> message at INFO level starting with "Enqueuing flush of Memtable" that will
>> tell you how many bytes and ops the memtable had when it was flushed. Try
>> increasing the ops threshold and see what happens.
>> Your change to the compaction threshold may not have had an effect
>> because the compaction process was already running.
>> AFAIK the best way to get the best out of your 10 disks will be to use a
>> dedicated mirror for the commit log and a stripe set for the data.
>> Hope that helps.
>> Aaron
>> On 1 Apr 2011, at 14:52, Sheng Chen wrote:
>> > I've got a single node of cassandra 0.7.4, and I used the java stress
>> tool to insert about 100 million records.
>> > The inserts took about 6 hours (45k inserts/sec) but the following minor
>> compactions have lasted for 2 days and the pending compaction jobs are still
>> increasing.
>> >
>> > From jconsole I can read the MemtableThroughputInMB=1499,
>> MemtableOperationsInMillions=7.0
>> > But in my data directory, I got hundreds of 438MB data files, which
>> should be the cause of the minor compactions.
>> >
>> > I tried to set the compaction threshold with nodetool, but it didn't seem to
>> take effect (no change in pending compaction tasks).
>> > After restarting the node, my setting is lost.
>> >
>> > I want to distribute the read load across my disks (10 disks in xfs, LVM),
>> so I don't want to do a major compaction.
>> > So, what can I do to keep the sstable file in a reasonable size, or to
>> make the minor compactions faster?
>> >
>> > Thank you in advance.
>> > Sheng
>> >
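
The flush rule discussed in this thread (a memtable is flushed when either the MB threshold or the ops threshold is reached, whichever comes first) can be sketched roughly as below. This is only an illustration, not Cassandra code: the function name flush_trigger and the bytes-per-operation estimate are assumptions, and the estimate treats the 438MB on-disk file size as if it were the memtable size at flush time, which is only approximately true.

```python
MB = 1024 * 1024


def flush_trigger(bytes_per_op, throughput_mb, operations_millions):
    """Return (trigger, ops_at_flush, mb_at_flush) for a simple two-threshold model.

    Hypothetical helper: assumes every operation adds the same number of bytes.
    """
    ops_limit = operations_millions * 1_000_000
    ops_to_fill = throughput_mb * MB / bytes_per_op
    if ops_to_fill <= ops_limit:
        # The size threshold is reached before the ops threshold.
        return ("size", ops_to_fill, float(throughput_mb))
    # The ops threshold is reached first, so the flushed memtable is smaller.
    return ("ops", float(ops_limit), ops_limit * bytes_per_op / MB)


# Figures from the thread: 438MB files, MemtableThroughputInMB=1499,
# MemtableOperationsInMillions=7.0.
bytes_per_op = 438 * MB / 7_000_000  # rough average, an assumption

current = flush_trigger(bytes_per_op, 1499, 7.0)
raised = flush_trigger(bytes_per_op, 1499, 30.0)

print("current settings:", current)
print("ops threshold raised to 30 million:", raised)
```

Under these assumptions the ops threshold fires well before the memtable reaches 1499MB, which matches the many small files reported; raising the ops threshold far enough lets the size threshold become the limit instead.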
