cassandra-user mailing list archives

From tsuraan <tsur...@gmail.com>
Subject Re: Cassandra behaviour
Date Mon, 26 Jul 2010 21:40:15 GMT
> It's reading through keys in the index and adding offset information
> about roughly every 128th entry in RAM, in order to speed up reads.
> Performing a binary search in an sstable from scratch would be
> expensive. Because of the high cost of disk seeks, most storage
> systems use btrees with a high branching factor to keep the number of
> seeks low. In cassandra there is instead binary searching (owing to
> the fact that sstables are sorted on disk), but pre-seeded with the
> information gained from index sampling to keep the number of seeks
> bounded even in the face of very large sstables.
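
To make the sampling idea concrete, here is a rough Python sketch. It is not Cassandra's actual code: the in-memory list `index_entries` stands in for the on-disk index file, and the names `build_sample` and `lookup` are made up for illustration. Only every 128th key is kept in RAM; a lookup first binary-searches that sample, then searches the single block of at most 128 entries it points to.

```python
import bisect

SAMPLE_INTERVAL = 128  # "roughly every 128th entry", as described above


def build_sample(index_entries, interval=SAMPLE_INTERVAL):
    """Keep (position, key) for every `interval`-th entry of a sorted index.

    index_entries is a list of (key, offset) pairs sorted by key; it is a
    stand-in for the index file on disk (an assumption of this sketch).
    """
    return [(pos, key)
            for pos, (key, _offset) in enumerate(index_entries)
            if pos % interval == 0]


def lookup(index_entries, sample, key, interval=SAMPLE_INTERVAL):
    """Return the data offset stored for `key`, or None if it is absent."""
    sample_keys = [k for _pos, k in sample]
    # Find the last sampled key that is <= the key we are looking for.
    i = bisect.bisect_right(sample_keys, key) - 1
    if i < 0:
        return None  # key sorts before the first entry in the index
    start = sample[i][0]
    # Only this one block would have to be read from disk.
    block = index_entries[start:start + interval]
    block_keys = [k for k, _offset in block]
    j = bisect.bisect_left(block_keys, key)
    if j < len(block_keys) and block_keys[j] == key:
        return block[j][1]
    return None
```

The memory cost is one sampled key per 128 index entries, and the work done against the index itself is limited to one block regardless of how large the sstable is.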

That makes sense.  Lucene does the same thing, although it has a
parameter on the IndexReader "open" function that lets you specify how
many terms to skip.  For huge indices on limited machines, that has
been an occasional lifesaver :)

> Those settings only directly affect, as far as I know, the interaction
> with the commit log. Now, if your system is truly disk bound rather
> than CPU bound on compaction, writes to the commit log will indeed
> have the capability to effectively throttle the write speed. In such a
> case I would expect more frequent fsync():s to the commit log to
> throttle writes to a higher degree than they would if the commit log
> was just periodically fsync():ed in the background once per minute;
> however I would not use this as the means to throttle writes.
>
> The other thing which may happen is that memtables aren't flushed fast
> enough to keep up with writes. I don't remember whether or not there
> was already a fix for this; I think there is, at least in trunk.
> Previously you could trigger an out-of-memory condition by writing
> faster than memtable flushing was happening.
>
> However even if that is fixed (again, I'm not sure), I'm pretty sure
> there is still no mechanism to throttle based on background
> compaction. It's not entirely trivial to do in a sensible fashion
> given how extremely asynchronous compaction is with respect to writes.

So userspace throttling is probably the answer?  Is the normal way of
doing this to go through the JMX interface from a userspace program,
and hold off on inserts until the values fall below a given threshold?
If so, that's going to be a pain, since most of my system is
currently using python :)
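
For what it's worth, the loop I have in mind looks roughly like the sketch below. Everything named here is hypothetical: `read_metric` is a placeholder for whatever call ends up fetching a number from the server (how to read that from JMX in Python is exactly the part I don't have), and `insert` is a placeholder for the client's write call. The sketch only shows the "hold off until the value falls below a threshold" part.

```python
import time


def wait_until_below(read_metric, threshold, poll_seconds=1.0,
                     sleep=time.sleep):
    """Block until read_metric() returns a value below threshold.

    read_metric is a caller-supplied function (hypothetical here) that
    returns the current value of whatever backlog figure is being watched.
    """
    while read_metric() >= threshold:
        sleep(poll_seconds)
    return True


def throttled_insert(rows, insert, read_metric, threshold,
                     check_every=1000, poll_seconds=1.0, sleep=time.sleep):
    """Insert rows, pausing every check_every rows while the metric is high."""
    for n, row in enumerate(rows):
        if n % check_every == 0:
            wait_until_below(read_metric, threshold, poll_seconds, sleep)
        insert(row)
```

Checking only every `check_every` rows keeps the polling overhead small; the right interval and threshold would have to be found by experiment.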
