Not sure why the first paragraph turned into a numbered bullet...

On Sun, May 2, 2010 at 11:00 AM, James Golick <jamesgolick@gmail.com> wrote:
  1. I wrote the list a while back about less-than-great performance when reading thousands of columns even on cache hits. Last night, I decided to try to get to the bottom of why.

I tested this by setting the row cache capacity on a TimeUUIDType-sorted CF to 10, filling up a single row with 2000 columns, and only running queries against that row. That row was the only thing in the database. I rm -Rf'd the data before starting the test.

The tests were done from Coda Hale's Scala client cassie, which is just a thin layer around the Java Thrift bindings. I didn't actually time each call because that wasn't the objective, but I didn't really need to. Reads of 10 columns felt quick enough, but 100 columns was noticeably slower, and 1000 columns would frequently cause the client to time out. The cache hit rate on that CF was 1.0, so, yes, the row was in cache.

Doing a thousand reads with count=100 in a single thread pegged my macbook's CPU and caused the fans to spin up pretty loud.

So, I attached a profiler and repeated the test. I'm no expert on cassandra internals, so please let me know if I'm way off here. The profiled reads were reversed=true, count=100.

As far as I can tell, there are three components taking up most of the time on this type of read (row slice out of cache):
  1. ColumnFamilyStore.removeDeleted() @ ~40% - Most of the time in here is actually spent materializing UUID objects so that they can be compared in the ConcurrentSkipListMap (ColumnFamily.columns_).
  2. SliceQueryFilter.getMemColumnIterator @ ~30% - Virtually all the time in here is spent in ConcurrentSkipListMap$Values.toArray()
  3. QueryFilter.collectCollatedColumns @ ~30% - All the time being spent in ColumnFamily.addColumn, and about half of the total spent materializing UUIDs for comparison.
This profile is consistent with the decrease in performance with higher values for count. If there are more UUIDs to deserialize, the time spent in removeDeleted() and collectCollatedColumns() should increase (roughly) linearly.
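
To make the UUID materialization cost in items 1 and 3 concrete, here is a minimal Java sketch. It is not Cassandra's actual comparator; the class and member names (TimeUuidCompareSketch, MATERIALIZING, RAW, timeUuidBytes) are made up for illustration. It contrasts a comparator that builds two java.util.UUID objects per comparison with one that reads the version 1 timestamp straight out of the 16 raw bytes. Both order by timestamp only and do not break ties.

```java
import java.nio.ByteBuffer;
import java.util.Comparator;
import java.util.UUID;

// Hypothetical sketch, not Cassandra code.
final class TimeUuidCompareSketch {

    // Allocates two UUID objects on every comparison. In a skip list each
    // insert or lookup performs many comparisons, so this adds up quickly.
    static final Comparator<byte[]> MATERIALIZING = (a, b) ->
            Long.compare(toUuid(a).timestamp(), toUuid(b).timestamp());

    // Reads the 60-bit timestamp directly from the bytes, no allocation.
    static final Comparator<byte[]> RAW = (a, b) ->
            Long.compare(timestampOf(a), timestampOf(b));

    static UUID toUuid(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new UUID(buf.getLong(), buf.getLong());
    }

    // Version 1 layout: time_low (bytes 0-3), time_mid (bytes 4-5),
    // version nibble plus time_hi (bytes 6-7).
    static long timestampOf(byte[] b) {
        long timeLow = ((b[0] & 0xFFL) << 24) | ((b[1] & 0xFFL) << 16)
                     | ((b[2] & 0xFFL) << 8) | (b[3] & 0xFFL);
        long timeMid = ((b[4] & 0xFFL) << 8) | (b[5] & 0xFFL);
        long timeHi = ((b[6] & 0x0FL) << 8) | (b[7] & 0xFFL);
        return (timeHi << 48) | (timeMid << 32) | timeLow;
    }

    // Test helper: encodes a 60-bit timestamp as the 16 bytes of a version 1 UUID.
    static byte[] timeUuidBytes(long timestamp, long leastSigBits) {
        long mostSigBits = ((timestamp & 0xFFFFFFFFL) << 32)
                         | (((timestamp >>> 32) & 0xFFFFL) << 16)
                         | 0x1000L // version 1
                         | ((timestamp >>> 48) & 0x0FFFL);
        return ByteBuffer.allocate(16).putLong(mostSigBits).putLong(leastSigBits).array();
    }
}
```

Passing either comparator to new ConcurrentSkipListMap<byte[], V>(comparator) should give the same ordering; the difference is only in how much garbage each comparison creates. I haven't benchmarked this, so treat it as a way to reproduce the effect outside of Cassandra rather than as a fix.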

So, my question at this point is how to fix it. I have some basic ideas, but being new to cassandra internals, I'm not sure they make any sense. Help me out here:
  1. Optionally call removeDeleted() less often. I realize that this is probably a bad idea for a lot of reasons, but it was the first thing I thought of.
  2. When a ColumnFamily object is put into the row cache, copy the columns over to another data structure that doesn't need to be sorted on get(). If columns_ needs to be kept around, this option would have a memory impact, but at least for us, it'd be well worth it for the speed.
  3. ????
I'd love to hear feedback on these / the rest of this (long) post.
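
For what it's worth, here is roughly what I have in mind for idea 2, as a minimal Java sketch. It is not based on any existing Cassandra class; CachedRowSnapshot and slice() are hypothetical names. It assumes the cached row is immutable once snapshotted (so the cache entry would have to be replaced on every write), and it ignores tombstones entirely, which removeDeleted() handles today.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch: a read-only copy of a row's columns, taken once
// when the row enters the cache, so slices need no re-sorting or toArray().
final class CachedRowSnapshot<K, V> {
    private final List<K> names = new ArrayList<>();   // in comparator order
    private final List<V> values = new ArrayList<>();  // parallel to names
    private final Comparator<? super K> comparator;    // null means natural ordering

    CachedRowSnapshot(ConcurrentSkipListMap<K, V> columns) {
        this.comparator = columns.comparator();
        // The skip list iterates in sorted order, so one pass is enough.
        for (Map.Entry<K, V> e : columns.entrySet()) {
            names.add(e.getKey());
            values.add(e.getValue());
        }
    }

    // Returns up to count values starting at start (inclusive if present).
    // A null start means "from the first column", or "from the last" if reversed.
    List<V> slice(K start, int count, boolean reversed) {
        int size = names.size();
        int from;
        if (start == null) {
            from = reversed ? size - 1 : 0;
        } else {
            int pos = Collections.binarySearch(names, start, comparator);
            if (pos >= 0) {
                from = pos;
            } else {
                int insertion = -pos - 1; // index of first name greater than start
                from = reversed ? insertion - 1 : insertion;
            }
        }
        List<V> out = new ArrayList<>(Math.max(0, Math.min(count, size)));
        int step = reversed ? -1 : 1;
        for (int i = from; i >= 0 && i < size && out.size() < count; i += step) {
            out.add(values.get(i));
        }
        return out;
    }
}
```

With this, a reversed=true, count=100 read would be one binary search (or none, with no start column) plus a copy of 100 references, with no comparisons after that. The obvious cost is a second copy of the column names and values references per cached row.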