cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] Commented: (CASSANDRA-847) Make the reading half of compactions memory-efficient
Date Fri, 05 Mar 2010 23:33:27 GMT


Jonathan Ellis commented on CASSANDRA-847:

Some high-level thoughts:


please increase your column width to 120, and put spaces around operators (n-1 -> n - 1) :)

Bloom filters

"one huge BF" is still a bad idea.  you're cramming more into a single BF than it can usefully
handle.  You remember CASSANDRA-790 of course.  Throwing columns into the same BF as row keys
means that (a) your estimation of how big a BF you'll need gets drastically less accurate
in the worst case and (b) you can either support many fewer rows, or have a much less accurate
filter because of capacity problems.
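
To make the capacity argument concrete, here is a rough sketch using the standard Bloom filter
approximation p = (1 - e^(-k*n/m))^k for m bits, k hash functions and n entries. The class name,
the 10 bits per row key, the 7 hash functions and the 10 columns per row are illustrative
assumptions, not values taken from the patch or from Cassandra's own filter code.

```java
final class BloomEstimate {
    /**
     * Standard approximation of the false-positive rate of a Bloom filter:
     * p = (1 - e^(-k*n/m))^k, with m = bits, k = hashes, n = entries.
     */
    static double falsePositiveRate(long bits, int hashes, long entries) {
        double exponent = -((double) hashes * entries) / bits;
        return Math.pow(1.0 - Math.exp(exponent), hashes);
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;     // assumed number of row keys
        long bits = 10 * rows;      // filter sized for row keys only (10 bits per key, assumed)
        int hashes = 7;             // close to optimal for 10 bits per entry
        long columnsPerRow = 10;    // hypothetical column count per row

        // Same filter, first holding row keys only, then row keys plus every column name.
        double keysOnly = falsePositiveRate(bits, hashes, rows);
        double keysAndColumns = falsePositiveRate(bits, hashes, rows + rows * columnsPerRow);

        System.out.printf("row keys only:        %.4f%n", keysOnly);
        System.out.printf("row keys and columns: %.4f%n", keysAndColumns);
    }
}
```

Under these assumptions the keys-only filter stays below a 1% false-positive rate, while the
same filter loaded with column names as well is saturated and answers "maybe" for almost
everything. Keeping it accurate would mean sizing it for the worst-case column count up front.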

furthermore, the more I think about this, the less I think "access column X by name that doesn't
actually exist" is a frequent operation.  usually if you are accessing columns by name the
column names are uniform across your rows and will exist close to 100% of the time.  and if
you are accessing columns by slice then BF is useless.

Put another way, the row key is not just another level of column name and deserves special
treatment at least in this respect.

[the one exception may be if you are accessing rows whose contents have been deleted, but
whose tombstones haven't been GC'd.  we should make sure we don't actually have a BF entry
for a row unless it actually contains data. I don't think the current code does this.]


the Scanner api seems like a step back from IteratingRow to me.  self-contained iterators
are good.  any time you get more complicated than "here's an object I call next() on" things
get buggy in my experience.  even more confusing, scanners can return IR (but you're not supposed
to use it as an iterator?  or you are?  not sure).
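
As an illustration of the "object I call next() on" style, the following is a minimal sketch
of a self-contained merging iterator over already-sorted inputs. MergingIterator, Source and
drain are hypothetical names for this example only. It is not the IteratingRow or
CompactionIterator code, and it does no reconciliation of duplicate keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/** Merges several sorted iterators into one sorted iterator; callers only use hasNext()/next(). */
final class MergingIterator<T extends Comparable<T>> implements Iterator<T> {
    /** One input plus its current head element, ordered by that head. */
    private static final class Source<E extends Comparable<E>> implements Comparable<Source<E>> {
        final Iterator<E> rest;
        E head;

        Source(Iterator<E> input) {
            this.rest = input;
            this.head = input.next();
        }

        @Override
        public int compareTo(Source<E> other) {
            return head.compareTo(other.head);
        }
    }

    private final PriorityQueue<Source<T>> queue = new PriorityQueue<>();

    MergingIterator(List<Iterator<T>> inputs) {
        for (Iterator<T> input : inputs) {
            if (input.hasNext()) {
                queue.add(new Source<>(input)); // empty inputs are simply skipped
            }
        }
    }

    @Override
    public boolean hasNext() {
        return !queue.isEmpty();
    }

    @Override
    public T next() {
        Source<T> smallest = queue.poll();
        if (smallest == null) {
            throw new NoSuchElementException();
        }
        T result = smallest.head;
        if (smallest.rest.hasNext()) {
            smallest.head = smallest.rest.next();
            queue.add(smallest); // re-insert with its new head
        }
        return result;
    }

    /** Convenience helper for the example: collects everything an iterator yields. */
    static <E> List<E> drain(Iterator<E> it) {
        List<E> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Iterator<String>> inputs = new ArrayList<>();
        inputs.add(Arrays.asList("a", "c", "e").iterator());
        inputs.add(Arrays.asList("b", "d").iterator());
        System.out.println(drain(new MergingIterator<>(inputs)));
    }
}
```

All state lives inside the iterator, so the caller has no separate scanner position to keep in
sync with it. Only one head element per input is held in memory at a time.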

telling bad sign: CompactionIterator is 2x as long as it used to be.

I have some thoughts on this but I am going to save this here, typing long things in JIRA
is risky. :)

> Make the reading half of compactions memory-efficient
> -----------------------------------------------------
>                 Key: CASSANDRA-847
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Stu Hood
>            Priority: Critical
>             Fix For: 0.7
>         Attachments: 0001-Add-structures-that-were-important-to-the-SSTableSca.patch,
>                      0002-Implement-most-of-the-new-SSTableScanner-interface.patch,
>                      0003-Rename-RowIndexedReader-specific-test.patch,
>                      0004-Improve-Scanner-tests-and-separate-SuperCF-handling-.patch,
>                      0005-Add-Scanner-interface-and-a-Filtered-implementation-.patch,
>
> This issue is the next on the road to finally fixing CASSANDRA-16. To make compactions
> memory efficient, we have to be able to perform the compaction process on the smallest possible
> chunks that might intersect and contend one-another, meaning that we need a better abstraction
> for reading from SSTables.
