cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-847) Make the reading half of compactions memory-efficient
Date Sat, 06 Mar 2010 18:38:27 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842290#action_12842290 ]

Jonathan Ellis commented on CASSANDRA-847:
------------------------------------------

Let's keep this simple.

The goal is to create an abstraction that (a) compaction code can apply to both old and new
data formats, while (b) allowing for memory-efficient compactions on the new format and (c)
making new-format indexing of subcolumns possible and (d) ideally allowing up to 256 levels
of new-format subcolumn nesting.

In particular, the goal does not include improving efficiency of compaction of old format
data; if that falls out naturally, fine, but it's not really our goal.

Nor is it yet our goal to support the new format, in this patchset, although maybe it should
be.  Compacting from old format to old format, with new data structures, is not part of our
ultimate goal either, and making it an intermediate step may be making things harder than
necessary.  It may be simpler to introduce the new format first, so we can skip to compacting
from old -> new and new -> new, not bothering with old -> old.

Are we on the same page?

I think the simplest way to get to this is to simply continue using IColumn.  It generalizes
just fine to multiple levels, and the existing implementation knows how to use abstractions
like mostRecentLiveChangeAt to handle tricky problems like tombstones.  Throwing this away
and starting over will lead us eventually to the same place.  [Although certainly some parts
like getObjectCount won't be needed and can ultimately be removed.]  Also, sharing code between
old and new formats is, within reason, a good thing.  So let's keep IColumn (I believe the analogue
in your patch is Named?) and Column.
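To make the shape concrete, here is a minimal sketch of the IColumn/Column pairing being argued for. IColumn, Column, and mostRecentLiveChangeAt appear in the discussion; everything else (the exact signatures, the tombstone handling) is invented for illustration and is not the actual Cassandra code:

```java
import java.util.Collection;
import java.util.Collections;

// Sketch: one interface covers both leaf columns and containers, so
// compaction code can recurse to any nesting depth through the same type.
interface IColumn {
    byte[] name();
    long timestamp();
    boolean isMarkedForDelete();          // tombstone flag
    long mostRecentLiveChangeAt();        // used when reconciling tombstones
    Collection<IColumn> getSubColumns();  // empty for a leaf Column
}

// Leaf implementation: a plain name/value/timestamp cell.
class Column implements IColumn {
    private final byte[] name;
    private final byte[] value;
    private final long timestamp;
    private final boolean deleted;

    Column(byte[] name, byte[] value, long timestamp, boolean deleted) {
        this.name = name;
        this.value = value;
        this.timestamp = timestamp;
        this.deleted = deleted;
    }
    public byte[] name() { return name; }
    public long timestamp() { return timestamp; }
    public boolean isMarkedForDelete() { return deleted; }
    public long mostRecentLiveChangeAt() {
        // a tombstone contributes no live change
        return deleted ? Long.MIN_VALUE : timestamp;
    }
    public Collection<IColumn> getSubColumns() { return Collections.emptyList(); }
}
```

The point is that a container implementing the same interface can sit above Column at any depth, which is what makes the multi-level generalization cheap.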

ColumnFamily + SuperColumn should be replaced with a more generalized structure supporting
arbitrary nesting.  Here I think ColumnGroup is a better name than Slice; we use the latter
term in querying, which would be potentially confusing.  But I think it would have a lot in
common with the existing CF/SuperColumn code.  Each ColumnGroup, like Column, only needs a byte[]
name.  No need to copy a lot of full paths around; experience with existing code shows that
this is unnecessary.
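A rough sketch of that ColumnGroup shape, purely illustrative (only the name ColumnGroup comes from this discussion; the fields and the 256-level cap check are assumptions):

```java
import java.util.ArrayList;
import java.util.List;

// One recursive container type in place of ColumnFamily + SuperColumn.
// Each node carries only its own byte[] name; the full path is implied by
// the node's position in the tree, so no paths are copied around.
class ColumnGroup {
    final byte[] name;                                     // this level's name only
    final List<ColumnGroup> subGroups = new ArrayList<>(); // nested containers
    final List<byte[]> columnNames = new ArrayList<>();    // leaf columns (stand-in)

    ColumnGroup(byte[] name) { this.name = name; }

    // Nesting depth below this node; the format would cap this (e.g. at 256).
    int depth() {
        int max = 0;
        for (ColumnGroup g : subGroups)
            max = Math.max(max, g.depth() + 1);
        return max;
    }
}
```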

Mapping this to the old data format is hopefully clear, since the structure closely resembles it.
What about the new format?  Here we come back to my advocating that "all container information
goes in the block header, followed by serialized Columns [not IColumns, just name-data-ts
triples]."  This is where we will need something like ColumnKey to contain column boundaries
-- i.e., not in this patchset, unless you decide that actually introducing the new format
here is the way to go.
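To illustrate the "container info in the header, flat triples after" layout, here is a toy serializer. The offset table stands in for the ColumnKey-style boundary information; the exact header contents are an assumption, not the proposed format:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class BlockWriter {
    // Serializes one block: a header holding the column count and per-column
    // offsets (a stand-in for the container / boundary info), followed by the
    // flat (name, value, timestamp) triples.
    static byte[] write(byte[][] names, byte[][] values, long[] timestamps) {
        try {
            // serialize the triples first so their offsets are known
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            DataOutputStream bodyOut = new DataOutputStream(body);
            int[] offsets = new int[names.length];
            for (int i = 0; i < names.length; i++) {
                offsets[i] = body.size();                  // offset within the body
                bodyOut.writeInt(names[i].length);  bodyOut.write(names[i]);
                bodyOut.writeInt(values[i].length); bodyOut.write(values[i]);
                bodyOut.writeLong(timestamps[i]);
            }
            // header goes in front: column count, then offsets
            ByteArrayOutputStream block = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(block);
            out.writeInt(names.length);
            for (int off : offsets) out.writeInt(off);
            body.writeTo(block);
            return block.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e); // in-memory streams don't actually throw
        }
    }
}
```

A reader can then pull the whole header in one read and seek straight to any column boundary, which is what makes the memory-efficient iteration below possible.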

Thus, for compaction, our algorithm goes something like "read all the header information at
once and build the ColumnGroup structure in memory, then iterate through matching sub-columngroups,
merging as necessary."  Since we read the header all at once, and then the subcolumns in-order,
all i/o within a single sstable remains sequential.
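The per-level merge step can be modeled as a sorted merge of matching sub-group names where the newest timestamp wins. This toy version (all names invented) ignores tombstones and values to show just that shape:

```java
import java.util.Map;
import java.util.TreeMap;

class GroupMerger {
    // Merge two levels of sub-groups, keyed by name in sorted order,
    // reconciling collisions by keeping the newer timestamp.
    static TreeMap<String, Long> merge(Map<String, Long> a, Map<String, Long> b) {
        TreeMap<String, Long> out = new TreeMap<>(a);
        for (Map.Entry<String, Long> e : b.entrySet())
            out.merge(e.getKey(), e.getValue(), Math::max); // newer wins
        return out;
    }
}
```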

It's not clear to me how to apply the old ReducingIterator approach to multilevel groups when
the data to merge into one Block may be spread across multiple Blocks in another sstable,
although I find the iterator design very elegant and its correctness easy to confirm.  So you
are probably right that this has to change.

One other thing about header info / column key: it would be nice to come up with a scheme
that doesn't repeat the full path in the description of each ColumnGroup [i.e., ColumnKey
or its analogue], at least not on-disk; in a heavily nested structure that would be a lot
of duplication of the initial path elements, although presumably compression would mitigate
this some.
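One hypothetical scheme for that: each on-disk entry records how many leading path elements it shares with the previous entry, plus only the unshared suffix (the way SSTable-style sorted indexes often do prefix elision). Nothing here is from the patch; it just shows the idea:

```java
import java.util.ArrayList;
import java.util.List;

class PathEncoder {
    static class Entry {
        final int shared;           // leading elements reused from the previous path
        final List<String> suffix;  // elements actually written
        Entry(int shared, List<String> suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    // Encode sorted full paths so shared prefixes are stored only once.
    static List<Entry> encode(List<List<String>> paths) {
        List<Entry> out = new ArrayList<>();
        List<String> prev = List.of();
        for (List<String> p : paths) {
            int shared = 0;
            while (shared < prev.size() && shared < p.size()
                    && prev.get(shared).equals(p.get(shared)))
                shared++;
            out.add(new Entry(shared, p.subList(shared, p.size())));
            prev = p;
        }
        return out;
    }
}
```

In a deeply nested row, most consecutive ColumnGroup keys share nearly their whole prefix, so each entry degenerates to a small count plus one name.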

What do you think?

> Make the reading half of compactions memory-efficient
> -----------------------------------------------------
>
>                 Key: CASSANDRA-847
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-847
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Stu Hood
>            Priority: Critical
>             Fix For: 0.7
>
>         Attachments: 0001-Add-structures-that-were-important-to-the-SSTableSca.patch,
0002-Implement-most-of-the-new-SSTableScanner-interface.patch, 0003-Rename-RowIndexedReader-specific-test.patch,
0004-Improve-Scanner-tests-and-separate-SuperCF-handling-.patch, 0005-Add-Scanner-interface-and-a-Filtered-implementation-.patch,
0006-Add-support-for-compaction-of-super-CFs-and-some-tes.patch
>
>
> This issue is the next on the road to finally fixing CASSANDRA-16. To make compactions
memory efficient, we have to be able to perform the compaction process on the smallest possible
chunks that might intersect and contend with one another, meaning that we need a better abstraction
for reading from SSTables.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

