cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Branimir Lambov (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8099) Refactor and modernize the storage engine
Date Thu, 25 Jun 2015 16:56:09 GMT


Branimir Lambov commented on CASSANDRA-8099:

Continuing my review with the {{db.filter}} package, apart from {{ColumnFilter.Expression.Serializer.serializedSize}}
which appears to be missing a byte when the messaging version is 3.0, I see no correctness
problems with the code. More unit testing would definitely help, though. Below are some suggestions
that should not be treated as blocking commit.

* The name implies it is something that filters partitions, in fact it applies to rows within
partition. I'd rename this to {{RowFilter}} or, if there's a reason to keep partition in its
name, {{PartitionContentFilter}}.
* {{forPaging}} javadoc is contradictory-- on one hand strictly after, on the other could
be inclusive. Please clarify.
* {{filter(UnfilteredRowIterator)}} is risky, people will call the wrong filter overload by
mistake. Rename?
* {{filter(SliceableUnfilteredRowIterator)}} does not seem to apply {{queriedColumns}} in
either {{NamesPartitionFilter}} or {{SlicePartitionFilter}}. Is this on purpose? If it is,
the method above is not its counterpart. Please document.

* {{clusterings}} should be {{NavigableSet}} to enable reverse traversal without duplication
({{TreeSet}} implements that too). This simplifies other code in file.
* {{toString}} misses a closing bracket.

* JavaDoc: what is 'seekTo'? 'slice'?

* ELT_SELECTION is not easily decipherable. Why not just ELEMENT instead? And 'element' elsewhere?
(elt also)

* {{includes/newTester}}: It appears these should only be called if the columns filter is
already applied. This is not obvious; I'd state it in JavaDoc and/or assert that {{columns.contains(cell.column())}}.

* {{DISTINCT_NONE}} is not the same as {{CQLLimits.distinct(MAX_VALUE)}}. Would you add a
comment to say why it is so?
* {{countCells}} should be {{countsCells}}.
* {{CQLLimits.isDistinct}} does not appear to be used for anything.

Throughout, there are lots of easy-to-fix (just add <?>) raw type warnings. I would
also use a virtual method instead of an instance field for {{kind}} everywhere, as done in

I've uploaded a branch that includes fixes to a few of the above.

> Refactor and modernize the storage engine
> -----------------------------------------
>                 Key: CASSANDRA-8099
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 3.0 beta 1
>         Attachments: 8099-nit
> The current storage engine (which for this ticket I'll loosely define as "the code implementing
the read/write path") is suffering from old age. One of the main problem is that the only
structure it deals with is the cell, which completely ignores the more high level CQL structure
that groups cell into (CQL) rows.
> This leads to many inefficiencies, like the fact that during a reads we have to group
cells multiple times (to count on replica, then to count on the coordinator, then to produce
the CQL resultset) because we forget about the grouping right away each time (so lots of useless
cell names comparisons in particular). But outside inefficiencies, having to manually recreate
the CQL structure every time we need it for something is hindering new features and makes
the code more complex that it should be.
> Said storage engine also has tons of technical debt. To pick an example, the fact that
during range queries we update {{SliceQueryFilter.count}} is pretty hacky and error prone.
Or the overly complex ways {{AbstractQueryPager}} has to go into to simply "remove the last
query result".
> So I want to bite the bullet and modernize this storage engine. I propose to do 2 main
> # Make the storage engine more aware of the CQL structure. In practice, instead of having
partitions be a simple iterable map of cells, it should be an iterable list of row (each being
itself composed of per-column cells, though obviously not exactly the same kind of cell we
have today).
> # Make the engine more iterative. What I mean here is that in the read path, we end up
reading all cells in memory (we put them in a ColumnFamily object), but there is really no
reason to. If instead we were working with iterators all the way through, we could get to
a point where we're basically transferring data from disk to the network, and we should be
able to reduce GC substantially.
> Please note that such refactor should provide some performance improvements right off
the bat but it's not it's primary goal either. It's primary goal is to simplify the storage
engine and adds abstraction that are better suited to further optimizations.

This message was sent by Atlassian JIRA

View raw message