cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-3237) refactor super column implmentation to use composite column names instead
Date Fri, 21 Dec 2012 18:27:14 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538274#comment-13538274 ]

Sylvain Lebresne commented on CASSANDRA-3237:
---------------------------------------------

Patches for this are available at https://github.com/pcmanus/cassandra/commits/3237-1.

This isn't a small change, so I'll try to explain the main idea here.

The main idea is that, internally, super column families are handled for almost all intents
and purposes as if their comparator were a simple CompositeType with 2 components: the 1st
is the old super column name and the 2nd is the old sub-column name. This means they are
largely no longer special, and all the super-column-specific code goes away (including SuperColumn.java).
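To make the mapping concrete, here is a minimal sketch of how a (super column name, sub-column name) pair could be packed into one two-component composite name and split back apart. It assumes the CompositeType byte layout where each component is written as a 2-byte big-endian length, the component bytes, then a 1-byte end-of-component marker. The class and method names (`SuperColumnNames`, `toComposite`, `fromComposite`) are hypothetical and not taken from the patch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: maps a super column cell name to a 2-component composite name and back. */
final class SuperColumnNames {
    private SuperColumnNames() {}

    /** Encodes (superColumn, subColumn) as <len:2><bytes><eoc:1> per component (CompositeType-style). */
    static ByteBuffer toComposite(ByteBuffer superColumn, ByteBuffer subColumn) {
        // 3 bytes of overhead per component: 2 for the length, 1 for the end-of-component marker
        ByteBuffer out = ByteBuffer.allocate(2 * 3 + superColumn.remaining() + subColumn.remaining());
        writeComponent(out, superColumn);
        writeComponent(out, subColumn);
        out.flip();
        return out;
    }

    private static void writeComponent(ByteBuffer out, ByteBuffer component) {
        out.putShort((short) component.remaining()); // length, read back as unsigned
        out.put(component.duplicate());              // duplicate() leaves the caller's position untouched
        out.put((byte) 0);                           // end-of-component marker: 0 for a plain name
    }

    /** Splits a 2-component composite back into {superColumn, subColumn}. */
    static ByteBuffer[] fromComposite(ByteBuffer composite) {
        ByteBuffer in = composite.duplicate();
        ByteBuffer[] parts = new ByteBuffer[2];
        for (int i = 0; i < 2; i++) {
            int length = in.getShort() & 0xFFFF;
            ByteBuffer part = in.slice();            // view starting at the component bytes
            part.limit(length);
            parts[i] = part;
            in.position(in.position() + length);
            in.get();                                // skip the end-of-component byte
        }
        return parts;
    }

    static ByteBuffer utf8(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    static String string(ByteBuffer b) {
        return StandardCharsets.UTF_8.decode(b.duplicate()).toString();
    }
}
```

For example, `toComposite(utf8("sc1"), utf8("col"))` yields a 12-byte name whose first component is the old super column name and whose second is the old sub-column name, so ordinary composite comparison sorts sub-columns within their super column.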

For compatibility's sake, the main action is in the new SuperColumns.java class. This class
contains a bunch of static methods that:
* deserialize the old super column format directly into a new composite-based CF.
* serialize a new composite-based CF to the old super column format.
* convert 'super column query filters' to and from 'composite-based query filters'.
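One way to picture the filter conversion: "all sub-columns of super column X" becomes a slice over composite names sharing X as first component. In this hedged sketch (hypothetical names, not the patch's code), the start bound is the one-component prefix X with end-of-component 0 and the end bound is X with end-of-component 1, which under CompositeType-style ordering sorts after every (X, anything) name. The comparator is simplified and compares components as unsigned bytes, i.e. it assumes BytesType-like components.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Comparator;

/** Hypothetical sketch: "every sub-column of super column X" expressed as a composite slice. */
final class SuperColumnSlice {
    private SuperColumnSlice() {}

    /** One-component composite prefix <len:2><bytes><eoc:1>. */
    static ByteBuffer prefix(ByteBuffer superColumn, byte eoc) {
        ByteBuffer out = ByteBuffer.allocate(3 + superColumn.remaining());
        out.putShort((short) superColumn.remaining());
        out.put(superColumn.duplicate());
        out.put(eoc);
        out.flip();
        return out;
    }

    /** Full two-component name (superColumn, subColumn), each component ending with eoc 0. */
    static ByteBuffer name(ByteBuffer superColumn, ByteBuffer subColumn) {
        ByteBuffer out = ByteBuffer.allocate(6 + superColumn.remaining() + subColumn.remaining());
        for (ByteBuffer component : new ByteBuffer[] {superColumn, subColumn}) {
            out.putShort((short) component.remaining());
            out.put(component.duplicate());
            out.put((byte) 0);
        }
        out.flip();
        return out;
    }

    /** Simplified CompositeType-style ordering; components compared as unsigned bytes. */
    static final Comparator<ByteBuffer> ORDER = (a, b) -> {
        ByteBuffer x = a.duplicate();
        ByteBuffer y = b.duplicate();
        while (x.hasRemaining() && y.hasRemaining()) {
            int cmp = compareComponent(x, y);
            if (cmp != 0) return cmp;
            int eoc = Byte.compare(x.get(), y.get()); // end-of-component bytes break ties
            if (eoc != 0) return eoc;
        }
        // Whichever name runs out of components first sorts first.
        return Boolean.compare(x.hasRemaining(), y.hasRemaining());
    };

    /** Compares one length-prefixed component; advances both buffers past it only when equal. */
    private static int compareComponent(ByteBuffer x, ByteBuffer y) {
        int lenX = x.getShort() & 0xFFFF;
        int lenY = y.getShort() & 0xFFFF;
        for (int i = 0; i < Math.min(lenX, lenY); i++) {
            int cmp = Integer.compare(x.get(x.position() + i) & 0xFF, y.get(y.position() + i) & 0xFF);
            if (cmp != 0) return cmp;
        }
        if (lenX != lenY) return Integer.compare(lenX, lenY);
        x.position(x.position() + lenX);
        y.position(y.position() + lenY);
        return 0;
    }

    /** True if {@code name} lies in the slice [(X, eoc 0), (X, eoc 1)] covering all sub-columns of X. */
    static boolean inSuperColumn(ByteBuffer name, ByteBuffer superColumn) {
        ByteBuffer start = prefix(superColumn, (byte) 0);
        ByteBuffer end = prefix(superColumn, (byte) 1);
        return ORDER.compare(start, name) <= 0 && ORDER.compare(name, end) <= 0;
    }

    static ByteBuffer utf8(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }
}
```

With these bounds, (b, zzz) falls inside the slice for super column b, while (a, zzz) and (ba, x) fall outside it; a names filter on specific sub-columns of b would similarly map to the composite names (b, sub) for each requested sub-column.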

Then, in ColumnFamilySerializer and the ReadCommand serializer, we use those static methods
when talking to old nodes (if a super column family is involved). We also convert thrift
SC queries into equivalent queries on the new composite format in CassandraServer.java.
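The decision those serializers make can be sketched as a single check. This is a minimal illustration only: `WireFormat`, `COMPOSITE_SC_VERSION` and its value are hypothetical placeholders, not the constants used in the patch.

```java
/** Hypothetical sketch of the per-peer format choice; names and values are illustrative only. */
final class WireFormat {
    private WireFormat() {}

    /** Placeholder for the first messaging version that stores super columns as composites. */
    static final int COMPOSITE_SC_VERSION = 7;

    enum Format { LEGACY_SUPER_COLUMN, COMPOSITE }

    /** Chooses how to serialize a column family (or read command) for a given peer. */
    static Format choose(int peerVersion, boolean isSuperColumnFamily) {
        // Only super column families need translating; regular CFs use the same format either way.
        if (isSuperColumnFamily && peerVersion < COMPOSITE_SC_VERSION)
            return Format.LEGACY_SUPER_COLUMN;
        return Format.COMPOSITE;
    }
}
```

Keeping the translation at the serialization boundary means the rest of the code only ever sees the composite-based representation.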

The patch also doesn't shy away from removing abstractions that are no longer necessary once
super columns are gone. Most notably:
* QueryPath usage is removed. It was honestly already fairly useless with super columns, and
even more so without them. It was also error-prone imho, because some methods that took a
QueryPath actually ignored everything except the columnFamilyName, for instance. Note
that the class itself is not deleted, but kept only to simplify wire compatibility with old
nodes.
* IColumn and IColumnContainer are removed.

We could also merge ColumnFamily and AbstractColumnContainer, but I've left that for later.

As far as testing goes:
* the unit tests more or less pass. CassandraServerTest times out on my box, but it does so
on trunk too (it seems to be the JVM not exiting properly). There are also a few serialization
test failures, but they seem to be related to the patch bumping the messaging version rather
than to anything else. I'll look at that later.
* our old functional tests (in test/system) pass. Again, there are a few failures, but those
are tests that assume CollatingOrderedPartitioner (apparently nobody has run those tests
in a while). Anyway, those tests exercise the thrift API for super columns fairly thoroughly.
* you can now access super column families from CQL3.
* I've also (briefly) tested wire compatibility and checked that super column queries work in
a mixed-version cluster.

Regarding CQL3 support, SCFs for which column_metadata has been defined on the sub-columns
are handled almost like sparse CFs. The "almost" is because I've made sure we don't write a row
marker as we do for sparse CFs, because that would break backward compatibility (there is
no way to have a column with an empty name in a super column). For the same reason, collections
are not supported either.
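A rough picture of that "almost sparse" view, under simplifying assumptions: names and values are plain strings, and `SparseSuperColumnView`, `Cell`, `toRows` and `cellsForInsert` are hypothetical names, not the patch's classes. Each super column name becomes one CQL3 row, each defined sub-column becomes a regular column of that row, and an insert writes only the given columns, never an empty-named row marker.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a sparse CQL3-style view of a super column family, without row markers. */
final class SparseSuperColumnView {
    private SparseSuperColumnView() {}

    /** A stored cell: the two composite components plus the value (strings for simplicity). */
    static final class Cell {
        final String superColumn;
        final String subColumn;
        final String value;

        Cell(String superColumn, String subColumn, String value) {
            this.superColumn = superColumn;
            this.subColumn = subColumn;
            this.value = value;
        }
    }

    /** Groups cells into one row per super column, preserving the input order. */
    static Map<String, Map<String, String>> toRows(List<Cell> cells) {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        for (Cell c : cells) {
            // The super column name acts as the row's clustering value; sub-columns become columns.
            rows.computeIfAbsent(c.superColumn, k -> new LinkedHashMap<>()).put(c.subColumn, c.value);
        }
        return rows;
    }

    /** Cells written for an insert: only the given columns, no empty-named row marker. */
    static List<Cell> cellsForInsert(String superColumn, Map<String, String> columns) {
        List<Cell> out = new ArrayList<>();
        for (Map.Entry<String, String> e : columns.entrySet()) {
            // An empty sub-column name cannot exist in a super column, so it is rejected.
            if (e.getKey().isEmpty())
                throw new IllegalArgumentException("sub-column names cannot be empty in a super column");
            out.add(new Cell(superColumn, e.getKey(), e.getValue()));
        }
        return out;
    }
}
```

The trade-off of skipping the marker is that a CQL3 row exists only while at least one of its sub-columns does, which is exactly how the old super column API already behaved.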

One small downside I need to note is that during an upgrade from 1.2 to 2.0, there might
be a noticeable latency increase for super column queries. The reason is that any read query
that mixes nodes from before and after this change will get a digest mismatch (and so will
re-query with the full data). Indeed, digests are not versioned and cannot really be (at least not easily).
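Why the mismatch is unavoidable can be shown with a toy example: a digest hashes bytes, not logical content, so two replicas holding the same cell in different layouts disagree. The MD5 choice and both string "layouts" below are illustrative assumptions, not Cassandra's actual serialization formats; `DigestMismatch` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical illustration: the same logical cell hashed under two different layouts. */
final class DigestMismatch {
    private DigestMismatch() {}

    static byte[] md5(byte[] bytes) {
        try {
            return MessageDigest.getInstance("MD5").digest(bytes);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    // Toy stand-in for a pre-change node's super column layout.
    static byte[] legacyLayout(String superColumn, String subColumn, String value) {
        return (superColumn + "/" + subColumn + "=" + value).getBytes(StandardCharsets.UTF_8);
    }

    // Toy stand-in for a post-change node's composite layout.
    static byte[] compositeLayout(String superColumn, String subColumn, String value) {
        return ("(" + superColumn + ":" + subColumn + ")=" + value).getBytes(StandardCharsets.UTF_8);
    }

    /** Replicas agree only if their digests are byte-for-byte equal. */
    static boolean digestsMatch(byte[] replicaA, byte[] replicaB) {
        return MessageDigest.isEqual(md5(replicaA), md5(replicaB));
    }
}
```

Comparing `legacyLayout("sc", "col", "v")` with `compositeLayout("sc", "col", "v")` reports a mismatch even though the data is identical, which is what forces the extra full-data read during a mixed-version window.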
                
> refactor super column implmentation to use composite column names instead
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3237
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3237
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Matthew F. Dennis
>            Priority: Minor
>              Labels: ponies
>             Fix For: 2.0
>
>         Attachments: cassandra-supercolumn-irc.log
>
>
> super columns are annoying.  composite columns offer a better API and performance.  people
> should use composites over super columns.  some people are already using super columns.  C*
> should implement the super column API in terms of composites to reduce code, complexity and
> testing as well as increase performance.

