cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sylvain Lebresne (Commented) (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-4093) schema_* CFs do not respect column comparator which leads to CLI commands failure.
Date Sat, 31 Mar 2012 17:42:26 GMT


Sylvain Lebresne commented on CASSANDRA-4093:

Wow, that seems to get so out of proportion and I so don't understand why.

What happpened is this: currently (i.e. in all released version), a ColumnDefinition name
refers to the CF column name, i.e. one must use the column comparator to decode it. In all
currently released version, CompositeType is a normal comparator and that rule applies to
it, and for anyone that will use thrift in 1.1+, for all intent and purposes CompositeType
will still be a normal comparator like all the other ones and so the natural thing will still
be that the ColumDefinition name applies to CT comparator entirely.

Now when I wrote the CQL3 patch (the initial one), I realised that we needed a new feature,
the need to be able to have ColumnDefinitions whose name refers to only one of the component
of the CT comparator. And while  we currently only need to refer to the last component (because
we don't yet support secondary indexes on CT), we *will* need to have ColumnDefinition whose
name refer to *any* of the CT components as soon as we support secondary indexes on CT (CASSANDRA-3680).
In other words, the introduction of composite_index (this patch) is just the first part CASSANDRA-3680.
We *will* have to add it soon enough. It is *hardly* something added *just* for backward compatibility.

Anyway while I was writting the CQL3 patch, instead of properly handling that new case, I
introduced a *bug* consisting of changing the behavior of ColumnDefinition for CT so that
by default they refer to the last component of the CT, not the comparator itself anymore.
I.e. I changed the default behavior to an incompatible one.  I'm sorry I did that, I shouldn't
have and the goal of that patch is to restore the default behavior and introduce a new info
to ColumnDefinition to allow switching to the new behavior. Again, I think it is incorrect
to say that this patch is just 'in the name of backward compatibility'. If I had mistakefully
changed the subcolumn comparator of super column to apply to column names instead of super
columns names, we wouldn't call fixing that 'supporting a misfeature in the name of backward
compatibilty'. But yes, this bug happens to break backward compatibility and I do have a big
problem with that part.

On the thrift side. Yes this does add a new field to ColumnDef. But:
# It's untrue that column_aliases and value_alias were 'added before we had cqlsh'. They would
be added in the exact same version as this patch, so I think that using that argument against
component_index is unfair.
# I'm not sure this will be so very confusing to users. Again, for thrift users, CT is a comparator
like any other. The confusing thing would be for a ColumnDef to apply to the last component
rather than the full CT. Why would CT be suddenly special (again, on the thrift side) and
why the last component? It would random and thus confusing. On the contrary, if you have component_index,
then you can say that the default is the one you'd expect, i.e the one that apply to all other
comparator, but that we've added the new ability to make ColumnDef apply to other component.
# As soon as CASSANDRA-3680 is done, it will be usefull to allow creating secondary indexes
on CT component on the thrift side. Why wouldn't we allow CASSANDRA-3680 on the thrift if
it only cost us the addition of a simple int field in ColumnDef?

The upgrade path requires some effort but is conceptually simple:

# update your application to no longer use the CT column index
# upgrade

My problem is that step 1 will be *super* painfull. Because before upgrading, you don't have
any reasonable alternative for whatever you were doing with the CT column index (tagging rows
for instance). So you have to write manual code to do the same manual indexing. And then you
have to write a map reduce job to reconstruct this new manual index for existing rows. In
real life situations where you must do that without downtime, it's a pain in the ass.

And yes I know, you don't believe anyone uses that. But I have the weakness to think that
we do not know what everyone user is doing. And yes, I have a *big* problem with screwing
up even 1 user (especially after we've said we were serious about not breaking the thrift
API). And yes, even if that force us to write a little bit more code now and/or later.

bq. is worth the upside of getting to a clean data model in 1.2 without the distractions of
legacy features like this. The danger is that the longer we preserve features like this, the
more potential there is for new users to start using them which makes it more difficult for
us to drop them later

Ok, I understand the argument in principle but in practice what is it we're talking about?
 There is 0 line of specific code to support index on a full CT column name, this is all handled
by the current secondary index code transparently.  So worst case scenario, more people use
the feature (which btw would suggest it's vaguely useful). But then adding support in CQL3
would be fairly trivial. The only thing we'd have to add is way to declare those index (again
the rest is already handled by generic code), and that would likely be a nature extension
of whatever we come up for CASSANDRA-3782. Typically, to reuse one of the proposed syntax
on that issue, on top of supporting:
CREATE INDEX index_name ON timeline(0);
we would have to also add support for
CREATE INDEX index_name ON timeline(0, 'foobar', 4);
Again, this is hardly making the data mode in 1.2 dirty. Actually I'm not even sure I understand
in what way that would hinder the compression of the data model one bit. And again, we don't
*have* to support it if we don't want to. But at least, instead of forcing a painful upgrade
on those that were using index on full CT from thrift when *we* decide, they would have the
option to stay with thrift until they are ready to make the migration ().

Overall, I completely disagree that this patch will force anything on us in the future and
I just don't see where it make the data model for 1.2 less clean.  I also strongly think we
should be more considerate about breaking use cases that people may be using, even if we're
talking very few people.

> schema_* CFs do not respect column comparator which leads to CLI commands failure.
> ----------------------------------------------------------------------------------
>                 Key: CASSANDRA-4093
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>    Affects Versions: 1.1.0
>            Reporter: Dave Brosius
>            Assignee: Sylvain Lebresne
>             Fix For: 1.1.0
>         Attachments: 4093.txt, CASSANDRA-4093-CD-changes.patch
> ColumnDefinition.{ascii, utf8, bool, ...} static methods used to initialize schema_*
CFs column_metadata do not respect CF comparator and use ByteBufferUtil.bytes(...) for column
names which creates problems in CLI and probably in other places.
> The CompositeType validator throws exception on first column
> String columnName = columnNameValidator.getString(;
> Because it appears the composite type length header is wrong (25455)
> AbstractCompositeType.getWithShortLength
> java.lang.IllegalArgumentException
> 	at java.nio.Buffer.limit(
> 	at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(
> 	at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(
> 	at org.apache.cassandra.db.marshal.AbstractCompositeType.getString(
> 	at org.apache.cassandra.cli.CliClient.describeColumnFamily(
> 	at org.apache.cassandra.cli.CliClient.describeKeySpace(
> 	at org.apache.cassandra.cli.CliClient.executeShowKeySpaces(
> (seen in trunk)

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:!default.jspa
For more information on JIRA, see:


View raw message