cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6561) Static columns in CQL3
Date Tue, 18 Feb 2014 08:02:22 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903863#comment-13903863 ]

Sylvain Lebresne commented on CASSANDRA-6561:
---------------------------------------------

bq. Other than the Nicolas' issue with static columns and 2i

Nicolas' example uses a 2i on a non-static column, which as far as I can tell should work
properly. Creating a 2i on static columns is indeed not supported, but as the comment in CreateIndexStatement
says, this requires more 2i-specific work to support (and could imo easily let users shoot themselves
in the foot), so it is left to a later ticket.

bq. I'm on the fence whether or not we should perform the same validation on the cells, too

I'm relatively strongly against creating an inconsistency here. I'm not saying allowing duplicates
in regular batches was a good idea; in fact I think that's an oversight. But I don't think
treating a batch differently depending on whether it carries one or more conditions makes much
sense (conditions only control whether or not the batch is executed; other than that, a
CAS batch executes exactly the same way as a regular one).

Note that in theory, I'd be in favor of always refusing duplicates in batches, but obviously
in practice there is the backward compatibility concern, so maybe it's just too late. Or
maybe we could do something like CASSANDRA-6649: just warn for 1 or 2 versions and
refuse them altogether after that. But in any case, this belongs to another issue imo.

bq. There is currently no way to select together both the static and clustered columns of a CQL row

Yes there is, see https://github.com/riptano/cassandra-dtest/blob/master/cql_tests.py#L3542.
The only thing we've disallowed is selecting *only* the static columns when clustering columns
are in the where clause, because that strongly suggests you're doing something wrong, and you'll
get the right behavior if you just remove the clustering columns from the where clause. This
restriction does not rule out any particular use case. Now, maybe there are some
other corner cases that you have in mind and that are not properly handled by the patch, but then
please simply provide the exact query that does not work as you think it should and I'll look
at it (even better if you reuse the example of the dtest above).


> Static columns in CQL3
> ----------------------
>
>                 Key: CASSANDRA-6561
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6561
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 2.0.6
>
>
> I'd like to suggest the following idea for adding "static" columns to CQL3. I'll note
that the basic idea was suggested by jhalliday on IRC, but the rest of the details are
mine and I should be blamed for anything stupid in what follows.
> Let me start with a rationale: there are 2 main families of CFs that have historically been
used in Thrift: static ones and dynamic ones. CQL3 handles both families through the presence
or absence of clustering columns. There are, however, some cases where mixing both behaviors has its
uses. I like to think of those use cases as 3 broad categories:
> # to denormalize small amounts of not-entirely-static data in otherwise static entities,
say "tags" for a product or "custom properties" in a user profile. This is why we've
added CQL3 collections. Importantly, this is the *only* use case for which collections are
meant (which doesn't diminish their usefulness imo, and I wouldn't disagree that we've maybe
not communicated this too well).
> # to optimize fetching both a static entity and related dynamic ones. Say you have blog
posts, and each post has associated comments (chronologically ordered). *And* say that a very
common query is "fetch a post and its 50 last comments". In that case, it *might* be beneficial
to store a blog post (static entity) in the same underlying CF as its comments for performance
reasons, so that "fetch a post and its 50 last comments" is just one slice internally.
> # you want to CAS rows of a dynamic partition based on some partition condition. This
is the same use case CASSANDRA-5633 exists for.
> As said above, 1) is already covered by collections, but 2) and 3) are not (and
> I strongly believe collections are not the right fit, API wise, for those).
> Also, note that I don't want to underestimate the usefulness of 2). In most cases, using
a separate table for the blog posts and the comments is The Right Solution, and trying to
do 2) is premature optimisation. Yet, when used properly, that kind of optimisation can make
a difference, so I think having a relatively native solution for it in CQL3 could make sense.
> Regarding 3), though CASSANDRA-5633 would provide one solution for it, I have the feeling
that static columns are actually a more natural approach (in terms of API). That's arguably
more of a personal opinion/feeling though.
> So, long story short, CQL3 lacks a way to mix both "static" and "dynamic" rows in
the same partition of the same CQL3 table, and I think such a tool could have its uses.
> The proposal is thus to allow "static" columns. Static columns would only make sense
in tables with clustering columns (the "dynamic" ones). A static column's value would be static
to the partition (all rows of the partition would share the value for such a column). The syntax
would just be:
> {noformat}
> CREATE TABLE t (
>   k text,
>   s text static,
>   i int,
>   v text,
>   PRIMARY KEY (k, i)
> );
> {noformat}
> then you'd get:
> {noformat}
> INSERT INTO t(k, s, i, v) VALUES ('k0', 'I''m shared',       0, 'bar');
> INSERT INTO t(k, s, i, v) VALUES ('k0', 'I''m still shared', 1, 'foo');
> SELECT * FROM t;
>  k  | s                | i | v
> ----+------------------+---+-----
>  k0 | I'm still shared | 0 | bar
>  k0 | I'm still shared | 1 | foo
> {noformat}
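As a sanity check of the semantics sketched above, here is a toy in-memory model in Python: a write to a static column updates one value per partition, while a write to a regular column only touches the row for its clustering value, so every row reads back the latest static value. `ToyTable` and `Partition` are hypothetical names for this sketch only; it assumes last write (in call order) wins and deliberately ignores timestamps, deletions and TTLs, which the proposal leaves open.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Partition:
    """One partition: a single value per static column, plus CQL rows
    keyed by their clustering value."""
    static: Dict[str, str] = field(default_factory=dict)
    rows: Dict[int, Dict[str, str]] = field(default_factory=dict)


class ToyTable:
    """Hypothetical in-memory stand-in for table t (partition key k,
    clustering column i). Last write in call order wins; no timestamps,
    deletions or TTLs are modeled."""

    def __init__(self, static_columns: Set[str]):
        self.static_columns = static_columns
        self.partitions: Dict[str, Partition] = {}

    def insert(self, k: str, i: int, values: Dict[str, str]) -> None:
        part = self.partitions.setdefault(k, Partition())
        row = part.rows.setdefault(i, {})
        for col, val in values.items():
            if col in self.static_columns:
                part.static[col] = val  # shared by every row of partition k
            else:
                row[col] = val          # belongs to row (k, i) only

    def select_all(self) -> List[Dict[str, object]]:
        """Like SELECT * FROM t: each row is joined with its partition's statics."""
        result: List[Dict[str, object]] = []
        for k in sorted(self.partitions):
            part = self.partitions[k]
            for i in sorted(part.rows):
                result.append({"k": k, **part.static, "i": i, **part.rows[i]})
        return result


t = ToyTable(static_columns={"s"})
t.insert("k0", 0, {"s": "I'm shared", "v": "bar"})
t.insert("k0", 1, {"s": "I'm still shared", "v": "foo"})
rows = t.select_all()
for r in rows:
    print(r)
```

Both rows end up showing "I'm still shared", because the second insert overwrote the single partition-level value rather than a per-row one.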
> There would be a few semantic details to decide on regarding deletions, TTL, etc., but
let's see if we agree it's a good idea first before ironing those out.
> One last point is the implementation. Though I do think this idea has merits, it's definitely
not useful enough to justify rewriting the storage engine for it. But I think we can support
this relatively easily (emphasis on "relatively" :)), which is probably the main reason why
I like the approach.
> Namely, internally, we can store static columns as cells whose clustering column values
are empty. So in terms of cells, the partition of my example would look like:
> {noformat}
> "k0" : [
>   (:"s" -> "I'm still shared"), // the static column
>   (0:"" -> ""),                 // row marker
>   (0:"v" -> "bar"),
>   (1:"" -> ""),                 // row marker
>   (1:"v" -> "foo")
> ]
> {noformat}
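To illustrate why this layout makes "fetch the static part plus the rows" a single slice, here is a Python sketch that rebuilds the CQL rows from a sorted list of cells in one forward pass. Cells are modeled as plain tuples, with `None` standing in for the empty clustering prefix of a static cell and an empty column name for the row marker; `cells_to_rows` and this tuple shape are hypothetical, not Cassandra's internal API.

```python
from itertools import groupby
from typing import Dict, List, Optional, Tuple

# (clustering value, or None for a static cell; column name; value).
# An empty column name is the row marker, as in the layout above.
Cell = Tuple[Optional[int], str, str]


def cells_to_rows(key: str, cells: List[Cell]) -> List[Dict[str, object]]:
    """Rebuild CQL rows from one partition's cells, assumed sorted with
    the static cells first and then by clustering value."""
    static: Dict[str, str] = {}
    rows: List[Dict[str, object]] = []
    # groupby only merges *consecutive* cells, which is why sort order matters.
    for clustering, group in groupby(cells, key=lambda cell: cell[0]):
        columns = {name: value for _, name, value in group if name != ""}
        if clustering is None:
            # Static cells come before any row, so they are known by the
            # time the first row is built: a single pass is enough.
            static.update(columns)
        else:
            rows.append({"k": key, **static, "i": clustering, **columns})
    return rows


partition: List[Cell] = [
    (None, "s", "I'm still shared"),  # the static column
    (0, "", ""),                      # row marker
    (0, "v", "bar"),
    (1, "", ""),                      # row marker
    (1, "v", "foo"),
]
rows = cells_to_rows("k0", partition)
print(rows)
```

Because the row marker is kept as its own cell, a row whose only cell is the marker still comes back as a row (carrying just the key, the static values and its clustering value).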
> Of course, using empty values for the clustering columns doesn't quite work because it
could conflict with the user using empty clustering columns. But in the CompositeType encoding
we have the end-of-component byte, which we could reuse by setting it to a specific value to indicate
a static column: currently we never set that byte to anything other than -1, 0 and 1 (note that
this rules out 0xFF, which as a signed byte is -1).
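A byte-level sketch of the idea, assuming the usual CompositeType component layout of a 2-byte big-endian length, the value bytes, then one end-of-component (EOC) byte. It encodes a cell name for a table with one clustering column and marks static cells through the EOC of the (empty) clustering component. `STATIC_EOC = 0x02` and the helper names are hypothetical choices for illustration only; `is_free_marker` checks a candidate against the bytes that -1, 0 and 1 already produce.

```python
import struct

USED_EOC = (-1, 0, 1)  # EOC values already in use, per the comment above
STATIC_EOC = 0x02      # hypothetical marker, chosen only for this sketch


def encode_component(value: bytes, eoc: int) -> bytes:
    """<2-byte big-endian length><value bytes><1 end-of-component byte>.
    Negative EOC values are stored in two's complement, like a Java byte."""
    return struct.pack(">H", len(value)) + value + bytes([eoc & 0xFF])


def is_free_marker(eoc: int) -> bool:
    """True if `eoc` is stored as a byte that none of the used values produce."""
    return (eoc & 0xFF) not in {used & 0xFF for used in USED_EOC}


def cell_name(clustering: bytes, column: bytes, static: bool = False) -> bytes:
    """Cell name for a table with a single clustering column: the clustering
    component (empty for static cells), then the CQL column name component."""
    first_eoc = STATIC_EOC if static else 0
    return encode_component(clustering, first_eoc) + encode_component(column, 0)


regular_name = cell_name(struct.pack(">i", 0), b"v")  # (0:"v"), i is an int
static_name = cell_name(b"", b"s", static=True)        # (:"s")
user_empty_name = cell_name(b"", b"s")                 # a user's empty clustering value
print(regular_name.hex(), static_name.hex(), user_empty_name.hex())
print(is_free_marker(0xFF), is_free_marker(STATIC_EOC))
```

The static cell name and a user's genuinely empty clustering value differ only in that one EOC byte, which is exactly what lets the two coexist.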
> With that, we'd need to update the CQL3 statements to support the new syntax and rules,
but that's probably not horribly hard.
> So anyway, this may or may not be a good idea, but I think it has enough meat to warrant
some consideration.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
