cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6875) CQL3: select multiple CQL rows in a single partition using IN
Date Wed, 30 Apr 2014 14:53:16 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985587#comment-13985587
] 

Sylvain Lebresne commented on CASSANDRA-6875:
---------------------------------------------

The overall principle looks good, but I feel this could use some more comments and/or made
a little more clear. Mainly, the {{isMultiColumn}} path in {{SelectStatement.buildBound}}
looks weird at face value: we're inside a loop that populates a {{ColumnNameBuilder}}, but
the {{isMultiColumn}} path completely ignores both the iterated object and the builder. This
work because it relies on the fact that when there is a multi-column restriction then it's
the only one restriction (which is duplicated in the {{SelectStatement.columnRestrictions}}
array), and that the value from such restriction is not a single column value but rather a
fully built composite serialized value (which is not self evident from the method naming in
particular).  But it's hard to piece it all that together when you look at {{buildBound}}
currently.  Some comments would help, but I'd prefer going even further and move the {{isMultiColumn}}
path outside of the loop (by looking up first if the first restriction in {{columnRestrictions}}
is a multi-column one or not) since it has no reason to be in the loop. In fact, I'd go a
tad further by making SelectStatement abstract and have 2 subClass, one for single-column
restrictions with a {{SingleColumnRelation[] columnRestrictions}} array field as we have now,
and one for multi-column that has just one non-array {{MultiColumnRestriction columnsRestriction}}
field. After all, both cases exclude one another in the current implementation.

Somewhat related, I'm slightly afraid that the parts about multi-column restrictions returning
fully serialized composites (through Tuples.Value.get()) will not play nice with the 2.1 code,
where we don't manipulate composites as opaque ByteBuffers anymore (concretely, Tuples will
serialize the composite, but SelectStatement will have to deserialize it back right away to
get a Composite, which will be both ugly and inefficient).  So to avoid having to change everything
on merge, I think it would be cleaner to make Tuples.Value return a list of (individual column)
values instead of just one, and let SelectStatement build back the full composite name using
a ColumnNameBuilder.  Especially if you make the per-column and multi-column paths a tad more
separated as suggested above, I suspect it might clarify things a bit.  

Other than that, a bunch of more minor comments and nits:
* The {{SelectStatement.RawStatement.prepare()}} re-org patch breaks proper indentation at
places (for instance, the indentation of parameters to 'new ColumnSpecification' in the first
branch of udpateSingleColumnRestriction, though there is a few other places). Would be nice
to fix those.
* Can't we use {{QueryProcessor.instance.getPrepared}} instead of creating a only-for-test
{{QueryProcessor.staticGetPrepared}}? Or at the very least leave such shortcuts in the tests
where it belongs.
* In Tuples.Literal.prepare, I'd prefer good ol'fashion indexed loop to iterate over 2 lists
(feels clearer, and saves the allocator creation as a bonus).
* In Tuples.Raw.makeInReceiver should probably be called makeReceiver (it's not related to
"IN"). I'd also drop the spaces in the string generated (if only for consistency with the
string generated in INRaw). As a side node, Raw.makeReceiver uses indexed iteration while
INRaw.makeInReceiver don't, can't we make both consistent style wise for OCD sakes?
* Why make methods of CQLStatement abstract (it's an interface)?  Also, I'd rather add the
QueryOptions parameter to the existing executeInternal and default to QueryOptions.DEFAULT
when calling it, rather than having 2 methods. Though tbh, my preference would be to move
the tests to dtest and leave those somewhat unrelated changes to another ticket, see below.
* SingleColumnRelation.previousInTuple is now unused but not removed.
* We could save one list allocation (instead of both toCreate and toUpdate) in SelectStatement.updateRestrictionsForRelation
(for EQ and IN, we know it can only be a create, and for slices we can lookup with getExisting).
* In Restriction, the {{values}} method is already at top-level, no reason to re-declare it
for EQ.
* It bothers me to start adding unit tests for CQL queries when all of our CQL tests are currently
in dtest. I'd *much* rather keep it all in the dtests to avoid the confusion on where is what
tested.


> CQL3: select multiple CQL rows in a single partition using IN
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-6875
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6875
>             Project: Cassandra
>          Issue Type: Bug
>          Components: API
>            Reporter: Nicolas Favre-Felix
>            Assignee: Tyler Hobbs
>            Priority: Minor
>             Fix For: 2.0.8
>
>
> In the spirit of CASSANDRA-4851 and to bring CQL to parity with Thrift, it is important
to support reading several distinct CQL rows from a given partition using a distinct set of
"coordinates" for these rows within the partition.
> CASSANDRA-4851 introduced a range scan over the multi-dimensional space of clustering
keys. We also need to support a "multi-get" of CQL rows, potentially using the "IN" keyword
to define a set of clustering keys to fetch at once.
> (reusing the same example\:)
> Consider the following table:
> {code}
> CREATE TABLE test (
>   k int,
>   c1 int,
>   c2 int,
>   PRIMARY KEY (k, c1, c2)
> );
> {code}
> with the following data:
> {code}
>  k | c1 | c2
> ---+----+----
>  0 |  0 |  0
>  0 |  0 |  1
>  0 |  1 |  0
>  0 |  1 |  1
> {code}
> We can fetch a single row or a range of rows, but not a set of them:
> {code}
> > SELECT * FROM test WHERE k = 0 AND (c1, c2) IN ((0, 0), (1,1)) ;
> Bad Request: line 1:54 missing EOF at ','
> {code}
> Supporting this syntax would return:
> {code}
>  k | c1 | c2
> ---+----+----
>  0 |  0 |  0
>  0 |  1 |  1
> {code}
> Being able to fetch these two CQL rows in a single read is important to maintain partition-level
isolation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message