cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11031) MultiTenant : support "ALLOW FILTERING" for First Partition Key
Date Mon, 02 May 2016 10:12:13 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15266326#comment-15266326 ]

Sylvain Lebresne commented on CASSANDRA-11031:
----------------------------------------------

So you're right, this case wasn't handled in CASSANDRA-6377 and we can handle it.

However, it's worth noting that this will be pretty seriously inefficient. In particular,
we'll have to read *all* partitions and cannot use the {{where tenant_id = 'datastax'}} for
speeding up the query in any way (I suppose we could for an ordered partitioner, but we strongly
discourage its use for many other reasons, so we're not gonna optimize for that now).
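To make the partitioner point concrete, here is a minimal Java sketch. It assumes a hypothetical {{TokenDemo.token}} helper that hashes a simple length-prefixed encoding of both key components with MD5 as a stand-in partitioner. Cassandra's default Murmur3Partitioner uses a different hash and serialization, but it shares the relevant property: the token depends on the whole partition key {{(tenant_id, pk2)}}, so one tenant's partitions land on unrelated tokens and a restriction on {{tenant_id}} alone cannot be turned into a token range to read.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class TokenDemo {
    // Hypothetical stand-in partitioner: MD5 over a length-prefixed encoding
    // of BOTH partition key components. This is not Cassandra's actual
    // serialization or hash; it only shares the "hash the whole key" property.
    static BigInteger token(String tenantId, String pk2) {
        // Length prefixes keep ("ab", "c") and ("a", "bc") distinct.
        String serialized = tenantId.length() + ":" + tenantId + "|" + pk2.length() + ":" + pk2;
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(serialized.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // interpret digest as a non-negative token
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same tenant, different pk2: the tokens are unrelated, so knowing only
        // tenant_id identifies neither a single token nor a contiguous range.
        for (String pk2 : new String[] {"a", "b", "c"}) {
            System.out.println("datastax/" + pk2 + " -> " + token("datastax", pk2));
        }
    }
}
```

Running {{main}} prints three tokens for the same tenant that have no ordering relationship to each other, which is why the query has to scan every partition.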

In particular, regarding:

bq. we can support allow filtering on Partition Key, as far as I know, Partition Key is in
memory, so we can easily filter them, and then read required data from SSTable

I'm not entirely sure what you are referring to, but that's pretty much false: we don't keep
all partition keys in memory. Maybe what you are referring to is that we could do the filtering
early in the pipeline, eliminating keys that don't match the filter directly at the sstable
index stage. And that's true, but it would add quite a bit of complexity (to push the filters
through the sstable code) without making the query really efficient: we would still have to
read every key. So I have real doubts that the benefit is worth the added complexity.

Anyway, I'd be fine supporting this through basic filtering (that is, pretty much querying
everything and filtering in {{RowFilter}} as usual) since this is guarded by {{ALLOW FILTERING}}.
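A minimal Java sketch of what "basic filtering" amounts to, assuming a toy in-memory model: the {{Partition}}, {{ScanResult}} and {{scanAndFilter}} names are hypothetical and are not Cassandra's {{RowFilter}} API. Every partition is read and the {{tenant_id}} predicate is applied afterwards, so the work is proportional to the size of the table rather than to the number of matching rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class PartitionKeyFilterDemo {
    // Hypothetical model of one partition of multi_tenant_table; only the
    // partition key components matter for this sketch.
    static final class Partition {
        final String tenantId;
        final String pk2;
        Partition(String tenantId, String pk2) { this.tenantId = tenantId; this.pk2 = pk2; }
    }

    // Result of a filtered scan: the matches plus how many partitions were read.
    static final class ScanResult {
        final List<Partition> matches;
        final int partitionsRead;
        ScanResult(List<Partition> matches, int partitionsRead) {
            this.matches = matches;
            this.partitionsRead = partitionsRead;
        }
    }

    // Full scan (as for SELECT * with no WHERE clause), then keep only the
    // partitions whose key satisfies the filter. Nothing is skipped up front.
    static ScanResult scanAndFilter(List<Partition> all, Predicate<Partition> filter) {
        List<Partition> matches = new ArrayList<>();
        int read = 0;
        for (Partition p : all) {
            read++; // every partition is read, whether or not it matches
            if (filter.test(p)) {
                matches.add(p);
            }
        }
        return new ScanResult(matches, read);
    }

    public static void main(String[] args) {
        List<Partition> all = List.of(
            new Partition("datastax", "a"),
            new Partition("acme", "a"),
            new Partition("datastax", "b"),
            new Partition("other", "x"));
        ScanResult r = scanAndFilter(all, p -> p.tenantId.equals("datastax"));
        System.out.println(r.matches.size() + " matches, " + r.partitionsRead + " partitions read");
    }
}
```

With the sample data in {{main}}, this prints {{2 matches, 4 partitions read}}: the filter reduces what is returned, not what is read.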

But regarding the patch you've attached, a few remarks:
* as this is a new feature and a pretty minor one imo for the reason discussed above, it should
really only go to trunk at this point. This will change the patch substantially.
* even if we support filtering on missing partition key components, I really don't see a reason
to special-case it only for the first one. The code should be more generic than this.
* we'd obviously need the patch to have some testing coverage to consider it.
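As an illustration of the "more generic" remark, the hypothetical helper below checks equality restrictions on any subset of partition key components by position, instead of hard-coding the first component. It is a sketch under that assumption only, not the shape of the attached patch or of Cassandra's restriction classes.

```java
import java.util.List;
import java.util.Map;

public class GenericKeyFilter {
    // Hypothetical: a partition key as its ordered list of components, e.g.
    // (tenant_id, pk2) -> ["datastax", "a"]. Restrictions map a component's
    // position to the value it must equal; any positions may be restricted.
    static boolean matches(List<String> keyComponents, Map<Integer, String> restrictions) {
        for (Map.Entry<Integer, String> r : restrictions.entrySet()) {
            int pos = r.getKey();
            if (pos < 0 || pos >= keyComponents.size()) {
                throw new IllegalArgumentException("no partition key component at position " + pos);
            }
            if (!keyComponents.get(pos).equals(r.getValue())) {
                return false; // one failed restriction rejects the partition
            }
        }
        return true; // all restrictions (possibly none) are satisfied
    }

    public static void main(String[] args) {
        List<String> key = List.of("datastax", "a");
        System.out.println(matches(key, Map.of(0, "datastax"))); // restrict the first component
        System.out.println(matches(key, Map.of(1, "b")));        // restrict the second component
    }
}
```

Here {{main}} prints {{true}} and then {{false}}; the same code path handles a restriction on {{tenant_id}}, on {{pk2}}, or on both.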

As a side note regarding our process, we don't assign a specific "fix version" until commit,
so please leave "3.x" for now.


> MultiTenant : support "ALLOW FILTERING" for First Partition Key
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-11031
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11031
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: CQL
>            Reporter: ZhaoYang
>            Assignee: ZhaoYang
>             Fix For: 3.x
>
>         Attachments: CASSANDRA-11031.patch
>
>
> Currently, ALLOW FILTERING only works for secondary index columns or clustering columns.
> And it's slow, because Cassandra will read all data from the SSTables on disk into memory
> to filter it.
> But we can support allow filtering on Partition Key, as far as I know, Partition Key
> is in memory, so we can easily filter them, and then read required data from SSTable.
> This would be similar to "Select * from table", which scans through the entire cluster.
> CREATE TABLE multi_tenant_table (
> 	tenant_id text,
> 	pk2 text,
> 	c1 text,
> 	c2 text,
> 	v1 text,
> 	v2 text,
> 	PRIMARY KEY ((tenant_id,pk2),c1,c2)
> ) ;
> Select * from multi_tenant_table where tenant_id = "datastax" allow filtering;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
