cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kurt Greaves (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-13857) Allow MV with same PK but a different filtering
Date Fri, 08 Sep 2017 03:35:00 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Kurt Greaves updated CASSANDRA-13857:
-------------------------------------
    Description: 
We currently disallow creation of a view that has the exact same primary key as the base where
no clustering keys are present, however a potential use case would be a view where part of
the PK is filtered so as to have a subset of data in the view which is faster for range queries.
We actually currently allow this, but only if you have a clustering key defined. If you only
have a partitioning key it's not possible.

>From the mailing list, the below example works:
{code:java}
CREATE TABLE users (
  site_id int,
  user_id text,
  n int,
  data set<frozen<text>>,
  PRIMARY KEY ((site_id, user_id), n));

user data is updated and read by PK and sometimes I have to fetch all user for some specific
site_id. It appeared that full scan by token(site_id,user_id) filtered by WHERE site_id =
<some id> works much slower than unfiltered full scan on
CREATE MATERIALIZED VIEW users_1 AS
SELECT site_id, user_id, n, data
FROM users
WHERE site_id = 1 AND user_id IS NOT NULL AND n IS NOT NULL
PRIMARY KEY ((site_id, user_id), n);
{code}

However the following does not:
{code:java}
CREATE TABLE users (
site_id int,
user_id text,
data set<text>,
PRIMARY KEY ((site_id, user_id)));

CREATE MATERIALIZED VIEW users_1 AS
SELECT site_id, user_id, data
FROM users
WHERE site_id = 1 AND user_id IS NOT NULL 
PRIMARY KEY ((site_id, user_id));
InvalidRequest: Error from server: code=2200 [Invalid query] message="No columns are defined
for Materialized View other than primary key"
{code}

This is because if the clustering key is empty we assume they've only defined the primary
key in the partition key and we haven't accounted for this use case. 

On that note, we also don't allow the following narrowing of the partition key:
{code}
CREATE TABLE kurt.base (
    id int,
    uid text,
    data text,
    PRIMARY KEY (id, uid)
) 

CREATE MATERIALIZED VIEW kurt.mv2 AS SELECT * from kurt.base where id IS NOT NULL and uid='1'
PRIMARY KEY ((id, uid));
InvalidRequest: Error from server: code=2200 [Invalid query] message="No columns are defined
for Materialized View other than primary key"
{code}
But we do allow the following, which works because there is still a clustering key, despite
not changing the PK.
{code}
CREATE MATERIALIZED VIEW kurt.mv2 AS SELECT * from kurt.base where id IS NOT NULL and uid='1'
PRIMARY KEY (id, uid);
{code}
And we also allow the following, which is a narrowing of the partition key as above, but with
an extra clustering key.
{code}
create table kurt.base3 (id int, uid int, clus1 int, clus2 int, data text, PRIMARY KEY ((id,
uid), clus1, clus2));

CREATE MATERIALIZED VIEW kurt.mv4 AS SELECT * from kurt.base3 where id IS NOT NULL and uid
IS NOT NULL and clus1 IS NOT NULL AND clus2 IS NOT NULL  PRIMARY KEY ((id, uid, clus1), clus2);
{code}

I _think_ supporting these cases is trivial and mostly already handled in the underlying MV
write path, so we might be able to get away with just a simple change of [this condition|https://github.com/apache/cassandra/blob/83822d12d87dcb3aaad2b1e670e57ebef4ab1c36/src/java/org/apache/cassandra/cql3/statements/CreateViewStatement.java#L291].


  was:
We currently disallow creation of a view that has the exact same primary key as the base where
no clustering keys are present, however a potential use case would be a view where part of
the PK is filtered so as to have a subset of data in the view which is faster for range queries.
We actually currently allow this, but only if you have a clustering key defined. If you only
have a partitioning key it's not possible.

>From the mailing list, the below example works:
{code:java}
CREATE TABLE users (
  site_id int,
  user_id text,
  n int,
  data set<frozen<text>>,
  PRIMARY KEY ((site_id, user_id), n));

user data is updated and read by PK and sometimes I have to fetch all user for some specific
site_id. It appeared that full scan by token(site_id,user_id) filtered by WHERE site_id =
<some id> works much slower than unfiltered full scan on
CREATE MATERIALIZED VIEW users_1 AS
SELECT site_id, user_id, n, data
FROM users
WHERE site_id = 1 AND user_id IS NOT NULL AND n IS NOT NULL
PRIMARY KEY ((site_id, user_id), n);
{code}

However the following does not:
{code:java}
CREATE TABLE users (
site_id int,
user_id text,
data set<text>,
PRIMARY KEY ((site_id, user_id)));

CREATE MATERIALIZED VIEW users_1 AS
SELECT site_id, user_id, data
FROM users
WHERE site_id = 1 AND user_id IS NOT NULL 
PRIMARY KEY ((site_id, user_id));
InvalidRequest: Error from server: code=2200 [Invalid query] message="No columns are defined
for Materialized View other than primary key"
{code}

This is because if the clustering key is empty we assume they've only defined the primary
key in the partition key and we haven't accounted for this use case. 

On that note, we also don't allow the following narrowing of the partition key:
{code}
CREATE TABLE kurt.base (
    id int,
    uid text,
    data text,
    PRIMARY KEY (id, uid)
) 

CREATE MATERIALIZED VIEW kurt.mv2 AS SELECT * from kurt.base where id IS NOT NULL and uid='1'
PRIMARY KEY ((id, uid));
{code}
But we do allow the following, which works because there is still a clustering key, despite
not changing the PK.
{code}
CREATE MATERIALIZED VIEW kurt.mv2 AS SELECT * from kurt.base where id IS NOT NULL and uid='1'
PRIMARY KEY (id, uid);
{code}
And we also allow the following, which is a narrowing of the partition key as above
{code}
create table kurt.base3 (id int, uid int, clus1 int, clus2 int, data text, PRIMARY KEY ((id,
uid), clus1, clus2));

CREATE MATERIALIZED VIEW kurt.mv4 AS SELECT * from kurt.base3 where id IS NOT NULL and uid
IS NOT NULL and clus1 IS NOT NULL AND clus2 IS NOT NULL  PRIMARY KEY ((id, uid, clus1), clus2);
{code}

I _think_ supporting these cases is trivial and mostly already handled in the underlying MV
write path, so we might be able to get away with just a simple change of [this condition|https://github.com/apache/cassandra/blob/83822d12d87dcb3aaad2b1e670e57ebef4ab1c36/src/java/org/apache/cassandra/cql3/statements/CreateViewStatement.java#L291].



> Allow MV with same PK but a different filtering
> -----------------------------------------------
>
>                 Key: CASSANDRA-13857
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13857
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Kurt Greaves
>
> We currently disallow creation of a view that has the exact same primary key as the base
where no clustering keys are present, however a potential use case would be a view where part
of the PK is filtered so as to have a subset of data in the view which is faster for range
queries. We actually currently allow this, but only if you have a clustering key defined.
If you only have a partitioning key it's not possible.
> From the mailing list, the below example works:
> {code:java}
> CREATE TABLE users (
>   site_id int,
>   user_id text,
>   n int,
>   data set<frozen<text>>,
>   PRIMARY KEY ((site_id, user_id), n));
> user data is updated and read by PK and sometimes I have to fetch all user for some specific
site_id. It appeared that full scan by token(site_id,user_id) filtered by WHERE site_id =
<some id> works much slower than unfiltered full scan on
> CREATE MATERIALIZED VIEW users_1 AS
> SELECT site_id, user_id, n, data
> FROM users
> WHERE site_id = 1 AND user_id IS NOT NULL AND n IS NOT NULL
> PRIMARY KEY ((site_id, user_id), n);
> {code}
> However the following does not:
> {code:java}
> CREATE TABLE users (
> site_id int,
> user_id text,
> data set<text>,
> PRIMARY KEY ((site_id, user_id)));
> CREATE MATERIALIZED VIEW users_1 AS
> SELECT site_id, user_id, data
> FROM users
> WHERE site_id = 1 AND user_id IS NOT NULL 
> PRIMARY KEY ((site_id, user_id));
> InvalidRequest: Error from server: code=2200 [Invalid query] message="No columns are
defined for Materialized View other than primary key"
> {code}
> This is because if the clustering key is empty we assume they've only defined the primary
key in the partition key and we haven't accounted for this use case. 
> On that note, we also don't allow the following narrowing of the partition key:
> {code}
> CREATE TABLE kurt.base (
>     id int,
>     uid text,
>     data text,
>     PRIMARY KEY (id, uid)
> ) 
> CREATE MATERIALIZED VIEW kurt.mv2 AS SELECT * from kurt.base where id IS NOT NULL and
uid='1' PRIMARY KEY ((id, uid));
> InvalidRequest: Error from server: code=2200 [Invalid query] message="No columns are
defined for Materialized View other than primary key"
> {code}
> But we do allow the following, which works because there is still a clustering key, despite
not changing the PK.
> {code}
> CREATE MATERIALIZED VIEW kurt.mv2 AS SELECT * from kurt.base where id IS NOT NULL and
uid='1' PRIMARY KEY (id, uid);
> {code}
> And we also allow the following, which is a narrowing of the partition key as above,
but with an extra clustering key.
> {code}
> create table kurt.base3 (id int, uid int, clus1 int, clus2 int, data text, PRIMARY KEY
((id, uid), clus1, clus2));
> CREATE MATERIALIZED VIEW kurt.mv4 AS SELECT * from kurt.base3 where id IS NOT NULL and
uid IS NOT NULL and clus1 IS NOT NULL AND clus2 IS NOT NULL  PRIMARY KEY ((id, uid, clus1),
clus2);
> {code}
> I _think_ supporting these cases is trivial and mostly already handled in the underlying
MV write path, so we might be able to get away with just a simple change of [this condition|https://github.com/apache/cassandra/blob/83822d12d87dcb3aaad2b1e670e57ebef4ab1c36/src/java/org/apache/cassandra/cql3/statements/CreateViewStatement.java#L291].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org


Mime
View raw message