cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Avi Levi <...@indeni.com>
Subject Re: Using materialized view or AllowFiltering which one is better ?
Date Mon, 09 Oct 2017 18:56:34 GMT
Thanks Crisan .
I understand what you're saying. But according to your suggestion I will
have a record for every entry while I am interested only on the last entry
. So the proposed solution is actually keeping much more data then needed .

On Oct 9, 2017 8:40 PM, "Valentina Crisan" <valentina.crisan@gmail.com>
wrote:

Allow filtering is almost never the answer, especially when you want to do
a full table scan ( there might be some cases where the query is limited to
a partition and allow filtering could be used). And you would like to run
this query every minute - thus extremely good performance is required.
Allow filtering basically brings locally in your coordinator the whole
table content and performs local filtering of the data before answering
your query. Performance wise is not recommended to use such an
implementation.

For a query running every minute you need to address it in one partition
read (according to Cassandra data modeling rules) and that can be done with
denormalization ( manually or materialized views). As far as I know and
also from the discussions in this list MV should be used still with caution
in production environments. Thus, the best option in my opinion is manual
denormalization of data, building a table with partition key last_seen and
clustering key username and adding/updating data accordingly. Furthermore
last_seen I understand it's a value of any time/hour of day - you could
consider building partitions per day: partition key  = (last_seen, day),
primary key = ((last_seen,day),username)).

Valentina

On Mon, Oct 9, 2017 at 1:13 PM, Avi Levi <avi@indeni.com> wrote:

> Hi
>
> I have the following table:
>
> CREATE TABLE users (
>     username text,
>     last_seen bigint,
>     PRIMARY KEY (username)
> );
>
> where* last_seen* is basically the writetime . Number of records in the
> table is aprox 10 million. Insert is pretty much straightforward insert
> into users (username, last_seen) VALUES ([username], now)
>
> I want to make some processing on users that were not seen for the past
> XXX (where xxx can be hours/days ... ) by query the last_seen column
> (this query runs every minute) e.g :
>
> select username from users where last_seen < (now - 1 day).
>
> I have two options as I see it:
>
>    1. use materialized view :
>
> CREATE MATERIALIZED VIEW users_last_seen AS
> SELECT last_seen, username
> FROM users
> WHERE last_seen IS NOT NULL
> PRIMARY KEY (last_seen, username);
>
>
> and simply query:
>
> select username from users_last_seen where last_seen < (now - 1 day)
>
>    1.
>
>    query the users table
>
>    select username from users where last_seen < (now - 1 day) ALLOW
>    FILTERING
>
> which one is more efficient? any other options ?
>
> Any help will be greatly appreciated
>
> Best
>
> Avi
>

Mime
View raw message