Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
MIME-Version: 1.0
In-Reply-To: <CAKGGDjZsCVYWNOVLNmLLUCDqkK=HPEpg-_MZAvw+aBNEk1GjEQ@mail.gmail.com>
References: <CAGDFUBm5Xa-rGcJEvjgn=xTs5VerbZ5jMrhjMErEfMNKPS-uNQ@mail.gmail.com>
 <CAKGGDjZsCVYWNOVLNmLLUCDqkK=HPEpg-_MZAvw+aBNEk1GjEQ@mail.gmail.com>
From: Avi Levi <avi@indeni.com>
Date: Mon, 9 Oct 2017 21:56:34 +0300
Message-ID: <CAGDFUB=8dWgQHdM1nZ4i+2ciX_TZNjoJ6BRBrbB1G+odfdfO+A@mail.gmail.com>
Subject: Re: Using materialized view or AllowFiltering which one is better ?
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary="001a113cc55cde2fa5055b21bf8d"
archived-at: Mon, 09 Oct 2017 18:56:42 -0000

--001a113cc55cde2fa5055b21bf8d
Content-Type: text/plain; charset="UTF-8"

Thanks Crisan.
I understand what you're saying. But with your suggestion I will have a
record for every entry, while I am interested only in the last entry. So
the proposed solution actually keeps much more data than needed.
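
To make the concern concrete, here is a minimal Python sketch of what
keeping only the last entry per user would take in a manually denormalized
table. The table name users_by_day, its columns, and the helper below are
hypothetical and not from this thread. It also assumes the previous
last_seen is read from the users table first, because that value is part of
the old row's key.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000

# Hypothetical denormalized table: users_by_day(day, last_seen, username)
DELETE_OLD = (
    "DELETE FROM users_by_day "
    "WHERE day = ? AND last_seen = ? AND username = ?"
)
INSERT_NEW = (
    "INSERT INTO users_by_day (day, last_seen, username) "
    "VALUES (?, ?, ?)"
)


def update_statements(username, new_ms, previous_ms=None):
    """(statement, params) pairs that move a user to the new last_seen row."""
    statements = []
    if previous_ms is not None:
        # Remove the stale row so only the latest entry is kept.
        statements.append(
            (DELETE_OLD, (previous_ms // MS_PER_DAY, previous_ms, username))
        )
    statements.append((INSERT_NEW, (new_ms // MS_PER_DAY, new_ms, username)))
    return statements
```

Without the delete, each update leaves the previous row behind, which is
the extra data mentioned above.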

On Oct 9, 2017 8:40 PM, "Valentina Crisan" <valentina.crisan@gmail.com>
wrote:

ALLOW FILTERING is almost never the answer, especially when the query
amounts to a full table scan (there might be some cases where the query is
limited to a partition and ALLOW FILTERING could be used). And you would
like to run this query every minute, so extremely good performance is
required. ALLOW FILTERING basically makes the cluster scan the whole table
content and filter the data before answering your query. Performance-wise,
such an implementation is not recommended.
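
To put rough numbers on that, here is a back-of-the-envelope sketch in
Python. It uses the approx. 10 million rows and the one-minute schedule
from the original question. The function name is made up for illustration,
and it assumes every run really does touch every row.

```python
MINUTES_PER_DAY = 24 * 60


def rows_scanned_per_day(table_rows: int, interval_minutes: int) -> int:
    """Rows read per day if every run scans the whole table (assumption)."""
    runs_per_day = MINUTES_PER_DAY // interval_minutes
    return table_rows * runs_per_day


# ~10 million users, query every minute (figures from the original question)
print(rows_scanned_per_day(10_000_000, 1))  # 14400000000
```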

For a query running every minute you need to serve it with a single
partition read (according to Cassandra data modeling rules), and that can be
done with denormalization (manual or via materialized views). As far as I
know, and also from the discussions on this list, MVs should still be used
with caution in production environments. Thus, the best option in my opinion
is manual denormalization of the data: build a table with partition key
last_seen and clustering key username, and add/update data accordingly.
Furthermore, as I understand it last_seen can be any time of day, so you
could consider building partitions per day: partition key = (last_seen, day),
primary key = ((last_seen, day), username).
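
A minimal sketch of one way to read the per-day idea, in Python with the CQL
as strings. Everything named here is an assumption for illustration and not
from this thread: the table users_by_day, a day column holding days since
the Unix epoch (UTC), last_seen stored as epoch milliseconds, and the
variant where day alone is the partition key and last_seen is a clustering
column, so that a range on last_seen works inside one partition.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000

# Hypothetical schema: one partition per day, rows ordered by last_seen.
CREATE_TABLE = """
CREATE TABLE users_by_day (
    day int,
    last_seen bigint,
    username text,
    PRIMARY KEY ((day), last_seen, username)
)
"""

# One single-partition read per day bucket.
SELECT_STALE = (
    "SELECT username FROM users_by_day "
    "WHERE day = ? AND last_seen < ?"
)


def day_bucket(last_seen_ms: int) -> int:
    """Days since the Unix epoch (UTC) for an epoch-millisecond timestamp."""
    return last_seen_ms // MS_PER_DAY


def stale_queries(cutoff_ms: int, oldest_ms: int) -> list:
    """(statement, params) pairs for every bucket from oldest_ms to cutoff_ms."""
    first, last = day_bucket(oldest_ms), day_bucket(cutoff_ms)
    return [(SELECT_STALE, (day, cutoff_ms)) for day in range(first, last + 1)]
```

With a cutoff of now minus one day, every bucket older than the cutoff day
is read in full, so the number of partitions read grows with how far back
oldest_ms goes.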

Valentina

On Mon, Oct 9, 2017 at 1:13 PM, Avi Levi <avi@indeni.com> wrote:

> Hi
>
> I have the following table:
>
> CREATE TABLE users (
>     username text,
>     last_seen bigint,
>     PRIMARY KEY (username)
> );
>
> where last_seen is basically the writetime. The number of records in the
> table is approx. 10 million. The insert is pretty straightforward: insert
> into users (username, last_seen) VALUES ([username], now)
>
> I want to do some processing on users that were not seen for the past
> XXX (where XXX can be hours/days ...) by querying the last_seen column
> (this query runs every minute), e.g.:
>
> select username from users where last_seen < (now - 1 day).
>
> I have two options as I see it:
>
>    1. use a materialized view:
>
> CREATE MATERIALIZED VIEW users_last_seen AS
> SELECT last_seen, username
> FROM users
> WHERE last_seen IS NOT NULL
> PRIMARY KEY (last_seen, username);
>
>
> and simply query:
>
> select username from users_last_seen where last_seen < (now - 1 day)
>
>    2. query the users table:
>
>    select username from users where last_seen < (now - 1 day) ALLOW
>    FILTERING
>
> Which one is more efficient? Any other options?
>
> Any help will be greatly appreciated
>
> Best
>
> Avi
>

--001a113cc55cde2fa5055b21bf8d--