Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 3D8B5200B8B for ; Tue, 4 Oct 2016 19:10:38 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 3BF04160ACC; Tue, 4 Oct 2016 17:10:38 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id D88E2160AC7 for ; Tue, 4 Oct 2016 19:10:36 +0200 (CEST) Received: (qmail 98677 invoked by uid 500); 4 Oct 2016 17:10:35 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 98667 invoked by uid 99); 4 Oct 2016 17:10:35 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 04 Oct 2016 17:10:35 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id F041E180227 for ; Tue, 4 Oct 2016 17:10:34 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.179 X-Spam-Level: * X-Spam-Status: No, score=1.179 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd3-us-west.apache.org (amavisd-new); dkim=pass (1024-bit key) header.d=jaumo.com Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id H_UbhAuZgA1F for ; Tue, 4 Oct 2016 17:10:30 +0000 (UTC) Received: from mail-oi0-f41.google.com (mail-oi0-f41.google.com [209.85.218.41]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id ED19D5F2ED for ; Tue, 4 Oct 2016 17:10:29 +0000 (UTC) Received: by mail-oi0-f41.google.com with SMTP id r126so242078368oib.0 for ; Tue, 04 Oct 2016 10:10:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=jaumo.com; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=AZ+gK6RS+SBfWzKOLuwjyuG/9k5Za3MLzxWqpMymCLo=; b=VxyMT5BtfeDQVXpeCyPKPq2/vUJaVy1MH2PWnmTI4NqEug0RFsuKUeGJJ6l+a+vxJa /d+NYj+r+MdL5oo7ag5NfWrlWlZb98WwQVzBGdGHtjYdN3sjBE+EsYX7v5GT/yaMYnvV GIDnjYifCxVu6z2eLCptYX2PF+NOqRJW6LVwI= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=AZ+gK6RS+SBfWzKOLuwjyuG/9k5Za3MLzxWqpMymCLo=; b=Etr0YrGNhp279SaYq29ZkSN6fVo0O+VLJUzSWKwsSCFC/9N39pwe/2oOIxUkI8Pw5s Wk3Wn2O02poKQyy0D6MEzOFWpjzn2ol0Rkun6Oyr3HXOxPp+jA02TdtqoSgsGlyn9Z5R IIia8tdGyQ7zdKbJLZd3YZRq0HbkOWVVOqMMYYKxHrItpy3a+lZd3vWdIzEB7PCaV/Ut m74uO2t4wGz/kLNcp626ZbcM9m10nJn1tc+/SeSePjHaxKIdbKX6yhh9Ct+keoyyb0PC AJXO0+z0qYSxMNSV45AQdp/HlnDbwCf/5xVZnCAKF3ZtWtmxjv7BPYkldedGB8i5lPBu BoAQ== X-Gm-Message-State: AA6/9RlLR1IyE4C9Tu897j5iFpFbNXNXtG8KYWVhSwsvIse5MuIZw39e+Cq8L9uEMqTvf9APSshwd5ZEYj9iZ8TS X-Received: by 10.202.169.199 with SMTP id s190mr4104760oie.144.1475601028482; Tue, 04 Oct 2016 10:10:28 -0700 (PDT) MIME-Version: 1.0 Received: by 10.182.148.104 with HTTP; Tue, 4 Oct 2016 10:10:27 -0700 (PDT) In-Reply-To: References: <15790799379.ce19ce2d65072.2511543805842625341@winguzone.com> <157908502dc.ce4c6f7a65657.4927653253734338014@winguzone.com> From: Benjamin Roth Date: Tue, 4 Oct 2016 19:10:27 +0200 Message-ID: Subject: Re: Efficient model for a sorting To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=001a113ce69622492f053e0d23ec archived-at: Tue, 04 Oct 2016 17:10:38 -0000 --001a113ce69622492f053e0d23ec Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable I started off with 3.0.6 and for my personal use case(s) they had the same bugs as tick tock. 2016-10-04 19:03 GMT+02:00 Jonathan Haddad : > I strongly recommend avoiding tick tock. You'll be one of the only people > putting it in prod and will likely hit a number of weird issues nobody wi= ll > be able to help you with. > On Tue, Oct 4, 2016 at 12:40 PM Benjamin Roth > wrote: > >> I have the impression, that not the tick-tock is the real problem but MV= s >> are not really battle-tested yet. >> Depending on the model, they put much more complexity on a cluster and >> it's behaviour under heavy load. Especially if you are going to create a= n >> MV with a different partition key than the base table this might be a sh= ot >> in the head. >> At least I was able to bring my cluster down many times just by throwing >> a few queries too much at it or by running some big repairs with reaper. >> Only since some days, things seem to go smoothly after having struggled >> about 2 months with very different kind of issues. >> >> We'll see ... most probably I will stick with the latest version. After >> all it seems to work ok, I gained a lot of experience in running and >> troubleshooting and to deal with bugs and maybe I am so able to contribu= te >> a bit to further development. >> >> 2016-10-04 18:26 GMT+02:00 Vladimir Yudovin : >> >> >Would you consider 3.0.x to be more stable than 3.x? >> I guess yes, but there are some discussion on this list: >> >> (C)* stable version after 3.5 >> >> Upgrade from 3.0.6 to 3.7. >> >> >> It seems to be eternal topic till tick-tock approach stabilizes. >> >> >> Best regards, Vladimir Yudovin, >> >> >> *Winguzone Inc - Hosted Cloud Cassan= dra >> on Azure and SoftLayer.Launch your cluster in minutes.* >> >> >> ---- On Tue, 04 Oct 2016 12:19:13 -0400 *Benjamin >> Roth>* wrote ---- >> >> I use the self-compiled master (3.10, ticktock). I had to fix a severe >> bug on my own and decided to go with the latest code. >> Would you consider 3.0.x to be more stable than 3.x? >> >> 2016-10-04 18:14 GMT+02:00 Vladimir Yudovin : >> >> Hi Benjamin! >> >> >we now use CS 3.x and have been advised that 3.x is still not considere= d >> really production ready. >> >> Did you consider using of 3.0.9? Actually it's 3.0 with almost an year >> fixes. >> >> >> Best regards, Vladimir Yudovin, >> >> >> *Winguzone Inc - Hosted Cloud Cassan= dra >> on Azure and SoftLayer.Launch your cluster in minutes.* >> >> >> ---- On Tue, 04 Oct 2016 07:27:54 -0400 *Benjamin Roth >> >* wrote ---- >> >> Hi! >> >> I have a frequently used pattern which seems to be quite costly in CS. >> The pattern is always the same: I have a unique key and a sorting by a >> different field. >> >> To give an example, here a real life example from our model: >> CREATE TABLE visits.visits_in ( >> user_id int, >> user_id_visitor int, >> created timestamp, >> PRIMARY KEY (user_id, user_id_visitor) >> ) WITH CLUSTERING ORDER BY (user_id_visitor ASC) >> >> CREATE MATERIALIZED VIEW visits.visits_in_sorted_mv AS >> SELECT user_id, created, user_id_visitor >> FROM visits.visits_in >> WHERE user_id IS NOT NULL AND created IS NOT NULL AND user_id_visito= r >> IS NOT NULL >> PRIMARY KEY (user_id, created, user_id_visitor) >> WITH CLUSTERING ORDER BY (created DESC, user_id_visitor DESC) >> >> This simply represents people, that visited my profile sorted by date >> desc but only one entry per visitor. >> Other examples with the same pattern could be a whats-app-like inbox >> where the last message of each sender is shown by date desc. There are l= ots >> of examples for that pattern. >> >> E.g. in redis I'd just use a sorted set, where the key could be like >> "visits_${user_id}", set key would be user_id_visitor and score >> the created timestamp. >> In MySQL I'd create the table with PK on user_id + user_id_visitor and >> create an index on user_id + created >> In C* i use an MV. >> >> Is this the most efficient approach? >> I also could have done this without an MV but then the situation in our >> app would be far more complex. >> I know that denormalization is a common pattern in C* and I don't >> hesitate to use it but in this case, it is not as simple as it's not an >> append-only case but updates have to be handled correctly. >> If it is the first visit of a user, it's that simple, just 2 inserts in >> base table + denormalized table. But on a 2nd or 3rd visit, the 1st or 2= nd >> visit has to be deleted from the denormalized table before. Otherwise th= e >> visit would not be unique any more. >> Handling this case without an MV requires a lot more effort, I guess eve= n >> more effort than just using an MV. >> 1. You need kind of app-side locking to deal with race conditions >> 2. Read before write is required to determine if an old record has to be >> deleted >> 3. At least CL_QUORUM is required to make sure that read before write is >> always consistent >> 4. Old record has to be deleted on update >> >> I guess, using an MV here is more efficient as there is less roundtrip >> between C* and the app to do all that and the MV does not require strong >> consistency as MV updates are always local and are eventual consistent w= hen >> the base table is. So there is also no need for distributed locks. >> >> I ask all this as we now use CS 3.x and have been advised that 3.x is >> still not considered really production ready. >> >> I guess in a perfect world, this wouldn't even require an MV if SASI >> indexes could be created over more than 1 column. E.g. in MySQL this cas= e >> is nothing else than a BTree. AFAIK SASI indices are also BTrees, filter= ing >> by Partition Key (which should to be done anyway) and sorting by a field >> would perfectly do the trick. But from the docs, this is not possible ri= ght >> now. >> >> Does anyone see a better solution or are all my assumptions correct? >> >> -- >> Benjamin Roth >> Prokurist >> >> Jaumo GmbH =C2=B7 www.jaumo.com >> Wehrstra=C3=9Fe 46 =C2=B7 73035 G=C3=B6ppingen =C2=B7 Germany >> Phone +49 7161 304880-6 =C2=B7 Fax +49 7161 304880-1 >> AG Ulm =C2=B7 HRB 731058 =C2=B7 Managing Director: Jens Kammerer >> >> >> >> >> >> >> -- >> Benjamin Roth >> Prokurist >> >> Jaumo GmbH =C2=B7 www.jaumo.com >> Wehrstra=C3=9Fe 46 =C2=B7 73035 G=C3=B6ppingen =C2=B7 Germany >> Phone +49 7161 304880-6 =C2=B7 Fax +49 7161 304880-1 >> AG Ulm =C2=B7 HRB 731058 =C2=B7 Managing Director: Jens Kammerer >> >> >> >> >> >> >> -- >> Benjamin Roth >> Prokurist >> >> Jaumo GmbH =C2=B7 www.jaumo.com >> Wehrstra=C3=9Fe 46 =C2=B7 73035 G=C3=B6ppingen =C2=B7 Germany >> Phone +49 7161 304880-6 =C2=B7 Fax +49 7161 304880-1 >> AG Ulm =C2=B7 HRB 731058 =C2=B7 Managing Director: Jens Kammerer >> > --=20 Benjamin Roth Prokurist Jaumo GmbH =C2=B7 www.jaumo.com Wehrstra=C3=9Fe 46 =C2=B7 73035 G=C3=B6ppingen =C2=B7 Germany Phone +49 7161 304880-6 =C2=B7 Fax +49 7161 304880-1 AG Ulm =C2=B7 HRB 731058 =C2=B7 Managing Director: Jens Kammerer --001a113ce69622492f053e0d23ec Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
I started off with 3.0.6 and for my personal use case(s) t= hey had the same bugs as tick tock.

2016-10-04 19:03 GMT+02:00 Jonathan Haddad <jon@jo= nhaddad.com>:
I strongly re= commend avoiding tick tock. You'll be one of the only people putting it= in prod and will likely hit a number of weird issues nobody will be able t= o help you with.
On Tue, Oc= t 4, 2016 at 12:40 PM Benjamin Roth <benjamin.roth@jaumo.com> wrote:
<= blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px= #ccc solid;padding-left:1ex">
I have the impression, that not the tick-tock is the real pro= blem but MVs are not really battle-tested yet.
Depending on the model, they put much more complexity on = a cluster and it's behaviour under heavy load. Especially if you are go= ing to create an MV with a different partition key than the base table this= might be a shot in the head.=C2=A0
At least I was able to bring my cluster down many times just b= y throwing a few queries too much at it or by running some big repairs with= reaper.
Only since som= e days, things seem to go smoothly after having struggled about 2 months wi= th very different kind of issues.

We'll see ... most probably I will= stick with the latest version. After all it seems to work ok, I gained a l= ot of experience in running and troubleshooting and to deal with bugs and m= aybe I am so able to contribute a bit to further development.

2016-10-04 18:26 GMT+02:00 Vladimir Yudovin <vlady= u@winguzone.com>:
>Would you consider 3.0.x to be more stabl= e than 3.x?
I guess yes= , but there are some discussion on this list:

It seems to be et= ernal topic till tick-tock approach stabilizes.


Best regards, Vladimir Y= udovin,
Winguzone Inc - = Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.

=

---- On Tue, 04 Oct 2016 12:19:13 -= 0400 Benjamin Roth<benjamin.roth@jaumo.com> wrote ----
I use the self-compiled master (3= .10, ticktock). I had to fix a severe bug on my own and decided to go with = the latest code.
Would you co= nsider 3.0.x to be more stable than 3.x?

2016-10-04 18:14 GMT+02:00 Vladim= ir Yudovin <vladyu@winguzone.com>:
Hi Benjamin!

>we now use CS 3.x and have been advised that = 3.x is still not considered really production ready.

Did you consider using of 3.0.9? Actually it's 3.0 with almost an year= fixes.


Best regards, Vladimir Yudovin, <= br class=3D"m_-7459471428991383437gmail_msg">Winguzone = Inc - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.


=
---- On Tue, 04 Oct 2016 07:2= 7:54 -0400 Benjamin Roth <<= a rel=3D"noreferrer" href=3D"mailto:benjamin.roth@jaumo.com" class=3D"m_-74= 59471428991383437gmail_msg" target=3D"_blank">benjamin.roth@jaumo.com&g= t; wrote ----
Hi!

I have a frequently used pattern which seems to be quite costly = in CS. The pattern is always the same: I have a unique key and a sorting by= a different field.
To give an example, here a real life example from our = model:
CREATE TABLE visits.visits_in (
=C2=A0 =C2=A0 user_id int,
=C2=A0 =C2=A0 user_id_visitor in= t,
=C2=A0 =C2=A0 create= d timestamp,
=C2=A0 =C2= =A0 PRIMARY KEY (user_id, user_id_visitor)
) WITH CLUSTERING ORDER BY (user_id_visitor ASC)
<= /div>

CREATE MATERIALIZED VIEW visits.visits_in_sorted= _mv AS
=C2=A0 =C2=A0 SE= LECT user_id, created, user_id_visitor
=C2=A0 =C2=A0 FROM visits.visits_in
=C2=A0 =C2=A0 WHERE user_id IS NOT NULL AND cr= eated IS NOT NULL AND user_id_visitor IS NOT NULL
=C2=A0 =C2=A0 PRIMARY KEY (user_id, created, use= r_id_visitor)
=C2=A0 = =C2=A0 WITH CLUSTERING ORDER BY (created DESC, user_id_visitor DESC)
<= div class=3D"m_-7459471428991383437gmail_msg">
This = simply represents people, that visited my profile sorted by date desc but o= nly one entry per visitor.
Other examples with the same pattern could be a whats-app-like inbox wh= ere the last message of each sender is shown by date desc. There are lots o= f examples for that pattern.

E.g. in redis I'd just use a = sorted set, where the key could be like "visits_${user_id}", set = key would be=C2=A0user_id_visitor=C2=A0and score the=C2=A0created=C2= =A0timestamp.
In MySQL I'd create the table with= PK on user_id + user_id_visitor and create an index on user_id + created
In C* i use an MV.
=

Is t= his the most efficient approach?
I also could have done this without an MV but then the situation = in our app would be far more complex.
I know that denormalization is a common pattern in C* and I = don't hesitate to use it but in this case, it is not as simple as it= 9;s not an append-only case but updates have to be handled correctly.
=
If it is the first visit of = a user, it's that simple, just 2 inserts in base table + denormalized t= able. But on a 2nd or 3rd visit, the 1st or 2nd visit has to be deleted fro= m the denormalized table before. Otherwise the visit would not be unique an= y more.
Handling this c= ase without an MV requires a lot more effort, I guess even more effort than= just using an MV.=C2=A0
1. You need kind of app-side locking to deal with race conditions
2. Read before write is requir= ed to determine if an old record has to be deleted
3. At least CL_QUORUM is required to make sure = that read before write is always consistent
4. Old record has to be deleted on update

I guess, us= ing an MV here is more efficient as there is less roundtrip between C* and = the app to do all that and the MV does not require strong consistency as MV= updates are always local and are eventual consistent when the base table i= s. So there is also no need for distributed locks.

<= /div>
I ask all this as we no= w use CS 3.x and have been advised that 3.x is still not considered really = production ready.

I guess in a perfect world, this wouldn't even requi= re an MV if SASI indexes could be created over more than 1 column. E.g. in = MySQL this case is nothing else than a BTree. AFAIK SASI indices are also B= Trees, filtering by Partition Key (which should to be done anyway) and sort= ing by a field would perfectly do the trick. But from the docs, this is not= possible right now.

Does anyone see a better solution o= r are all my assumptions correct?

--
B= enjamin Roth
Prokurist

Jaumo GmbH =C2=B7 www.jaumo.c= om
Wehrstra=C3=9Fe 46 =C2= =B7 73035 G=C3=B6ppingen =C2=B7 Germany
Phone +49 7161 30488= 0-6 =C2=B7 Fax +49 7161 30= 4880-1
AG Ulm =C2=B7 HRB 7= 31058 =C2=B7 Managing Director: Jens Kammerer

=


--
Benjamin Roth
Prokurist

Jaumo GmbH =C2=B7= www.jaumo.com
Wehrstra=C3=9Fe 46 =C2=B7 73035 G=C3=B6ppingen = =C2=B7 Germany
Phone +49 7161 304880-6 =C2=B7 Fax= +49 7161 304880-1AG Ulm =C2=B7 HRB 731058 =C2=B7 = Managing Director: Jens Kammerer





--
Benjamin Roth
Prokurist

Jaumo GmbH =C2=B7 www.jaumo.com
= Wehrstra=C3=9Fe 46 =C2=B7 73035 G=C3=B6ppingen =C2=B7 Germany
Phone +49 7161 304880-6 =C2=B7 = Fax +49 7161 304880-1
AG Ulm =C2=B7 HRB 731058 =C2=B7 Managing Director: Jens Kammerer
<= /div>



--
Be= njamin Roth
Prokurist

Jaumo GmbH =C2=B7 www.jaumo.com
Wehrstra=C3=9Fe 46 =C2=B7 73= 035 G=C3=B6ppingen =C2=B7 Germany
Phone +49 7161 304880-6 =C2=B7 Fax +49= 7161 304880-1
AG Ulm =C2=B7 HRB 731058 =C2=B7 Managing Director: Jens K= ammerer
--001a113ce69622492f053e0d23ec--