Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id A92E8200B8B for ; Tue, 4 Oct 2016 18:40:28 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id A79C3160AC7; Tue, 4 Oct 2016 16:40:28 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 766BA160AEF for ; Tue, 4 Oct 2016 18:40:27 +0200 (CEST) Received: (qmail 95037 invoked by uid 500); 4 Oct 2016 16:40:25 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 95027 invoked by uid 99); 4 Oct 2016 16:40:25 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 04 Oct 2016 16:40:25 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 5D907C0E0C for ; Tue, 4 Oct 2016 16:40:25 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.898 X-Spam-Level: * X-Spam-Status: No, score=1.898 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd4-us-west.apache.org (amavisd-new); dkim=pass (1024-bit key) header.d=jaumo.com Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id 7JUbNfq-WQOP for ; Tue, 4 Oct 2016 16:40:19 +0000 (UTC) Received: from mail-oi0-f48.google.com (mail-oi0-f48.google.com [209.85.218.48]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id A910260CEE for ; Tue, 4 Oct 2016 16:40:18 +0000 (UTC) Received: by mail-oi0-f48.google.com with SMTP id m72so70703005oik.3 for ; Tue, 04 Oct 2016 09:40:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=jaumo.com; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=fozKubuDmVLAI8W1fNTUu6STI6pvy5B8QQ8GIupERrQ=; b=QU0iZVyB3M0e6bNiyO1wg8sWxIQ63rHXpiwboYLTQpNM5AHoSIgDbDg/yWP5L7ZAk6 IQUyvZVloRWmCpjov1ba8+ul3Ib1Zq+TddjC/7Lj3Ip3C01FYop1OCZxRvrpapmIhnBh OL5CIQlcReh+Nt54YjX4KglCGZ4BXZJvkB7pQ= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=fozKubuDmVLAI8W1fNTUu6STI6pvy5B8QQ8GIupERrQ=; b=hoaeaZ0PyS/fch6a4IsJyLYNzsJHNxMRWaueKUTcjCDpWjdDkVnl9FXkU6HPOql4vx Bt56VYrrVjFDy10c7LYCW6rBt3DeCAsg5YvSHZaGH501xMcUtGC5V8sX2nku2ZJawoJM GdunzQVr4PBeYU9IMEff93eP95UoJTscX8i2IHUeFNpzDS4oCXXm69r0R6cV+v/lpGTA y7laIQPwWPWUXH5n39WiZWde6TiumlnkGOrZccOJJ6k5m73PqJZXIfBdzxVVixsksdHm TtDNREk9ftb2zAT+TpNNj70ILeBQIKujNMqONDfJjXq9hcwbHK/4+YRUwVfrxDWNj7XI LLYQ== X-Gm-Message-State: AA6/9RkMyFskPKIHb/A+ASrE+X2/G0RGdeZv2wGGqWOhLLuMCAFgsZiZaqVisv9lkQR5DgfNi7mUQwikCBzGCd8J X-Received: by 10.157.16.27 with SMTP id h27mr2812479ote.192.1475599217337; Tue, 04 Oct 2016 09:40:17 -0700 (PDT) MIME-Version: 1.0 Received: by 10.182.148.104 with HTTP; Tue, 4 Oct 2016 09:40:16 -0700 (PDT) In-Reply-To: <157908502dc.ce4c6f7a65657.4927653253734338014@winguzone.com> References: <15790799379.ce19ce2d65072.2511543805842625341@winguzone.com> <157908502dc.ce4c6f7a65657.4927653253734338014@winguzone.com> From: Benjamin Roth Date: Tue, 4 Oct 2016 18:40:16 +0200 Message-ID: Subject: Re: Efficient model for a sorting To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=001a113dffc02e417e053e0cb743 archived-at: Tue, 04 Oct 2016 16:40:28 -0000 --001a113dffc02e417e053e0cb743 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable I have the impression, that not the tick-tock is the real problem but MVs are not really battle-tested yet. Depending on the model, they put much more complexity on a cluster and it's behaviour under heavy load. Especially if you are going to create an MV with a different partition key than the base table this might be a shot in the head. At least I was able to bring my cluster down many times just by throwing a few queries too much at it or by running some big repairs with reaper. Only since some days, things seem to go smoothly after having struggled about 2 months with very different kind of issues. We'll see ... most probably I will stick with the latest version. After all it seems to work ok, I gained a lot of experience in running and troubleshooting and to deal with bugs and maybe I am so able to contribute a bit to further development. 2016-10-04 18:26 GMT+02:00 Vladimir Yudovin : > >Would you consider 3.0.x to be more stable than 3.x? > I guess yes, but there are some discussion on this list: > > (C)* stable version after 3.5 > > Upgrade from 3.0.6 to 3.7. > > > It seems to be eternal topic till tick-tock approach stabilizes. > > > Best regards, Vladimir Yudovin, > > > *Winguzone Inc - Hosted Cloud Cassand= ra > on Azure and SoftLayer.Launch your cluster in minutes.* > > > ---- On Tue, 04 Oct 2016 12:19:13 -0400 *Benjamin > Roth>* wrote ---- > > I use the self-compiled master (3.10, ticktock). I had to fix a severe bu= g > on my own and decided to go with the latest code. > Would you consider 3.0.x to be more stable than 3.x? > > 2016-10-04 18:14 GMT+02:00 Vladimir Yudovin : > > Hi Benjamin! > > >we now use CS 3.x and have been advised that 3.x is still not considered > really production ready. > > Did you consider using of 3.0.9? Actually it's 3.0 with almost an year > fixes. > > > Best regards, Vladimir Yudovin, > > > *Winguzone Inc - Hosted Cloud Cassand= ra > on Azure and SoftLayer.Launch your cluster in minutes.* > > > ---- On Tue, 04 Oct 2016 07:27:54 -0400 *Benjamin Roth > >* wrote ---- > > Hi! > > I have a frequently used pattern which seems to be quite costly in CS. Th= e > pattern is always the same: I have a unique key and a sorting by a > different field. > > To give an example, here a real life example from our model: > CREATE TABLE visits.visits_in ( > user_id int, > user_id_visitor int, > created timestamp, > PRIMARY KEY (user_id, user_id_visitor) > ) WITH CLUSTERING ORDER BY (user_id_visitor ASC) > > CREATE MATERIALIZED VIEW visits.visits_in_sorted_mv AS > SELECT user_id, created, user_id_visitor > FROM visits.visits_in > WHERE user_id IS NOT NULL AND created IS NOT NULL AND user_id_visitor > IS NOT NULL > PRIMARY KEY (user_id, created, user_id_visitor) > WITH CLUSTERING ORDER BY (created DESC, user_id_visitor DESC) > > This simply represents people, that visited my profile sorted by date des= c > but only one entry per visitor. > Other examples with the same pattern could be a whats-app-like inbox wher= e > the last message of each sender is shown by date desc. There are lots of > examples for that pattern. > > E.g. in redis I'd just use a sorted set, where the key could be like > "visits_${user_id}", set key would be user_id_visitor and score > the created timestamp. > In MySQL I'd create the table with PK on user_id + user_id_visitor and > create an index on user_id + created > In C* i use an MV. > > Is this the most efficient approach? > I also could have done this without an MV but then the situation in our > app would be far more complex. > I know that denormalization is a common pattern in C* and I don't hesitat= e > to use it but in this case, it is not as simple as it's not an append-onl= y > case but updates have to be handled correctly. > If it is the first visit of a user, it's that simple, just 2 inserts in > base table + denormalized table. But on a 2nd or 3rd visit, the 1st or 2n= d > visit has to be deleted from the denormalized table before. Otherwise the > visit would not be unique any more. > Handling this case without an MV requires a lot more effort, I guess even > more effort than just using an MV. > 1. You need kind of app-side locking to deal with race conditions > 2. Read before write is required to determine if an old record has to be > deleted > 3. At least CL_QUORUM is required to make sure that read before write is > always consistent > 4. Old record has to be deleted on update > > I guess, using an MV here is more efficient as there is less roundtrip > between C* and the app to do all that and the MV does not require strong > consistency as MV updates are always local and are eventual consistent wh= en > the base table is. So there is also no need for distributed locks. > > I ask all this as we now use CS 3.x and have been advised that 3.x is > still not considered really production ready. > > I guess in a perfect world, this wouldn't even require an MV if SASI > indexes could be created over more than 1 column. E.g. in MySQL this case > is nothing else than a BTree. AFAIK SASI indices are also BTrees, filteri= ng > by Partition Key (which should to be done anyway) and sorting by a field > would perfectly do the trick. But from the docs, this is not possible rig= ht > now. > > Does anyone see a better solution or are all my assumptions correct? > > -- > Benjamin Roth > Prokurist > > Jaumo GmbH =C2=B7 www.jaumo.com > Wehrstra=C3=9Fe 46 =C2=B7 73035 G=C3=B6ppingen =C2=B7 Germany > Phone +49 7161 304880-6 =C2=B7 Fax +49 7161 304880-1 > AG Ulm =C2=B7 HRB 731058 =C2=B7 Managing Director: Jens Kammerer > > > > > > > -- > Benjamin Roth > Prokurist > > Jaumo GmbH =C2=B7 www.jaumo.com > Wehrstra=C3=9Fe 46 =C2=B7 73035 G=C3=B6ppingen =C2=B7 Germany > Phone +49 7161 304880-6 =C2=B7 Fax +49 7161 304880-1 > AG Ulm =C2=B7 HRB 731058 =C2=B7 Managing Director: Jens Kammerer > > > > --=20 Benjamin Roth Prokurist Jaumo GmbH =C2=B7 www.jaumo.com Wehrstra=C3=9Fe 46 =C2=B7 73035 G=C3=B6ppingen =C2=B7 Germany Phone +49 7161 304880-6 =C2=B7 Fax +49 7161 304880-1 AG Ulm =C2=B7 HRB 731058 =C2=B7 Managing Director: Jens Kammerer --001a113dffc02e417e053e0cb743 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
I have the impression, that not the tick-tock is the real = problem but MVs are not really battle-tested yet.
Depending on the mode= l, they put much more complexity on a cluster and it's behaviour under = heavy load. Especially if you are going to create an MV with a different pa= rtition key than the base table this might be a shot in the head.=C2=A0
At least I was able to bring my cluster down many times just by thro= wing a few queries too much at it or by running some big repairs with reape= r.
Only since some days, things seem to go smoothly after having = struggled about 2 months with very different kind of issues.

=
We'll see ... most probably I will stick with the latest ver= sion. After all it seems to work ok, I gained a lot of experience in runnin= g and troubleshooting and to deal with bugs and maybe I am so able to contr= ibute a bit to further development.
<= br>
2016-10-04 18:26 GMT+02:00 Vladimir Yudovin <= span dir=3D"ltr"><vladyu@winguzone.com>:
=
>Would you consider 3.0.x to be more stab= le than 3.x?
I guess yes, but there are some discussion on this l= ist:

= It seems to be eternal t= opic till tick-tock approach stabilizes.


Best regards, Vladimir Yudovin,
Winguzone Inc = - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in = minutes.



---- On Tue, 04 Oct 2016= 12:19:13 -0400 Benjamin Roth<benjamin.roth@jaumo.com> wrote ----
I use the self-= compiled master (3.10, ticktock). I had to fix a severe bug on my own and d= ecided to go with the latest code.
Would you consider 3.0.x to be more = stable than 3.x?

2016-10-04 18:14 GMT+02:00 Vladim= ir Yudovin <vladyu@winguzone.com>:
Hi Benjamin!

>we now use CS 3= .x and have been advised that 3.x is still not considered really production= ready.

Did you consider using of 3.0.9? Actually it's 3.= 0 with almost an year fixes.


Best regards, Vladimir Yudovin= ,
Winguzone Inc - Hosted Cloud Cassandra on Azure and So= ftLayer.
Launch your cluster in minutes.


=
---- On Tue, 04 Oct 2016 07:27:54 -0400 Benjamin Roth <benja= min.roth@jaumo.com> wrote ----
Hi!

I have a f= requently used pattern which seems to be quite costly in CS. The pattern is= always the same: I have a unique key and a sorting by a different field.

To give an example, here a real life example from o= ur model:
CREATE TABLE visit= s.visits_in (
=C2=A0 =C2=A0 user_id int,
=C2=A0 =C2=A0 = user_id_visitor int,
=C2=A0 =C2=A0 created timestamp,
= =C2=A0 =C2=A0 PRIMARY KEY (user_id, user_id_visitor)
) WITH CLUST= ERING ORDER BY (user_id_visitor ASC)

CREATE MATERIALIZED VIE= W visits.visits_in_sorted_mv AS
=C2=A0 =C2=A0 SELECT user_id, cre= ated, user_id_visitor
=C2=A0 =C2=A0 FROM visits.visits_in
=C2=A0 =C2=A0 WHERE user_id IS NOT NULL AND created IS NOT NULL AND user= _id_visitor IS NOT NULL
=C2=A0 =C2=A0 PRIMARY KEY (user_id, creat= ed, user_id_visitor)
=C2=A0 =C2=A0 WITH CLUSTERING ORDER BY (crea= ted DESC, user_id_visitor DESC)

This simply repres= ents people, that visited my profile sorted by date desc but only one entry= per visitor.
Other examples with the same pattern could be a wha= ts-app-like inbox where the last message of each sender is shown by date de= sc. There are lots of examples for that pattern.

=
E.g. in redis I'd just use a sorted set, where the key could= be like "visits_${user_id}", set key would be=C2=A0user_id_visitor=C2=A0and score the=C2=A0create= d=C2=A0timestamp.
In MySQL I'd create the table with PK o= n user_id + user_id_visitor and create an index on user_id + created
<= div>In C* i use an MV.

Is this the most efficient = approach?
I also could have done this without an MV but then the = situation in our app would be far more complex.
I know that denor= malization is a common pattern in C* and I don't hesitate to use it but= in this case, it is not as simple as it's not an append-only case but = updates have to be handled correctly.
If it is the first visit of= a user, it's that simple, just 2 inserts in base table + denormalized = table. But on a 2nd or 3rd visit, the 1st or 2nd visit has to be deleted fr= om the denormalized table before. Otherwise the visit would not be unique a= ny more.
Handling this case without an MV requires a lot more eff= ort, I guess even more effort than just using an MV.=C2=A0
1. You= need kind of app-side locking to deal with race conditions
2. Re= ad before write is required to determine if an old record has to be deleted=
3. At least CL_QUORUM is required to make sure that read before = write is always consistent
4. Old record has to be deleted on upd= ate

I guess, using an MV here is more efficient as= there is less roundtrip between C* and the app to do all that and the MV d= oes not require strong consistency as MV updates are always local and are e= ventual consistent when the base table is. So there is also no need for dis= tributed locks.

I ask all this as we now use CS 3.= x and have been advised that 3.x is still not considered really production = ready.

I guess in a perfect world, this wouldn'= ;t even require an MV if SASI indexes could be created over more than 1 col= umn. E.g. in MySQL this case is nothing else than a BTree. AFAIK SASI indic= es are also BTrees, filtering by Partition Key (which should to be done any= way) and sorting by a field would perfectly do the trick. But from the docs= , this is not possible right now.

Does an= yone see a better solution or are all my assumptions correct?
--
Benjamin Roth
Prokurist
Jaumo GmbH =C2=B7 www.jaumo.com
Wehrstra=C3=9Fe 46 =C2=B7 73035 G=C3=B6p= pingen =C2=B7 Germany
Phone +49 7161 304880-6 =C2=B7 Fax +49 7= 161 304880-1
AG Ulm =C2=B7 HRB 731058 =C2=B7 Managing Director: Jens= Kammerer


<= /div>



--
Benjamin Roth
Prokurist

Jaumo GmbH =C2=B7 www.jaumo.comWehrstra=C3=9Fe 46 =C2=B7 73035 G=C3=B6ppingen =C2=B7 Germany
Phone +49 7161 304880-6 =C2=B7 Fax +49 7161 304880-1
AG = Ulm =C2=B7 HRB 731058 =C2=B7 Managing Director: Jens Kammerer
<= /div>





--
Benjamin RothProkurist

Jaumo GmbH =C2=B7 www.jaumo.com
Wehrstra=C3=9Fe 46 =C2=B7 73035 G=C3=B6ppi= ngen =C2=B7 Germany
Phone +49 7161 304880-6 =C2=B7 Fax +49 7161 304880-1=
AG Ulm =C2=B7 HRB 731058 =C2=B7 Managing Director: Jens Kammerer
<= /div>
--001a113dffc02e417e053e0cb743--