cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Roth <benjamin.r...@jaumo.com>
Subject Re: Efficient model for a sorting
Date Tue, 04 Oct 2016 16:40:16 GMT
I have the impression, that not the tick-tock is the real problem but MVs
are not really battle-tested yet.
Depending on the model, they put much more complexity on a cluster and it's
behaviour under heavy load. Especially if you are going to create an MV
with a different partition key than the base table this might be a shot in
the head.
At least I was able to bring my cluster down many times just by throwing a
few queries too much at it or by running some big repairs with reaper.
Only since some days, things seem to go smoothly after having struggled
about 2 months with very different kind of issues.

We'll see ... most probably I will stick with the latest version. After all
it seems to work ok, I gained a lot of experience in running and
troubleshooting and to deal with bugs and maybe I am so able to contribute
a bit to further development.

2016-10-04 18:26 GMT+02:00 Vladimir Yudovin <vladyu@winguzone.com>:

> >Would you consider 3.0.x to be more stable than 3.x?
> I guess yes, but there are some discussion on this list:
>
> (C)* stable version after 3.5
> <https://lists.apache.org/thread.html/4e4e67175efd1207965eb528e098f35dd268fba0f66632924d8bd0a2@%3Cuser.cassandra.apache.org%3E>
> Upgrade from 3.0.6 to 3.7.
> <https://lists.apache.org/thread.html/14e383fabe1ea1750a3e3eec78a1490ae27484ed2d3424d9c3aeb9e2@%3Cuser.cassandra.apache.org%3E>
>
> It seems to be eternal topic till tick-tock approach stabilizes.
>
>
> Best regards, Vladimir Yudovin,
>
>
> *Winguzone Inc <https://winguzone.com?from=list> - Hosted Cloud Cassandra
> on Azure and SoftLayer.Launch your cluster in minutes.*
>
>
> ---- On Tue, 04 Oct 2016 12:19:13 -0400 *Benjamin
> Roth<benjamin.roth@jaumo.com <benjamin.roth@jaumo.com>>* wrote ----
>
> I use the self-compiled master (3.10, ticktock). I had to fix a severe bug
> on my own and decided to go with the latest code.
> Would you consider 3.0.x to be more stable than 3.x?
>
> 2016-10-04 18:14 GMT+02:00 Vladimir Yudovin <vladyu@winguzone.com>:
>
> Hi Benjamin!
>
> >we now use CS 3.x and have been advised that 3.x is still not considered
> really production ready.
>
> Did you consider using of 3.0.9? Actually it's 3.0 with almost an year
> fixes.
>
>
> Best regards, Vladimir Yudovin,
>
>
> *Winguzone Inc <https://winguzone.com?from=list> - Hosted Cloud Cassandra
> on Azure and SoftLayer.Launch your cluster in minutes.*
>
>
> ---- On Tue, 04 Oct 2016 07:27:54 -0400 *Benjamin Roth
> <benjamin.roth@jaumo.com <benjamin.roth@jaumo.com>>* wrote ----
>
> Hi!
>
> I have a frequently used pattern which seems to be quite costly in CS. The
> pattern is always the same: I have a unique key and a sorting by a
> different field.
>
> To give an example, here a real life example from our model:
> CREATE TABLE visits.visits_in (
>     user_id int,
>     user_id_visitor int,
>     created timestamp,
>     PRIMARY KEY (user_id, user_id_visitor)
> ) WITH CLUSTERING ORDER BY (user_id_visitor ASC)
>
> CREATE MATERIALIZED VIEW visits.visits_in_sorted_mv AS
>     SELECT user_id, created, user_id_visitor
>     FROM visits.visits_in
>     WHERE user_id IS NOT NULL AND created IS NOT NULL AND user_id_visitor
> IS NOT NULL
>     PRIMARY KEY (user_id, created, user_id_visitor)
>     WITH CLUSTERING ORDER BY (created DESC, user_id_visitor DESC)
>
> This simply represents people, that visited my profile sorted by date desc
> but only one entry per visitor.
> Other examples with the same pattern could be a whats-app-like inbox where
> the last message of each sender is shown by date desc. There are lots of
> examples for that pattern.
>
> E.g. in redis I'd just use a sorted set, where the key could be like
> "visits_${user_id}", set key would be user_id_visitor and score
> the created timestamp.
> In MySQL I'd create the table with PK on user_id + user_id_visitor and
> create an index on user_id + created
> In C* i use an MV.
>
> Is this the most efficient approach?
> I also could have done this without an MV but then the situation in our
> app would be far more complex.
> I know that denormalization is a common pattern in C* and I don't hesitate
> to use it but in this case, it is not as simple as it's not an append-only
> case but updates have to be handled correctly.
> If it is the first visit of a user, it's that simple, just 2 inserts in
> base table + denormalized table. But on a 2nd or 3rd visit, the 1st or 2nd
> visit has to be deleted from the denormalized table before. Otherwise the
> visit would not be unique any more.
> Handling this case without an MV requires a lot more effort, I guess even
> more effort than just using an MV.
> 1. You need kind of app-side locking to deal with race conditions
> 2. Read before write is required to determine if an old record has to be
> deleted
> 3. At least CL_QUORUM is required to make sure that read before write is
> always consistent
> 4. Old record has to be deleted on update
>
> I guess, using an MV here is more efficient as there is less roundtrip
> between C* and the app to do all that and the MV does not require strong
> consistency as MV updates are always local and are eventual consistent when
> the base table is. So there is also no need for distributed locks.
>
> I ask all this as we now use CS 3.x and have been advised that 3.x is
> still not considered really production ready.
>
> I guess in a perfect world, this wouldn't even require an MV if SASI
> indexes could be created over more than 1 column. E.g. in MySQL this case
> is nothing else than a BTree. AFAIK SASI indices are also BTrees, filtering
> by Partition Key (which should to be done anyway) and sorting by a field
> would perfectly do the trick. But from the docs, this is not possible right
> now.
>
> Does anyone see a better solution or are all my assumptions correct?
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>
>
>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>
>
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

Mime
View raw message