cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Edward Capriolo <>
Subject Re: implementing a 'sorted set' on top of cassandra
Date Tue, 17 Jan 2017 17:54:11 GMT
On Tue, Jan 17, 2017 at 11:47 AM, Mike Torra <> wrote:

> Thanks for the feedback everyone! Redis `zincryby` and `zrangebyscore` is
> indeed what we use today.
> Caching the resulting 'sorted sets' in redis is exactly what I plan to do.
> There will be tens of thousands of these sorted sets, each generally with
> <10k items (with maybe a few exceptions going a bit over that). The reason
> to periodically calculate the set and store it in cassandra is to avoid
> having the client do that work, when the client only really cares about the
> top 100 or so items at any given time. Being truly "real time" is not
> critical for us, but it is a selling point to be as up to date as possible.
> I'd like to understand the performance issue of frequently updating these
> sets. I understand that every time I 'regenerate' the sorted set, any rows
> that change will create a tombstone - for example, if "item_1" is in first
> place and "item_2" is in second place, then they switch on the next update,
> that would be two tombstones. Do you think this will be a big enough
> problem that it is worth doing the sorting work client side, on demand, and
> just try to eat the performance hit there? My thought was to make a
> tradeoff by using more cassandra disk space (ie pre calculating all sets),
> in exchange for faster reads when requests actually come in that need this
> data.
> From: Benjamin Roth <>
> Reply-To: "" <>
> Date: Saturday, January 14, 2017 at 1:25 PM
> To: "" <>
> Subject: Re: implementing a 'sorted set' on top of cassandra
> Mike mentioned "increment" in his initial post. That let me think of a
> case with increments and fetching a top list by a counter like
> 1. Cassandra is absolutely not made to sort by a counter (or a non-counter
> numeric incrementing value) but it is made to store counters. In this case
> a partition could be seen as a set.
> 2. I thought of CS for persistence and - depending on the app requirements
> like real-time and set size - still use redis as a read cache
> 2017-01-14 18:45 GMT+01:00 Jonathan Haddad <>:
>> Sorted sets don't have a requirement of incrementing / decrementing.
>> They're commonly used for thing like leaderboards where the values are
>> arbitrary.
>> In Redis they are implemented with 2 data structures for efficient
>> lookups of either key or value. No getting around that as far as I know.
>> In Cassandra they would require using the score as a clustering column in
>> order to select top N scores (and paginate). That means a tombstone
>> whenever the value for a key in the set changes. In sets with high rates of
>> change that means a lot of tombstones and thus terrible performance.
>> On Sat, Jan 14, 2017 at 9:40 AM DuyHai Doan <> wrote:
>>> Sorting on an "incremented" numeric value has always been a nightmare to
>>> be done properly in C*
>>> Either use Counter type but then no sorting is possible since counter
>>> cannot be used as type for clustering column (which allows sort)
>>> Or use simple numeric type on clustering column but then to increment
>>> the value *concurrently* and *safely* it's prohibitive (SELECT to fetch
>>> current value + UPDATE ... IF value = <old_value>) + retry
>>> On Sat, Jan 14, 2017 at 8:54 AM, Benjamin Roth <>
>>> wrote:
>>> If your proposed solution is crazy depends on your needs :)
>>> It sounds like you can live with not-realtime data. So it is ok to cache
>>> it. Why preproduce the results if you only need 5% of them? Why not use
>>> redis as a cache with expiring sorted sets that are filled on demand from
>>> cassandra partitions with counters?
>>> So redis has much less to do and can scale much better. And you are not
>>> limited on keeping all data in ram as cache data is volatile and can be
>>> evicted on demand.
>>> If this is effective also depends on the size of your sets. CS wont be
>>> able to sort them by score for you, so you will have to load the complete
>>> set to redis for caching and / or do sorting in your app on demand. This
>>> certainly won't work out well with sets with millions of entries.
>>> 2017-01-13 23:14 GMT+01:00 Mike Torra <>:
>>> We currently use redis to store sorted sets that we increment many, many
>>> times more than we read. For example, only about 5% of these sets are ever
>>> read. We are getting to the point where redis is becoming difficult to
>>> scale (currently at >20 nodes).
>>> We've started using cassandra for other things, and now we are
>>> experimenting to see if having a similar 'sorted set' data structure is
>>> feasible in cassandra. My approach so far is:
>>>    1. Use a counter CF to store the values I want to sort by
>>>    2. Periodically read in all key/values in the counter CF and sort in
>>>    the client application (~every five minutes or so)
>>>    3. Write back to a different CF with the ordered keys I care about
>>> Does this seem crazy? Is there a simpler way to do this in cassandra?
>>> --
>>> Benjamin Roth
>>> Prokurist
>>> Jaumo GmbH ·
>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
>>> <+49%207161%203048801>
>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
> --
> Benjamin Roth
> Prokurist
> Jaumo GmbH ·
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
> <+49%207161%203048801>
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

If I had to tackle this at scale I would likely use a hybrid streaming +
datastore. For example if I were using Kafka I could partition the updates
such that the for a given key the events always arrive at the same

>From there I have some options. The one most attractive to me is to manage
offset consumption directly. For example every 5 minutes execute something
like this:

private <Set> set;

  writeSetToCassandra(Set s)

On each message event you could do this:

onTuple(Tuple t){
   set.add (extractDataFrom(t))

"Jedis" in this example could be replaced with something

This would theoretically deal with tombstone issues as you simply write all
the data once every five minutes to Cassandra. The only trick here is how
to roll back if you have failure inside the 5 minute persistence window.

Anyway not the most rigorous description, your mileage may vary.

View raw message