incubator-cassandra-user mailing list archives

From Chris Lohfink <clohf...@blackbirdit.com>
Subject Re: fixed size collection possible?
Date Tue, 22 Apr 2014 17:31:30 GMT
It isn’t natively supported, but there are some things you can do if you need it.

A lot depends on how frequently this list is updated. For heavier workloads I would
recommend using a custom CF for this instead of collections.  If inserts are extreme you
would want to add additional partitioning to it as well.  As mentioned below, I’d recommend
having a cleanup MR job to periodically clean it up if the risk of TTLs leaving you with 0
entries is too costly.  Putting it in its own CF helps in that it keeps the elements of the
list from polluting your users partition.  If there get to be a lot of tombstones/inserts,
this could make reading the user slow (it would look like a queue, which has horrible
performance), so this will at least section off that badness from the regular user lookups.
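
One possible reading of "additional partitioning" is to add a time bucket to the partition key so a single user's writes spread over several partitions. The sketch below only computes such a key on the client; the month-sized bucket and the (user_id, bucket) composite key are assumptions for illustration, not part of the schema that follows, and a reader wanting the latest entries would have to query the current bucket first and fall back to older ones.

```python
from datetime import datetime, timezone


def bucketed_key(user_id: str, created: datetime) -> tuple:
    """Return a hypothetical (user_id, bucket) partition key.

    The bucket is the UTC year-month of `created`, e.g. "2014-04".
    `created` is expected to be a timezone-aware datetime.
    """
    bucket = created.astimezone(timezone.utc).strftime("%Y-%m")
    return (user_id, bucket)
```

A smaller bucket (day, hour) spreads writes further but means more partitions to read when assembling the latest entries.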

CREATE TABLE user_top_places (
  user_id varchar,
  created timeuuid,
  place varchar,
  PRIMARY KEY (user_id, created))
  WITH CLUSTERING ORDER BY (created DESC);

then to add a new one to the front of the “list”

INSERT INTO user_top_places (user_id, created, place) VALUES ('frodo', now(), 'mordor');

and you can see the last 10 entries

SELECT * FROM user_top_places WHERE user_id = 'frodo' LIMIT 10;

This will give you the last 10 entries (it allows duplicates though).  Older records will
still be around, and disk space could eventually become a problem for you.  If it gets bad
I would recommend using a periodic job (e.g. Hadoop) to remove excess columns (solely to
save disk space).  If you can afford the disk, though, it would give better performance to
just let it grow to a point (provided rows don’t get too large, i.e. >64mb).  If this isn’t
very high in writes there might be some more clever things you can do...
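
As a rough sketch of what the per-user step of such a cleanup job might look like: given the created values of one partition in the order the table returns them (newest first, because of the CLUSTERING ORDER above), everything past the first N is excess. The function name, the keep parameter and the use of ? bind markers are assumptions; how the statements get executed depends on your driver or job framework.

```python
# Statement text for removing one entry; "?" are bind markers to be
# filled in by whatever driver prepares and executes the statement.
DELETE_CQL = "DELETE FROM user_top_places WHERE user_id = ? AND created = ?"


def excess_deletes(user_id, created_newest_first, keep=10):
    """Return (statement, parameters) pairs for entries beyond the newest `keep`.

    `created_newest_first` is the list of `created` values for this user,
    already ordered newest first (the order the table returns them in).
    """
    return [
        (DELETE_CQL, (user_id, created))
        for created in created_newest_first[keep:]
    ]
```

Note that each of these deletes writes a tombstone, so running this very often works against the goal of keeping reads cheap.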

If not having duplicates is more important then you can set “place” as your column name:

CREATE TABLE user_top_places (
  user_id varchar,
  place varchar,
  created timestamp,
  PRIMARY KEY (user_id, place));
INSERT INTO user_top_places (user_id, place, created) VALUES ('frodo', 'mordor', dateof(now()));

but the results won’t be ordered by latest insert, so you might have to do some client-side
filtering on the created field to show only the latest.
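
A minimal sketch of that client-side step, assuming the rows for one user have already been fetched and reduced to (place, created) pairs; the function name and the row shape are illustrative, not tied to any particular driver.

```python
from datetime import datetime


def latest_places(rows, limit=10):
    """Return the `limit` most recently created places, newest first.

    `rows` is an iterable of (place, created) pairs for a single user,
    in whatever order the query returned them.
    """
    newest_first = sorted(rows, key=lambda row: row[1], reverse=True)
    return [place for place, _created in newest_first[:limit]]
```

For example, calling latest_places(rows, limit=10) on everything returned by SELECT place, created FROM user_top_places WHERE user_id = 'frodo' gives the ten most recent distinct places, at the cost of reading the whole partition.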

---
Chris Lohfink

On Apr 22, 2014, at 1:51 AM, Jimmy Lin <y2klyf+work@gmail.com> wrote:

> hi,
> look at the collection type support in cql3,
> e.g
> http://www.datastax.com/documentation/cql/3.0/cql/cql_using/use_list_t.html
> 
> we can append or remove using "+" and "-" operator
> UPDATE users
>   SET top_places = top_places + [ 'mordor' ] WHERE user_id = 'frodo';
> UPDATE users
>   SET top_places = top_places - ['riddermark'] WHERE user_id = 'frodo';
> 
> is there a way to keep a fixed size of the list(collection) ?
> I was thinking about using TTL to remove older data after a certain time, but then the list
> will become too big if the ttl is too long, and if the ttl is too short I run the risk of
> having an empty list (if there is no new activity).
> 
> Even if I don't use collection type and have my own table, I still ran into the same
> issue.
> 
> Any recommendation to handle this type of situation?
> 
> thanks
> 

