cassandra-commits mailing list archives

From "Jim Plush (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-9110) Bounded/RingBuffer CQL Collections
Date Thu, 02 Apr 2015 23:20:58 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-9110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Plush updated CASSANDRA-9110:
---------------------------------
    Description: 
Feature Request:
I've had frequent use cases for bounded and RingBuffer-based collections.

For example: 
I want to store the first 100 times I've seen this thing.
I want to store the last 100 times I've seen this thing.
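A minimal in-memory sketch of the "first 100" semantics, written in Python purely to pin down the behavior being requested. `FirstNSet` is a hypothetical name, not a Cassandra class: it keeps the first N distinct elements and silently drops anything that arrives after the bound is reached.

```python
from __future__ import annotations


class FirstNSet:
    """Hypothetical model of set<text, N, oldest>: keeps the first N distinct elements."""

    def __init__(self, bound: int) -> None:
        self.bound = bound
        self._items: dict[str, None] = {}  # dict preserves insertion order

    def add(self, item: str) -> bool:
        """Return True if the item is stored, False if it was dropped."""
        if item in self._items:
            return True  # already present; set semantics make this a no-op
        if len(self._items) >= self.bound:
            return False  # bound reached: later elements are ignored
        self._items[item] = None
        return True

    def items(self) -> list[str]:
        return list(self._items)
```

For example, with a bound of 3, adding "a", "b", "c", "d" leaves ["a", "b", "c"]: once the set is full, it never changes again.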

Currently that means doing application-level read-before-write (READ/WRITE) operations, and we prefer to keep some of our high-scale apps write-only where possible.

Enforcing exactly N items would probably be expensive, but an approximation should be good enough for most applications: N in our example could end up as 100 or 102, or the bound could even be made tunable on the type or table.
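One way to read the "100 or 102" tolerance, sketched under our own assumption of a fixed slack: let the collection grow to N + slack, then trim back to N in a single batch, so eviction work happens once every `slack + 1` writes instead of on every write. `ApproxLastN` and `slack` are illustrative names, not existing Cassandra types or options.

```python
from __future__ import annotations

from collections import deque


class ApproxLastN:
    """Hypothetical 'last N' buffer that tolerates up to N + slack items.

    Trimming only happens once the slack is exhausted, so eviction work is
    batched instead of being done on every single write.
    """

    def __init__(self, bound: int, slack: int) -> None:
        self.bound = bound
        self.slack = slack
        self._items: deque[str] = deque()
        self.trims = 0  # number of batched trim operations performed

    def append(self, item: str) -> None:
        self._items.append(item)
        if len(self._items) > self.bound + self.slack:
            # Drop the oldest entries in one batch, back down to exactly N.
            while len(self._items) > self.bound:
                self._items.popleft()
            self.trims += 1

    def items(self) -> list[str]:
        return list(self._items)
```

With bound=100 and slack=2, 1,000 appends trigger 300 batched trims, and once the buffer has filled it always holds between 100 and 102 of the most recent items.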

For the RingBuffer example, suppose I only want to store the last N login attempts for a user. When attempt N+1 comes in, a delete is issued for the oldest entry in the collection.
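A minimal in-memory model of that exact ring-buffer behavior, assuming a hypothetical `LastNLog` class (not part of Cassandra). Each append reports the element that falls off the front, which is the element the server would need to issue a delete for.

```python
from __future__ import annotations

from collections import deque
from typing import Optional


class LastNLog:
    """Hypothetical exact ring buffer for the 'last N login attempts' case."""

    def __init__(self, bound: int) -> None:
        if bound < 1:
            raise ValueError("bound must be at least 1")
        self._items: deque[str] = deque(maxlen=bound)

    def append(self, item: str) -> Optional[str]:
        """Store item; return the evicted oldest item, if any.

        The returned value corresponds to the delete the server would issue.
        """
        evicted = None
        if len(self._items) == self._items.maxlen:
            evicted = self._items[0]  # oldest entry, about to fall off
        self._items.append(item)  # deque(maxlen=...) drops the left end
        return evicted

    def items(self) -> list[str]:
        return list(self._items)
```

The standard library's `deque(maxlen=...)` already behaves as a ring buffer; the explicit length check only exists to surface which element needs deleting.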

A potential implementation idea: given that the row key would live on a single node, there could be an LRU-based counter cache (its size tunable in MB in the yaml settings) that keeps a current count of how many items are already in the collection for that row key. If the count is over the bound, the write is tossed. Alternatively, it could work like compaction: store all the data, then filter out the out-of-bounds entries at compaction time, as long as CQL reads return only the in-bound items.
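A sketch of the counter-cache idea, with several assumptions of our own: `BoundCounterCache` is a hypothetical name, capacity is counted in row keys rather than MB, a cache miss is treated as an empty collection, and every write is counted (not distinct elements). The miss rule means a key evicted from the cache can be over-admitted later, which is one reason a compaction-time filter would still be needed as a backstop.

```python
from __future__ import annotations

from collections import OrderedDict


class BoundCounterCache:
    """Hypothetical LRU cache of per-row-key element counts.

    Capacity is expressed as a number of row keys here for simplicity; the
    proposal suggests sizing it in MB via the yaml settings instead.
    """

    def __init__(self, capacity: int, bound: int) -> None:
        self.capacity = capacity
        self.bound = bound
        self._counts: OrderedDict[str, int] = OrderedDict()

    def admit(self, row_key: str) -> bool:
        """Return True if a new element for row_key should be written."""
        count = self._counts.pop(row_key, 0)  # a miss is treated as "empty"
        if count >= self.bound:
            self._counts[row_key] = count  # re-insert as most recently used
            return False  # over the bound: toss the write
        self._counts[row_key] = count + 1  # counts writes, not distinct elements
        if len(self._counts) > self.capacity:
            self._counts.popitem(last=False)  # evict least recently used key
        return True
```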


Something akin to:
CREATE TABLE users (
  user_id text PRIMARY KEY,
  first_name text,
  first_logins set<text, 100, oldest>,
  last_logins set<text, 100, newest>
);
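A sketch of how the `oldest`/`newest` modifiers could be applied under the compaction-style approach, assuming each collection element carries its write timestamp. `trim_cells` is a hypothetical helper: run during compaction it would decide which cells to purge, and run on the read path it would keep CQL results within the bound before compaction has happened.

```python
from __future__ import annotations

from typing import Literal


def trim_cells(
    cells: list[tuple[int, str]],
    bound: int,
    mode: Literal["oldest", "newest"],
) -> list[tuple[int, str]]:
    """Keep the `bound` cells the declared mode asks for, ordered by timestamp.

    Each cell is a (write_timestamp, element) pair. 'oldest' keeps the first
    elements written (first_logins); 'newest' keeps the last (last_logins).
    """
    ordered = sorted(cells)  # ascending by write timestamp, then by element
    if mode == "oldest":
        return ordered[:bound]
    if mode == "newest":
        return ordered[-bound:] if bound > 0 else []
    raise ValueError(f"unknown mode: {mode!r}")
```

In this sketch, applying the same function on both paths means the on-disk data may temporarily exceed N while query results never do.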



  was:
Feature Request:
I've had frequent use cases for bounded and RingBuffer-based collections.

For example: 
I want to store the first 100 times I've seen this thing.
I want to store the last 100 times I've seen this thing.

Currently that means doing application-level read-before-write (READ/WRITE) operations, and we prefer to keep some of our high-scale apps write-only where possible.

Enforcing exactly N items would probably be expensive, but an approximation should be good enough for most applications: N in our example could end up as 100 or 102, or the bound could even be made tunable on the type or table.

For the RingBuffer example, suppose I only want to store the last N login attempts for a user. When attempt N+1 comes in, a delete is issued for the oldest entry in the collection.

A potential implementation idea: given that the row key would live on a single node, there could be an LRU-based counter cache (its size tunable in MB in the yaml settings) that keeps a current count of how many items are already in the collection for that row key. If the count is over the bound, the write is tossed. Alternatively, it could work like compaction: store all the data, then filter out the out-of-bounds entries at compaction time.


Something akin to:
CREATE TABLE users (
  user_id text PRIMARY KEY,
  first_name text,
  first_logins set<text, 100, oldest>,
  last_logins set<text, 100, newest>
);




> Bounded/RingBuffer CQL Collections
> ----------------------------------
>
>                 Key: CASSANDRA-9110
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9110
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jim Plush
>            Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
