cassandra-commits mailing list archives

From "Jim Plush (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-9110) Bounded/RingBuffer CQL Collections
Date Thu, 02 Apr 2015 23:19:53 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-9110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Plush updated CASSANDRA-9110:
---------------------------------
    Description: 
Feature Request:
I've had frequent use cases for bounded and RingBuffer-based collections.

For example:
I want to store the first 100 times I've seen this thing.
I want to store the last 100 times I've seen this thing.

Currently that means doing application-level read-before-write operations, and we like to keep
some of our high-scale apps write-only where possible.

While enforcing exactly N items is probably expensive, an approximation should be good enough
for most applications: N in our example could end up being 100 or 102. This could even be made
tunable on the type or table.

For the RingBuffer example, consider that I only want to store the last N login attempts for a
user. Once item N+1 comes in, a delete is issued for the oldest one in the collection.
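
To make the two behaviours concrete, here is a minimal Python sketch of the intended semantics. It is only an in-memory illustration, not a Cassandra implementation: the class names FirstN and LastN are made up for this example, and it enforces the bound exactly rather than approximately.

```python
from collections import deque


class FirstN:
    """Bounded collection: keeps the first n distinct items, drops the rest."""

    def __init__(self, n):
        self.n = n
        self.items = []

    def add(self, item):
        # Ignore duplicates and anything arriving after the bound is reached.
        if item in self.items or len(self.items) >= self.n:
            return False
        self.items.append(item)
        return True


class LastN:
    """Ring buffer: keeps the n most recent items."""

    def __init__(self, n):
        # A deque with maxlen discards the oldest entry when item n+1 arrives.
        self.items = deque(maxlen=n)

    def add(self, item):
        self.items.append(item)


first = FirstN(3)
last = LastN(3)
for ts in ["t1", "t2", "t3", "t4", "t5"]:
    first.add(ts)
    last.add(ts)

print(first.items)       # ['t1', 't2', 't3']
print(list(last.items))  # ['t3', 't4', 't5']
```

The point of the request is to get this behaviour from a plain write, without the application having to read the collection first.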

A potential implementation idea, given that the rowkey would live on a single node, would be to
have an LRU-based counter cache (tunable in the yaml settings, in MB) that keeps a current
count of how many items are already in the collection for that rowkey. If the count is greater
than the bound, toss the new item. It could also be a compaction-type thing, where all the data
is stored and then, at compaction time, the data that's out of bounds is filtered out.
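
Both ideas can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not how Cassandra would do it: CounterCache, accept_write and compact are hypothetical names, the cache is sized by entry count rather than MB to keep the example short, and items are modelled as (timestamp, value) pairs.

```python
from collections import OrderedDict


class CounterCache:
    """LRU cache mapping a row key to an approximate item count."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.counts = OrderedDict()

    def increment(self, row_key):
        count = self.counts.pop(row_key, 0) + 1
        self.counts[row_key] = count  # re-insert as most recently used
        if len(self.counts) > self.max_entries:
            self.counts.popitem(last=False)  # evict least recently used key
        return count


def accept_write(cache, row_key, bound):
    """Write-path check: keep the item only while the count is within bound."""
    return cache.increment(row_key) <= bound


def compact(items, bound, keep="oldest"):
    """Compaction-time filter: drop the (timestamp, value) pairs out of bounds."""
    ordered = sorted(items)
    if keep == "oldest":
        return ordered[:bound]
    return ordered[max(0, len(ordered) - bound):]


cache = CounterCache(max_entries=2)
kept = [accept_write(cache, "alice", 3) for _ in range(5)]
print(kept)  # [True, True, True, False, False]

# Touching two other keys evicts "alice", so her count starts over.
# This is why the bound is only approximate with a cache.
cache.increment("bob")
cache.increment("carol")
print(accept_write(cache, "alice", 3))  # True

rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
print(compact(rows, 2, keep="oldest"))  # [(1, 'a'), (2, 'b')]
print(compact(rows, 2, keep="newest"))  # [(3, 'c'), (4, 'd')]
```

The cache keeps the write path cheap but can over-admit after an eviction, while the compaction filter is exact but lets the collection grow past the bound between compactions.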


something akin to:
CREATE TABLE users (
  user_id text PRIMARY KEY,
  first_name text,
  first_logins set<text, 100, oldest>,
  last_logins set<text, 100, newest>
);



  was:
Feature Request:
I've had frequent use cases for bounded and RingBuffer-based collections.

For example:
I want to store the first 100 times I've seen this thing.
I want to store the last 100 times I've seen this thing.

Currently that means doing application-level read-before-write operations, and we like to keep
some of our high-scale apps write-only where possible.

While enforcing exactly N items is probably expensive, an approximation should be good enough
for most applications: N in our example could end up being 100 or 102. This could even be made
tunable on the type or table.

For the RingBuffer example, consider that I only want to store the last N login attempts for a
user. Once item N+1 comes in, a delete is issued for the oldest one in the collection.

A potential implementation idea, given that the rowkey would live on a single node, would be to
have an LRU-based counter cache (tunable in the yaml settings, in MB) that keeps a current
count of how many items are already in the collection for that rowkey. If the count is greater
than the bound, toss the new item.


something akin to:
CREATE TABLE users (
  user_id text PRIMARY KEY,
  first_name text,
  first_logins set<text, 100, oldest>,
  last_logins set<text, 100, newest>
);




> Bounded/RingBuffer CQL Collections
> ----------------------------------
>
>                 Key: CASSANDRA-9110
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9110
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jim Plush
>            Priority: Minor
>
> Feature Request:
> I've had frequent use cases for bounded and RingBuffer-based collections.
> For example:
> I want to store the first 100 times I've seen this thing.
> I want to store the last 100 times I've seen this thing.
> Currently that means doing application-level read-before-write operations, and we like
> to keep some of our high-scale apps write-only where possible.
> While enforcing exactly N items is probably expensive, an approximation should be good
> enough for most applications: N in our example could end up being 100 or 102. This could
> even be made tunable on the type or table.
> For the RingBuffer example, consider that I only want to store the last N login attempts
> for a user. Once item N+1 comes in, a delete is issued for the oldest one in the collection.
> A potential implementation idea, given that the rowkey would live on a single node, would
> be to have an LRU-based counter cache (tunable in the yaml settings, in MB) that keeps a
> current count of how many items are already in the collection for that rowkey. If the count
> is greater than the bound, toss the new item. It could also be a compaction-type thing,
> where all the data is stored and then, at compaction time, the data that's out of bounds
> is filtered out.
> something akin to:
> CREATE TABLE users (
>   user_id text PRIMARY KEY,
>   first_name text,
>   first_logins set<text, 100, oldest>,
>   last_logins set<text, 100, newest>
> );



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
