cassandra-commits mailing list archives

From "Ahmet AKYOL (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-3999) Column families for "most recent data", (a.k.a. size-safe wide rows)
Date Mon, 05 Mar 2012 15:35:57 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222403#comment-13222403 ]

Ahmet AKYOL commented on CASSANDRA-3999:
----------------------------------------

OK, this is exactly CASSANDRA-3929. I asked this as a question on [Stack Overflow|http://stackoverflow.com/questions/9546458/column-families-for-most-recent-data-in-cassandra]
but got no answer, so I opened this issue. Thanks.
                
> Column families for "most recent data", (a.k.a. size-safe wide rows)
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-3999
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3999
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Ahmet AKYOL
>
> "Wide row design" is very handy (for time series data), but on the other hand we have to
> keep each row's size within an acceptable limit. So we need buckets, right? Monthly, daily
> or even hourly buckets... The problem with the bucket approach is (as always) the
> distribution of data across rows.
> So why not tell Cassandra we want a column family that behaves like an LRU cache, but on
> disk? If we start the design from the queries, we usually end up with "most recent data"
> queries. This "size-safe wide rows" approach could be very useful in many use cases.
> Here are some example hypothetical column family storage parameters:
> max_column_number_hint : 1000 // meaning: try to keep around 1000 columns. Since it's a hint, we (users) are OK with tombstones or an 800-1200 range
> or
> max_row_size_hint : 1MB
> I don't know "Cassandra internals", but C* already has background jobs (for compaction,
> deletion and TTL), and columns already have timestamps. So this makes sense both from the
> user's point of view and from C*'s.
> P.S.: Sorry for my poor English; this is my very first "issue" :)
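
The bucket approach described in the quoted issue can be sketched as follows. This is a minimal Java illustration, not Cassandra code: the `entityId:bucket` row-key layout, the class name `TimeBuckets` and the choice to format in UTC are assumptions made for the example. Every write for the same entity and period lands in the same row, which is also why a busy entity fills its rows much faster than a quiet one.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical row-key bucketing for time series data; not a Cassandra API.
final class TimeBuckets {

    // Bucket granularities mentioned in the issue: monthly, daily, hourly.
    enum Granularity {
        MONTHLY("yyyyMM"), DAILY("yyyyMMdd"), HOURLY("yyyyMMddHH");

        final DateTimeFormatter fmt;

        Granularity(String pattern) {
            // Format in UTC so every client derives the same bucket for a given instant.
            this.fmt = DateTimeFormatter.ofPattern(pattern).withZone(ZoneOffset.UTC);
        }
    }

    // Builds a row key such as "sensor42:20120305" (DAILY granularity).
    static String rowKey(String entityId, Instant ts, Granularity g) {
        return entityId + ":" + g.fmt.format(ts);
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2012-03-05T15:35:57Z");
        for (Granularity g : Granularity.values()) {
            System.out.println(g + " -> " + rowKey("sensor42", t, g));
        }
    }
}
```

A coarser granularity means fewer, wider rows; a finer one bounds row width but forces "most recent data" queries to read across several buckets.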
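
One possible reading of the proposed `max_column_number_hint` is sketched below, assuming columns are keyed by their write timestamp. Cassandra has no such parameter; `ColumnCountHint.trim` and its `tolerancePercent` argument are hypothetical. Nothing happens while the row is inside the tolerance band, and once it exceeds the upper bound the oldest columns are dropped until `target` remain, so between passes the row size floats between `target` and the upper bound (one interpretation of "800 - 1200" being acceptable).

```java
import java.util.TreeMap;

// Hypothetical trimming rule for a "max_column_number_hint"; not a Cassandra API.
final class ColumnCountHint {

    // Columns are keyed by write timestamp, so the first entry is always the oldest.
    // Returns the number of columns removed.
    static <V> int trim(TreeMap<Long, V> columnsByTimestamp, int target, int tolerancePercent) {
        int upper = target + target * tolerancePercent / 100;
        if (columnsByTimestamp.size() <= upper) {
            return 0; // inside the band: the hint is not a hard limit
        }
        int removed = 0;
        while (columnsByTimestamp.size() > target) {
            columnsByTimestamp.pollFirstEntry(); // drop the oldest column
            removed++;
        }
        return removed;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> row = new TreeMap<>();
        for (long ts = 1; ts <= 1250; ts++) {
            row.put(ts, "value-" + ts);
        }
        int removed = trim(row, 1000, 20); // target 1000, band up to 1200
        System.out.println("removed " + removed + ", oldest kept timestamp " + row.firstKey());
    }
}
```

Trimming only above the upper bound, rather than on every write, is what makes this a hint: the work can be batched into an occasional background pass instead of happening inline.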
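
Similarly, the proposed `max_row_size_hint` could be applied by the kind of background job the issue mentions (for example during compaction), using column timestamps to decide what to drop. The sketch below is hypothetical: `RowSizeHint.trimToBytes` is not a Cassandra API, each column is reduced to its write timestamp and an assumed serialized size in bytes, and "1MB" is taken as 1,000,000 bytes.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical size-based trimming for a "max_row_size_hint"; not a Cassandra API.
final class RowSizeHint {

    // Keys are write timestamps, values are (assumed) serialized column sizes in bytes.
    // Drops the oldest columns until the row fits in maxBytes; returns the bytes freed.
    static long trimToBytes(TreeMap<Long, Integer> columnSizesByTimestamp, long maxBytes) {
        long total = 0;
        for (int size : columnSizesByTimestamp.values()) {
            total += size;
        }
        long freed = 0;
        while (total > maxBytes && !columnSizesByTimestamp.isEmpty()) {
            Map.Entry<Long, Integer> oldest = columnSizesByTimestamp.pollFirstEntry();
            total -= oldest.getValue();
            freed += oldest.getValue();
        }
        return freed;
    }

    public static void main(String[] args) {
        TreeMap<Long, Integer> row = new TreeMap<>();
        for (long ts = 1; ts <= 10; ts++) {
            row.put(ts, 200_000); // ten 200 KB columns, 2,000,000 bytes in total
        }
        long freed = trimToBytes(row, 1_000_000); // "1MB" hint, taken as decimal
        System.out.println("freed " + freed + " bytes, kept timestamps " + row.keySet());
    }
}
```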


        
