cassandra-commits mailing list archives

From "Rick Branson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-3929) Support row size limits
Date Tue, 05 Mar 2013 20:04:14 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593843#comment-13593843 ]

Rick Branson commented on CASSANDRA-3929:
-----------------------------------------

[~liqusha]: What I mean is that in order to DELETE only the tail, Cassandra will have to read
the entire row. For instance, if your minimum retention requirement is ~500 columns, then in
order to find any columns after the 500th, the following operations must be performed:

 * All of the columns are read from the SSTable files that contain columns for that row
 * These row fragments are "merged" (re-sorting by Comparator, tombstone removal, etc)
 * Tombstones must be inserted for each column "after" the 500th.
 * As time goes on and tombstones build up (before GC grace), this operation gets more and
more expensive and compaction perf also suffers.
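
As a rough illustration of the steps above, here is a minimal Python sketch. It is not Cassandra
code: the Column type, merge_fragments and tombstones_for_tail are hypothetical names, integer
column names stand in for the Comparator, and a value of None stands in for a tombstone.

```python
from typing import List, NamedTuple, Optional


class Column(NamedTuple):
    name: int              # column name; plain integer order stands in for the Comparator
    timestamp: int         # write timestamp used to reconcile duplicate columns
    value: Optional[str]   # None marks a tombstone (assumption for this sketch)


def merge_fragments(fragments: List[List[Column]]) -> List[Column]:
    """Merge the row fragments read from each SSTable: newest timestamp wins per column."""
    newest = {}
    for fragment in fragments:
        for col in fragment:
            current = newest.get(col.name)
            if current is None or col.timestamp > current.timestamp:
                newest[col.name] = col
    # Drop columns whose winning version is a tombstone, then re-sort by column name.
    live = [c for c in newest.values() if c.value is not None]
    return sorted(live, key=lambda c: c.name)


def tombstones_for_tail(fragments: List[List[Column]], limit: int, now: int) -> List[Column]:
    """Return one tombstone for every live column after the first `limit` columns."""
    merged = merge_fragments(fragments)  # the whole row has to be read and merged first
    return [Column(c.name, now, None) for c in merged[limit:]]


# Two hypothetical SSTable fragments for the same row.
sstable_a = [Column(1, 10, "a"), Column(2, 10, "b"), Column(3, 10, "c")]
sstable_b = [Column(2, 20, None), Column(4, 20, "d"), Column(5, 20, "e")]

tail = tombstones_for_tail([sstable_a, sstable_b], limit=2, now=30)
print([c.name for c in tail])
```

Note that every column in every fragment is touched before a single tail column can be
identified, and each trimmed column costs one additional tombstone write.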

By "free" I don't mean that the DELETE operation is unnecessary, but that supporting this
feature doesn't add an extra cost burden.

As far as use case, it varies quite a bit. There are many use cases I can imagine for persistent
storage with a quota for each user that auto-evicts old data over time for a low cost. Even
for "big data" scenarios, the cost of computing still goes up as the data size grows. For
instance, a database used to store objects a user interacted with for performing collaborative
filtering only needs a sample. In real-world use cases, these types of algorithms really need
a relatively bounded set of data, and user taste might change over time, so only taking into
consideration the most recent 90 objects makes sense. TTL'ing this data also doesn't make
sense, because there is a wide range of frequencies at which users might generate this data.
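
To make the TTL point concrete, here is a small Python sketch comparing the two retention
policies for a frequent and an infrequent user. The function names, the hourly time unit and the
sample data are all assumptions made up for illustration.

```python
from typing import List


def retain_by_ttl(event_times: List[int], now: int, ttl: int) -> List[int]:
    """Keep only events written within the last `ttl` time units."""
    return [t for t in event_times if now - t <= ttl]


def retain_most_recent(event_times: List[int], n: int) -> List[int]:
    """Keep only the `n` most recent events, regardless of their age."""
    if n <= 0:
        return []
    return sorted(event_times)[-n:]


now = 1000
heavy_user = list(range(1, 1001))         # one event every hour
light_user = list(range(100, 1001, 100))  # one event every 100 hours

for label, events in (("heavy", heavy_user), ("light", light_user)):
    by_ttl = retain_by_ttl(events, now, ttl=90)
    by_count = retain_most_recent(events, n=90)
    print(label, "TTL keeps", len(by_ttl), "events; count limit keeps", len(by_count))
```

With a single TTL, the sample size depends on how active the user is, and the infrequent user
is left with almost nothing. A count limit bounds the frequent user's row while keeping
everything the infrequent user has written.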

[~slebresne]: I spent a few hours digging thru the compaction source and it's going to be
messy to do this, probably involving a lot of copy+paste, so I'm even more +1 on disaggregating
that massive Runnable method in CompactionTask into something more pluggable / extensible.
                
> Support row size limits
> -----------------------
>
>                 Key: CASSANDRA-3929
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3929
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>              Labels: ponies
>             Fix For: 2.0
>
>         Attachments: 3929_b.txt, 3929_c.txt, 3929_d.txt, 3929_e.txt, 3929_f.txt, 3929_g_tests.txt, 3929_g.txt, 3929.txt
>
>
> We currently support expiring columns by time-to-live; we've also had requests for keeping the most recent N columns in a row.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
