incubator-cassandra-user mailing list archives

From Sylvain Lebresne <sylv...@datastax.com>
Subject Re: How to keep only exactly column of key
Date Tue, 19 Jul 2011 08:39:58 GMT
On Tue, Jul 19, 2011 at 10:15 AM, Lior Golan <lior.g@taboola.com> wrote:
> Can't this capping be done (approximately) during compaction? Something
> like:
>
> 1. Ability to define for a column family that it's a "capped
> collection" with at most N columns per row
>
> 2. During write – just add the column
>
> 3. During reads – get a slice with the most recent / top N columns (in
> terms of column order)
>
> 4. During compaction – if the number of columns in the row is more
> than N, trim it to the top N columns (by replacing the rest of the columns
> with a tombstone in the compacted row)
>
> Since I guess the purpose of this is for automated cleanup, and not for
> enforcing exactly N columns, I think this would be sufficient

The problem with that is that we cannot enforce this on the query side.
Or more precisely, returning the first N columns is fine, but what about
queries like "M columns starting from 'b'"? Or columns by name?
We cannot do those efficiently while guaranteeing that we won't return any
columns after the first N. The only solution would be to always query
the first N and then filter afterwards, but that's not efficient.
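To make the cost concrete, here is a minimal sketch in Python over an in-memory
row (a sorted list of (name, value) pairs). The function names are hypothetical
and this is not Cassandra code; it only contrasts a plain slice, which can seek
straight to 'b', with a capped slice, which has to read the first N columns
and filter them.

```python
from bisect import bisect_left

def slice_from(columns, start, count):
    """Plain slice: up to `count` columns starting at name `start`.

    `columns` is a list of (name, value) pairs sorted by name, so we can
    seek directly to `start` without touching earlier columns.
    """
    names = [name for name, _ in columns]
    i = bisect_left(names, start)
    return columns[i:i + count]

def capped_slice_from(columns, start, count, cap):
    """Same slice, but never returning a column beyond the first `cap`.

    To know which columns are within the cap we must read the first
    `cap` columns and filter them, wherever `start` falls in the row.
    """
    first_n = columns[:cap]
    return [c for c in first_n if c[0] >= start][:count]

# Hypothetical row with columns 'a' through 'h'.
row = [(name, None) for name in "abcdefgh"]
```

With a cap of 3, capped_slice_from(row, "b", 5, cap=3) can only return 'b'
and 'c', and it gets there by reading all of the first 3 columns.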

What I mean here is that it is hard to add that as a column family option
given the limitation it would entail. That being said, 1.0 will add pluggable
compaction (it's already in trunk), and it will be very easy to have a compaction
that just drops columns after the first N. It would then be up to the client
to deal with the possibility of getting more than the first N, but as you said,
if it is for automated cleanup, that will be enough.
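As a rough illustration of that split of responsibilities, here is a sketch in
Python, not the actual pluggable compaction interface. It assumes each sstable
fragment of a row is a dict mapping column name to (value, timestamp);
compact_row and read_capped are hypothetical names.

```python
def compact_row(fragments, cap):
    """Merge one row's fragments and keep only the first `cap` columns.

    `fragments` is a list of dicts {name: (value, timestamp)}. For each
    column name the version with the highest timestamp wins; columns past
    the first `cap` (in name order) are dropped.
    """
    merged = {}
    for fragment in fragments:
        for name, (value, timestamp) in fragment.items():
            if name not in merged or timestamp > merged[name][1]:
                merged[name] = (value, timestamp)
    kept = sorted(merged)[:cap]
    return {name: merged[name] for name in kept}

def read_capped(columns, cap):
    """Client-side guard: ignore anything past the first `cap` columns.

    Needed because a row may hold more than `cap` columns until the next
    compaction trims it.
    """
    return sorted(columns.items())[:cap]
```

The trimming is only approximate between compactions, which is why the
read-side guard stays in the client.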

--
Sylvain

> From: Tupshin Harper [mailto:tupshin@tupshin.com]
> Sent: Tuesday, July 19, 2011 10:04 AM
> To: user@cassandra.apache.org
> Subject: Re: How to keep only exactly column of key
>
>
>
> Speaking from practical experience, it is possible to simulate this feature
> by retrieving a slice of your row that only contains the most recent 100
> items. You can then prevent the rows from growing out of control by checking
> the size of the row and pruning it back to 100 every N writes, where N is
> small enough to prevent excessive growth, but large enough to prevent
> excessive overhead. A value of 50 or so for N worked reasonably well for
> me. If you do go down this path, though, keep in mind that rapid writes and
> deletes to a single column are basically a Cassandra anti-pattern due to
> performance problems with huge numbers of tombstones.
>
>
>
> I would love to see a feature added similar to MongoDB's "capped
> collections", but I don't believe there is any easy way to retrofit it into
> Cassandra's sstable approach.
> http://www.mongodb.org/display/DOCS/Capped+Collections
>
>
>
> -Tupshin
>
> On Mon, Jul 18, 2011 at 8:22 AM, JKnight JKnight <beuknight@gmail.com>
> wrote:
>
> Dear all,
>
>
>
> I want to keep only 100 columns per key: when I add a column to a key that
> already has 100 columns, another column (by order) should be deleted.
>
>
>
> Does Cassandra have a setting for that?
>
> --
> Best regards,
> JKnight
>
>
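
For reference, the client-side approach Tupshin describes in the quoted message
(read only the most recent 100 columns, prune back to 100 every N writes) could
be sketched as follows. This is an in-memory stand-in written in Python, not a
Cassandra client: the CappedRow class is hypothetical, it assumes column names
sort from oldest to newest (e.g. timestamps), and the deletes in prune() are
where tombstones would be created in a real cluster.

```python
class CappedRow:
    """In-memory stand-in for one row that is pruned every few writes."""

    def __init__(self, cap=100, prune_every=50):
        self.cap = cap                  # columns to keep (assumed >= 1)
        self.prune_every = prune_every  # the "N writes" between prunes
        self.columns = {}               # column name -> value
        self.writes_since_prune = 0

    def add(self, name, value):
        """Write one column; prune once enough writes have accumulated."""
        self.columns[name] = value
        self.writes_since_prune += 1
        if self.writes_since_prune >= self.prune_every:
            self.prune()

    def prune(self):
        """Delete everything except the `cap` most recent columns."""
        for name in sorted(self.columns)[:-self.cap]:
            del self.columns[name]
        self.writes_since_prune = 0

    def latest(self):
        """Read path: a slice of at most `cap` most recent columns."""
        return sorted(self.columns.items())[-self.cap:]
```

Between prunes the row can grow to at most cap + prune_every - 1 columns,
while latest() always returns at most cap of them.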
