incubator-cassandra-user mailing list archives

From Lior Golan <lio...@taboola.com>
Subject RE: How to keep only exactly column of key
Date Wed, 20 Jul 2011 08:57:53 GMT
Thanks, Sylvain.

Can you please point us to the interface we should implement in order to write our own
custom compaction? And how is it supposed to be configured?

-----Original Message-----
From: Sylvain Lebresne [mailto:sylvain@datastax.com] 
Sent: Tuesday, July 19, 2011 11:40 AM
To: user@cassandra.apache.org
Subject: Re: How to keep only exactly column of key

On Tue, Jul 19, 2011 at 10:15 AM, Lior Golan <lior.g@taboola.com> wrote:
> Can't this capping be done (approximately) during compaction?
> Something like:
>
> 1. Ability to define for a column family that it's a "capped
> collection" with at most N columns per row
>
> 2. During writes - just add the column
>
> 3. During reads - get a slice with the most recent / top N
> columns (in terms of column order)
>
> 4. During compaction - if the number of columns in the row is
> more than N, trim it to the top N columns (by replacing the rest of
> the columns with a tombstone in the compacted row)
>
> Since I guess the purpose of this is for automated cleanup, and not
> for enforcing exactly N columns, I think this would be sufficient.

The problem with that is that we cannot enforce this on the query side.
Or more precisely, returning the first N columns is fine, but what about queries like "M
columns starting from 'b'"? Or columns by name?
We cannot do those efficiently while guaranteeing that we won't return any columns after the
first N. The only solution would be to always query the first N columns and then filter
afterwards, but that's not efficient.
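
To illustrate the point, here is a minimal Python sketch. It is an in-memory model, not Cassandra code, and the names `row`, `cap`, `capped_slice`, and `uncapped_slice` are hypothetical. It assumes a row can physically hold more than N columns until something trims it, so a capped slice has to read the first N columns to find where the cap falls before it can filter:

```python
from bisect import bisect_left

def capped_slice(row, cap, start, count):
    """Slice `count` columns from `start`, never returning columns past the cap.

    `row` is a sorted list of column names and may hold more than `cap`
    entries. The cap boundary is only known after reading the first `cap`
    columns, so the cost grows with `cap`, not with `count`.
    """
    visible = row[:cap]                  # must read up to `cap` columns first
    i = bisect_left(visible, start)      # then locate the slice start
    return visible[i:i + count]

def uncapped_slice(row, start, count):
    """A plain slice seeks straight to `start`, but may return columns past the cap."""
    i = bisect_left(row, start)
    return row[i:i + count]

row = ["a", "b", "c", "d", "e", "f"]     # 6 columns stored, cap is 4
print(capped_slice(row, 4, "c", 3))      # ['c', 'd']
print(uncapped_slice(row, "c", 3))       # ['c', 'd', 'e']
```

The plain slice is the cheap one, and it is also the one that leaks column 'e' past the cap.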

What I mean here is that it is hard to add that as a column family option given the limitations
it would entail. That being said, 1.0 will add pluggable compaction (it's already in trunk)
and it will be very easy to have a compaction that just drops columns after the first N. It
would then be up to the client to deal with the possibility of getting more than the first N
columns, but as you said, if it is for automated cleanup, that will be enough.
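
As a rough sketch of what such a trimming step could look like, the Python below models it in memory. This is not the pluggable compaction interface, which is not described in this thread. The function `compact_row`, the fragment layout (column name mapped to a `(timestamp, value)` pair), and ordering columns by name are all assumptions made for illustration:

```python
def compact_row(fragments, cap):
    """Merge fragments of one row and keep only the first `cap` columns.

    Each fragment maps column name -> (timestamp, value), standing in for
    the pieces of a row spread over several sstables (an assumed layout).
    """
    merged = {}
    for fragment in fragments:
        for name, (ts, value) in fragment.items():
            # Keep the version with the highest timestamp for each column.
            if name not in merged or ts > merged[name][0]:
                merged[name] = (ts, value)
    # Sorting by name stands in for the column family's comparator.
    kept = sorted(merged)[:cap]
    return {name: merged[name] for name in kept}

old = {"a": (1, "x"), "b": (1, "y"), "c": (1, "z")}
new = {"b": (2, "y2"), "d": (2, "w")}
print(compact_row([old, new], 3))
# {'a': (1, 'x'), 'b': (2, 'y2'), 'c': (1, 'z')}
```

Until such a step runs, a read of the same row can still return column 'd', which is why the client has to tolerate more than N columns.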

--
Sylvain

> From: Tupshin Harper [mailto:tupshin@tupshin.com]
> Sent: Tuesday, July 19, 2011 10:04 AM
> To: user@cassandra.apache.org
> Subject: Re: How to keep only exactly column of key
>
>
>
> Speaking from practical experience, it is possible to simulate this 
> feature by retrieving a slice of your row that only contains the most 
> recent 100 items. You can then prevent the rows from growing out of 
> control by checking the size of the row and pruning it back to 100 
> every N writes, where N is small enough to prevent excessive growth, 
> but large enough to prevent excessive overhead. A value of 50 or so 
> for N worked reasonably well for me. If you do go down this path,
> though, keep in mind that rapid writes and deletes to a single column
> are basically a Cassandra anti-pattern due to performance problems
> with huge numbers of tombstones.
>
>
>
> I would love to see a feature added similar to MongoDB's "capped 
> collections", but I don't believe there is any easy way to retrofit it 
> into Cassandra's sstable approach.
> http://www.mongodb.org/display/DOCS/Capped+Collections
>
>
>
> -Tupshin
>
> On Mon, Jul 18, 2011 at 8:22 AM, JKnight JKnight <beuknight@gmail.com>
> wrote:
>
> Dear all,
>
>
>
> I want to keep only 100 columns per key: when I add a column to a
> key that already has 100 columns, another column (by order) should be deleted.
>
>
>
> Does Cassandra have a setting for that?
>
> --
> Best regards,
> JKnight
>
>


