cassandra-user mailing list archives

From Sylvain Lebresne <sylv...@riptano.com>
Subject Re: limiting columns in a row
Date Fri, 14 Jan 2011 15:22:55 GMT
Hi,

> does this seem like a generally useful feature?

I do think this could be a useful feature, if only because I don't think there
is any satisfactory or efficient way to do this client side.

> if so, would it be hard to implement (maybe it could be done at compaction
> time like the TTL feature)?

Off the top of my head (that is, I haven't really thought this through, but
I'll still give my opinion), I see the following difficulties:
  1) You can only do this limiting during major compaction, or in the same
     cases as CASSANDRA-1074 for minor compaction, since you need to make sure
     the x columns you are keeping are not deleted ones. Or you'll want to
     disable deletes altogether on the cf with this 'limit' option (I feel
     like this last option would really simplify things).
  2) Even if the removal of the columns exceeding the limit is eventual (and
     it will be), you'll want queries to only ever return columns inside the
     limit (otherwise the feature would be too unpredictable). But I think
     this will be quite challenging. That is, slice queries from the start of
     the row are easy. Everything else is harder (at least if you want to make
     it efficient).
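
To illustrate point 1, here is a minimal sketch in Python. It is a toy
in-memory model, not Cassandra code: the Column class and the merge and trim
helpers are hypothetical names made up for this example. It shows why trimming
to a limit is only safe when every fragment of the row is merged: a tombstone
living in an sstable that is left out of the compaction can shadow one of the
columns you decided to keep.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Column:
    name: int              # sort key within the row (hypothetical toy model)
    timestamp: int         # write timestamp; the highest one wins on merge
    deleted: bool = False  # True models a tombstone


def merge(fragments: Iterable[List[Column]]) -> List[Column]:
    """Reconcile row fragments: per name, keep the highest-timestamp version."""
    latest: Dict[int, Column] = {}
    for fragment in fragments:
        for col in fragment:
            current = latest.get(col.name)
            if current is None or col.timestamp > current.timestamp:
                latest[col.name] = col
    return sorted(latest.values(), key=lambda c: c.name)


def trim(fragments: Iterable[List[Column]], limit: int) -> List[Column]:
    """Keep the first `limit` live columns of the merged row."""
    live = [c for c in merge(fragments) if not c.deleted]
    return live[:limit]


# Two fragments of the same row; the second holds a tombstone for column 1.
sstable_a = [Column(1, 10), Column(2, 10), Column(3, 10)]
sstable_b = [Column(1, 20, deleted=True)]

# Merging every fragment sees the tombstone and keeps columns 2 and 3.
major = trim([sstable_a, sstable_b], limit=2)

# Merging only sstable_a misses the tombstone: it keeps the deleted column 1
# and would drop column 3, which should have survived.
minor = trim([sstable_a], limit=2)
```

With deletes disabled on the cf there would be no tombstones to miss, which is
why that option looks simpler.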

That was my 2 cents. Anyway, you can always open a JIRA ticket.

--
Sylvain


On Fri, Jan 14, 2011 at 7:38 AM, mike dooley <dooley@apple.com> wrote:

> hi,
>
> the time-to-live feature in 0.7 is very nice and it made me want to ask about
> a somewhat similar feature.
>
> i have a stream of data consisting of entities and associated samples. so i
> create a row for each entity and the columns in each row contain the samples
> for that entity. when i get around to processing an entity i only care about
> the most recent N samples. so i read the most recent N columns and delete all
> the rest.
>
> what i would like would be a column family property that allows me to
> specify a maximum number of columns per row. then i could just keep writing
> and not have to do the deletes.
>
> in my case it would be fine if the limit is only 'eventually' applied (so that
> sometimes there might be extra columns).
>
> does this seem like a generally useful feature?  if so, would it be hard to
> implement (maybe it could be done at compaction time like the TTL feature)?
>
> thanks,
> -mike
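
For reference, the read-then-delete pattern described in the quoted message
can be modelled roughly as below. This is a sketch only: the row is a plain
Python dict keyed by sample time, and trim_row is a hypothetical helper, not a
Cassandra client call. In a real client each removal would be a separate
delete sent to the cluster, which is the cost the proposed column family
property would avoid.

```python
from typing import Dict, List


def trim_row(row: Dict[int, str], keep: int) -> List[int]:
    """Keep the `keep` newest samples in `row`; delete and return the rest."""
    newest_first = sorted(row, reverse=True)  # column names are sample times
    stale = newest_first[keep:]               # everything past the newest `keep`
    for name in stale:
        del row[name]                         # stands in for a client-side delete
    return stale


# Seven samples for one entity, keyed by an increasing sample time.
samples = {t: f"sample-{t}" for t in range(1, 8)}
removed = trim_row(samples, keep=3)
```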
