cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-6167) Add end-slice termination predicate
Date Fri, 14 Feb 2014 08:34:19 GMT


Sylvain Lebresne commented on CASSANDRA-6167:

Playing devil's advocate here, but why wouldn't you just store the last aggregated value in
a separate table? Granted, that assumes you know the last aggregated value, which in theory
means 2 reads, but in practice it doesn't sound particularly hard for clients to cache that
last aggregated value (of course, you'd want to refresh that cached value at some frequency,
but that can be done in the background easily enough).
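
As a rough illustration of the client-side caching idea above, here is a minimal sketch in
Python. Everything in it is an assumption made for illustration: the CachedAggregate class,
the load callable (standing in for a read of the separate aggregate table), and the
max_age_seconds setting are hypothetical names, not part of Cassandra or any driver. For
brevity it refreshes lazily on read once the cached value is too old, rather than in a
background thread.

```python
import time


class CachedAggregate:
    """Hypothetical client-side cache of the last aggregated value."""

    def __init__(self, load, max_age_seconds=60.0, clock=time.monotonic):
        self._load = load                # callable that reads the separate aggregate table
        self._max_age = max_age_seconds  # how long a cached value is considered fresh
        self._clock = clock              # injectable clock, which makes testing easy
        self._value = None
        self._loaded_at = None

    def get(self):
        """Return the cached value, reloading it once it is older than max_age."""
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._max_age:
            self._value = self._load()
            self._loaded_at = now
        return self._value
```

With a cache like this, the common path costs one read (the recent values), and the second
read of the aggregate table only happens when the cached value expires.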

Because my main problem with that example is that this sounds a lot like a hack. If I store
floats, I want evtval to be a float, not some string that I abuse to store an aggregation
in the middle of other stuff (because that's fairly error-prone for any consumer of the table
that doesn't care about the pre-computed aggregation). I really don't think we should "promote"
such approaches. Note that I understand it's "just an example", but it doesn't feel to me like
we should add such a thing without a bunch of non-hacky examples of it being useful.

Also, there is CASSANDRA-4914. Once we have that, you'd want to use it for aggregation. Even
if you still want to do the incremental aggregation like in your example, you'll still really
want to use CASSANDRA-4914 to aggregate the values 'since last aggregation'. And I don't really
see how the idea of this could cleanly cohabit with CASSANDRA-4914 (while it's trivial if
you just store/cache the aggregation separately).  
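
To make the "store the aggregation separately" alternative concrete, here is a minimal
Python sketch of combining a stored aggregate with the values written since it was computed.
The function name, the (timestamp, value) event shape, and the use of a sum as the
aggregation are all assumptions for illustration; in practice the "since last aggregation"
part would be whatever query or aggregation mechanism is available.

```python
def current_total(last_aggregate, aggregated_up_to, events):
    """Combine a stored aggregate with the events newer than it.

    last_aggregate   -- value read (or cached) from the separate aggregate table
    aggregated_up_to -- timestamp of the last event included in last_aggregate
    events           -- iterable of (timestamp, value) pairs (hypothetical shape)
    """
    # Only events strictly newer than the stored aggregate are added on top of it.
    recent = sum(value for ts, value in events if ts > aggregated_up_to)
    return last_aggregate + recent
```

Because the aggregate lives outside the event rows, the event values keep their natural
type and consumers that ignore the aggregate are unaffected.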

> Add end-slice termination predicate
> -----------------------------------
>                 Key: CASSANDRA-6167
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: API, Core
>            Reporter: Tupshin Harper
>            Priority: Minor
>              Labels: ponies
> When performing storage-engine slices, it would sometimes be beneficial to have
the slice terminate for reasons other than number of columns or min/max cell name.
> Since we are able to look at the contents of each cell as we read it, this is potentially
doable with very little overhead. 
> Probably more challenging than the storage-engine implementation itself is coming up
with appropriate CQL syntax (Thrift, should we decide to support it, would be trivial).
> Two possibilities are:
> 1) special where function:
> SELECT pk,event from cf WHERE pk IN (1,5,10,11) AND partition_predicate({predicate})
> or a bigger language change, but I think one I prefer, more like:
> 2) SELECT pk,event from cf where pk IN (1,5,10,11) UNTIL PARTITION event {predicate}
> Neither feels perfect, but I do like the fact that the second one at least clearly states
what it is intended to do.
> By using "UNTIL PARTITION", we could re-use the UNTIL keyword to handle other kinds of
early-termination of selects that the coordinator might be able to do, such as stop retrieving
additional rows from shards after a particular criterion was met.
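
The early-termination behaviour described in the ticket can be sketched in Python as a
generator that reads cells in order and stops as soon as a predicate on the cell contents
holds. This is only an illustration under stated assumptions: slice_until is a hypothetical
name, cells are modelled as an ordinary iterable, and the choice to include the cell that
satisfies the predicate in the result is an assumption (the ticket does not say whether the
terminating cell is returned).

```python
def slice_until(cells, predicate, limit=None):
    """Yield cells in order, stopping after the first cell matching predicate.

    cells     -- iterable of cells in storage order
    predicate -- callable returning True for the cell that ends the slice
    limit     -- optional positive maximum number of cells (the usual count bound)
    """
    for count, cell in enumerate(cells, start=1):
        yield cell
        # Stop on the predicate or on the ordinary column-count limit,
        # without reading any further cells from the underlying iterator.
        if predicate(cell) or (limit is not None and count >= limit):
            return
```

For example, list(slice_until([1, 2, 3, 4], lambda c: c >= 3)) stops after the third cell
and never inspects the fourth.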

This message was sent by Atlassian JIRA
