cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8826) Distributed aggregates
Date Wed, 18 Mar 2015 15:16:38 GMT


Sylvain Lebresne commented on CASSANDRA-8826:

Maybe I can be a bit more precise, because I'm not even sure we fundamentally disagree. If you're
talking about optimizing aggregates over a single partition, even a reasonably large one,
then I'm fine with that in principle.  But to me, "distributed aggregates" refers to distributing
aggregates over a large quantity of data across many nodes _à la_ map-reduce. That's not particularly
real time in my book, btw, and I maintain that, imo, that's exactly what Spark/Hadoop are about,
and there is no point in reinventing that wheel.

Now, if we are talking about single-partition aggregates, then the only relation with this
ticket I can see is pushing the aggregate to the replicas to save cross-node traffic. We know
it's not that easy for CL > CL.ONE, and for CL.ONE, I think it's fine to assume that
clients do token-aware routing, at which point we already do not transfer data over the wire
(and CASSANDRA-7168 will indeed help improve higher CLs quite a bit, even without any change
to the current implementation). And I'm just not sure it's worth putting too much effort, short
term, into optimizing the "CL.ONE but no token-aware routing" case.
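
To make the traffic argument concrete, here is a minimal sketch of the "CL.ONE but no token-aware routing" case. It is a toy model, not Cassandra code: the `Replica` class and both functions are hypothetical names, and "values transferred" is only a stand-in for bytes on the wire. It compares a coordinator that pulls every row from a remote replica with one that asks the replica for a partial aggregate.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Replica:
    """Hypothetical stand-in for a node holding one partition's values."""
    rows: List[int]

    def read_all(self) -> List[int]:
        # Ships every row of the partition to the caller.
        return list(self.rows)

    def partial_sum(self) -> Tuple[int, int]:
        # Ships only (sum, count), regardless of partition size.
        return sum(self.rows), len(self.rows)


def coordinator_side_avg(replica: Replica) -> Tuple[Optional[float], int]:
    """Aggregate on the coordinator: all rows cross the wire."""
    rows = replica.read_all()
    transferred = len(rows)
    avg = sum(rows) / len(rows) if rows else None
    return avg, transferred


def replica_side_avg(replica: Replica) -> Tuple[Optional[float], int]:
    """Aggregate on the replica: only two values cross the wire."""
    total, count = replica.partial_sum()
    transferred = 2
    avg = total / count if count else None
    return avg, transferred
```

With token-aware routing the coordinator is itself a replica, so in this model the first function would read locally and the transfer cost would not apply, which is why the remaining case may not be worth much short-term effort.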

> Distributed aggregates
> ----------------------
>                 Key: CASSANDRA-8826
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Robert Stupp
>            Priority: Minor
> Aggregations have been implemented in CASSANDRA-4914.
> All calculation is performed on the coordinator. This means that all data is pulled
> by the coordinator and processed there.
> This ticket is about distributing aggregates to make them more efficient. Some
> related tickets (esp. CASSANDRA-8099) are currently in progress - we should wait for them
> to land before talking about implementation.
> Another playgrounds (not covered by this ticket), that might be related is about _distributed

