cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8826) Distributed aggregates
Date Wed, 18 Mar 2015 15:16:38 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367281#comment-14367281 ]

Sylvain Lebresne commented on CASSANDRA-8826:
---------------------------------------------

Maybe I can be a bit more precise, because I'm not even sure we fundamentally disagree. If you're
talking about optimizing aggregates over a single partition, even a reasonably large one,
then I'm fine with that in principle. But to me, "distributed aggregates" refers to distributing
aggregates over large quantities of data across many nodes _à la_ map-reduce. That's not particularly
real-time in my book, btw, and I maintain that imo that's exactly what Spark/Hadoop are about,
and there is no point in reinventing that wheel.

Now, if we are talking about single-partition aggregates, then the only relation to this
ticket I can see is pushing the aggregate down to replicas to save cross-node traffic. We know
it's not that easy for CL > CL.ONE, and for CL.ONE, I think it's fine to assume that
clients do token-aware routing, at which point we already do not transfer data over the wire
(and CASSANDRA-7168 will indeed help improve higher CLs quite a bit, even without any change
to the current implementation). And I'm just not sure it's worth putting too much effort, short
term, into optimizing the "CL.ONE but no token-aware routing" case.
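
To make the traffic argument concrete, here is a toy sketch in Python (not Cassandra code) of the "CL.ONE but no token-aware routing" case. It compares how many values cross the wire when the coordinator aggregates versus when a replica computes a partial aggregate. Every name in it (`Partial`, `coordinator_side_avg`, `replica_side_avg`) is made up for illustration, and it deliberately ignores the reconciliation across replicas that makes CL > CL.ONE hard.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Hypothetical mergeable state for AVG: a running sum and a row count."""
    total: float = 0.0
    count: int = 0

    def add(self, value):
        self.total += value
        self.count += 1
        return self

    def merge(self, other):
        # Partial states from different sources combine without the raw rows.
        return Partial(self.total + other.total, self.count + other.count)

    def result(self):
        return self.total / self.count if self.count else None


def coordinator_side_avg(rows):
    """Replica ships every value; the coordinator does the aggregation."""
    values_on_wire = len(rows)
    state = Partial()
    for value in rows:
        state.add(value)
    return state.result(), values_on_wire


def replica_side_avg(rows):
    """Replica aggregates locally and ships only the partial state."""
    state = Partial()
    for value in rows:
        state.add(value)
    values_on_wire = 2  # just the sum and the count
    return state.result(), values_on_wire


partition = [3.0, 5.0, 10.0, 6.0]
print(coordinator_side_avg(partition))  # (6.0, 4)
print(replica_side_avg(partition))      # (6.0, 2)
```

With token-aware routing at CL.ONE the coordinator is itself a replica, so in this model neither variant would put anything on the wire between nodes, which is why the remaining case looks narrow.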


> Distributed aggregates
> ----------------------
>
>                 Key: CASSANDRA-8826
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8826
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Robert Stupp
>            Priority: Minor
>
> Aggregations have been implemented in CASSANDRA-4914.
> All calculation is performed on the coordinator. This means that all data is pulled
by the coordinator and processed there.
> This ticket is about distributing aggregates to make them more efficient. Some
related tickets (esp. CASSANDRA-8099) are currently in progress - we should wait for them
to land before talking about implementation.
> Another playground (not covered by this ticket) that might be related is _distributed
filtering_.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
