cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8826) Distributed aggregates
Date Wed, 18 Feb 2015 16:29:11 GMT


Sylvain Lebresne commented on CASSANDRA-8826:

I'll note that Cassandra has no ambition of tackling analytic queries itself. There are wonderful
frameworks (Hadoop, Spark) that do that better than we probably can. The existing aggregations are
meant for 1) aggregating over a (small portion of a) single partition (basically the case where
today you'd just query and aggregate client side; in that case, btw, if you use CL.ONE and a
token-aware client, distributing the aggregate would buy you nothing) and 2) convenience during
development.

I'm not saying there is no way to implement distributed aggregates, but we know it's not trivial
either (due to consistency issues in particular), and hence it's imo not worth the complexity
of re-inventing a poor man's Spark when Spark (or another framework) exists and is actively
developed. Overall, I feel this is out of scope for Cassandra.

> Distributed aggregates
> ----------------------
>                 Key: CASSANDRA-8826
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Robert Stupp
>            Priority: Minor
> Aggregations have been implemented in CASSANDRA-4914.
> All calculation is performed on the coordinator. This means that all data is pulled
> by the coordinator and processed there.
> This ticket is about distributing aggregates to make them more efficient. Some related
> tickets (esp. CASSANDRA-8099) are currently in progress; we should wait for them to land
> before talking about implementation.
> Another playground (not covered by this ticket) that might be related is about _distributed
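The contrast the ticket draws can be sketched in Python under simplifying assumptions. The sketch has no replication and ignores consistency concerns. The node names, the data, and the helper functions are hypothetical, not Cassandra internals. Coordinator-side aggregation pulls every row and computes AVG in one place. A distributed version instead has each node compute a small partial state, here a (sum, count) pair, and the coordinator merges those states.

```python
from typing import Dict, List, Tuple

# Hypothetical data: the rows each node owns for the queried range
# (each row lives on exactly one node, i.e. no replication).
rows_by_node: Dict[str, List[float]] = {
    "node1": [1.0, 2.0, 3.0],
    "node2": [4.0],
    "node3": [5.0, 6.0],
}


def coordinator_avg(rows_by_node: Dict[str, List[float]]) -> Tuple[float, int]:
    """Pull every row to the coordinator, then aggregate there.

    Returns the average and the number of values shipped to the coordinator."""
    pulled = [v for rows in rows_by_node.values() for v in rows]
    return sum(pulled) / len(pulled), len(pulled)


def partial_state(rows: List[float]) -> Tuple[float, int]:
    """Per-node partial state for AVG: (sum, count), computed locally."""
    return sum(rows), len(rows)


def merge_states(states: List[Tuple[float, int]]) -> float:
    """Coordinator-side merge: add sums and counts, divide once at the end."""
    total = sum(s for s, _ in states)
    count = sum(c for _, c in states)
    return total / count


def distributed_avg(rows_by_node: Dict[str, List[float]]) -> Tuple[float, int]:
    """Each node ships one (sum, count) pair instead of its rows.

    Returns the average and the number of partial states shipped."""
    states = [partial_state(rows) for rows in rows_by_node.values()]
    return merge_states(states), len(states)
```

Both functions return 3.5, but the distributed version ships 3 pairs instead of 6 rows. The partial state has to be (sum, count) rather than a per-node average, because averaging the per-node averages would give (2.0 + 4.0 + 5.5) / 3 ≈ 3.83.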

This message was sent by Atlassian JIRA