cassandra-commits mailing list archives

From "Benedict (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8826) Distributed aggregates
Date Wed, 18 Mar 2015 15:50:41 GMT


Benedict commented on CASSANDRA-8826:

I don't think we _fundamentally_ disagree. I guess I should outline what I am thinking of.

Initially for single-partition queries, but later expanding to multi-partition queries, I would
like our abstraction for aggregations to support partial results (continuations, effectively)
that can be shipped around along with digests, and composed on the coordinator (or repaired).
Each owner would return separate results for its repaired and unrepaired portions, and these
would be combined on the coordinator. This permits us to answer these queries quickly in
the common case where there is agreement, permits quick repair, and allows us to expand support
to aggregations over multiple partitions without tremendous difficulty, by resolving
each partition independently into its own partial computation, which is then combined with
each of the other partial computations.
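
To make the idea of composable partial results concrete, here is a minimal sketch in Python, not a proposal for the actual Cassandra code. It assumes an AVG aggregate whose partial state is a (count, total) pair; the names PartialAvg, of, combine, finish and aggregate are hypothetical and exist only for this illustration. Digests and repair are left out entirely.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class PartialAvg:
    """Hypothetical partial state for AVG: small to ship, cheap to merge."""
    count: int = 0
    total: float = 0.0

    @staticmethod
    def of(values):
        # Built where the data lives, e.g. once per partition on a replica.
        values = list(values)
        return PartialAvg(len(values), float(sum(values)))

    def combine(self, other):
        # Associative and commutative, so partials can be merged in any order.
        return PartialAvg(self.count + other.count, self.total + other.total)

    def finish(self):
        # Only the coordinator turns the merged state into the final answer.
        return self.total / self.count if self.count else None


def aggregate(partitions):
    # Resolve each partition independently, then fold the partials together.
    partials = [PartialAvg.of(rows) for rows in partitions]
    return reduce(PartialAvg.combine, partials, PartialAvg())


# Three partitions, one of them empty.
partitions = [[1, 2, 3], [10], []]
result = aggregate(partitions)
```

Only the small PartialAvg values would need to cross the network in such a scheme, whatever the number of rows behind each one.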

I don't pretend this is _simple_, but nor do I think it is prohibitively complex or out of
scope. It seems a good solution to all of the above problems, and permits us to easily push
the construction of each _partial_ computation much lower into the stack when we have the
time, so that this (the main body of work) can be done much more efficiently, and with network
traffic proportional to the size of the result, not the domain.

The same abstraction can be used to implement sampled or exact, single or multi-partition
aggregations. Most crucially, it supports them with repaired data, which we cannot do with any
of our map/reduce connectors, and supports them in "realtime".

> Distributed aggregates
> ----------------------
>                 Key: CASSANDRA-8826
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Robert Stupp
>            Priority: Minor
> Aggregations have been implemented in CASSANDRA-4914.
> All calculation is performed on the coordinator. This means that all data is pulled
> by the coordinator and processed there.
> This ticket is about distributing aggregates to make them more efficient. Some
> related tickets (esp. CASSANDRA-8099) are currently in progress - we should wait for them
> to land before talking about implementation.
> Another playgrounds (not covered by this ticket), that might be related is about _distributed

This message was sent by Atlassian JIRA
