cassandra-commits mailing list archives

From "Luke Brown (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8826) Distributed aggregates
Date Wed, 26 Oct 2016 23:07:59 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609985#comment-15609985 ]

Luke Brown commented on CASSANDRA-8826:
---------------------------------------

For CL>1, wouldn't read repair require shipping the underlying data between nodes, which is
exactly what this feature is intended to avoid? Would the feature still be worthwhile in that
case? If it matters to the client that only aggregated results are sent between nodes, I think
that rules out reconciliation for most aggregation functions.
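
To illustrate why aggregated results alone are usually not enough to reconcile, here is a
minimal sketch in Python. It is a toy model, not Cassandra code: the replica dictionaries, the
`reconcile` and `local_sum` helpers, and the timestamps are all hypothetical, and
newest-write-wins per key is a simplified stand-in for real reconciliation.

```python
# Toy model: each replica maps key -> (value, write_timestamp).
# All names and data here are hypothetical illustrations.
replica_a = {"k1": (1, 10), "k2": (5, 10)}
replica_b = {"k1": (3, 20)}  # has a newer k1, but is missing k2


def reconcile(*replicas):
    """Merge replicas row by row, keeping the newest write for each key."""
    merged = {}
    for replica in replicas:
        for key, (value, ts) in replica.items():
            if key not in merged or ts > merged[key][1]:
                merged[key] = (value, ts)
    return merged


def local_sum(replica):
    """What a replica would return if it aggregated only its own data."""
    return sum(value for value, _ in replica.values())


sum_a = local_sum(replica_a)                           # 1 + 5 = 6
sum_b = local_sum(replica_b)                           # 3
sum_true = local_sum(reconcile(replica_a, replica_b))  # 3 + 5 = 8
```

In this toy case the coordinator would see only 6 and 3. The reconciled answer, 8, cannot be
derived from those two numbers, so getting it would mean shipping the underlying rows.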

The concern is that any query that triggered a repair would produce network traffic comparable
to the current method of aggregating on the coordinator, and would do so unpredictably, right?
When that happens, the trade-off might even be a net performance loss, since all of the queried
nodes would be running the aggregation functions too, rather than just the coordinator.

If that's true, the most the coordinator should do for CL>1 distributed aggregates is
compare the replica results. Any difference should simply fail the query, with no attempt to
reconcile the underlying data (no foreground or background repairs). For some applications,
that fail-fast alternative could be an improvement over CL.ONE with a token-aware client: the
coordinator would still choose the best nodes (more than one) to query, and the coordinator is
a better place to compare multiple node responses than the client/driver.
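
The fail-fast behaviour described above could be sketched roughly as follows. This is only an
illustration in Python, not a proposal for the actual implementation: `coordinate_aggregate`,
`AggregateMismatch`, and the `required` parameter are hypothetical names, and the replica
results are assumed to be already-computed aggregate values.

```python
class AggregateMismatch(Exception):
    """Raised when replicas disagree on an aggregate result (hypothetical)."""


def coordinate_aggregate(replica_results, required):
    """Return the agreed aggregate, or fail fast if the replicas differ.

    replica_results: aggregate values already computed by each replica.
    required: number of responses the consistency level demands.
    """
    if required < 1:
        raise ValueError("required must be at least 1")
    if len(replica_results) < required:
        raise ValueError("not enough replica responses")
    considered = replica_results[:required]
    first = considered[0]
    if any(result != first for result in considered[1:]):
        # No foreground or background repair: just report the mismatch.
        raise AggregateMismatch(f"replicas disagree: {considered}")
    return first
```

With matching responses, `coordinate_aggregate([8, 8], required=2)` returns the shared value.
With `[6, 3]` it raises `AggregateMismatch` and leaves the underlying data untouched.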

But given that this special case would need its own additional implementation for aggregates,
would it still be considered a worthwhile feature?

> Distributed aggregates
> ----------------------
>
>                 Key: CASSANDRA-8826
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8826
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Robert Stupp
>            Priority: Minor
>
> Aggregations have been implemented in CASSANDRA-4914.
> All calculation is performed on the coordinator. This means that all data is pulled by
> the coordinator and processed there.
> This ticket is about distributing aggregates to make them more efficient. Some related
> tickets (esp. CASSANDRA-8099) are currently in progress; we should wait for them to land
> before talking about implementation.
> Another area (not covered by this ticket) that might be related is _distributed filtering_.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
