cassandra-commits mailing list archives

From "Benjamin Lerer (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-12417) Built-in AVG aggregate is much less useful than it should be
Date Tue, 04 Oct 2016 13:21:21 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15545334#comment-15545334 ]

Benjamin Lerer edited comment on CASSANDRA-12417 at 10/4/16 1:20 PM:
---------------------------------------------------------------------

Even if the new behavior is probably better than the previous one, it is still a change of
behavior that can surprise some users. Consequently, I think we should keep the ticket
as an improvement and provide a patch only for {{3.x}}.
It would be good if you could add an entry to {{NEWS.txt}} and update the documentation for
the functions. It would be nice to explain there how sums and averages are computed and how
they behave in case of overflow.



was (Author: blerer):
Even if the new behavior is probably better than the previous one it is still a change of
behavior which can surprise some users. By consequence, I think we should keep the ticket
as improvement and provided a patch only for {{3.x}}. 
It would be good if you could add an entry to {{NEWS.txt}} and update the documentation for
the functions. It would be nice to explain there how sum and averages are computed and the
way they behave in case of overflow.


> Built-in AVG aggregate is much less useful than it should be
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-12417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12417
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>            Reporter: Branimir Lambov
>            Assignee: Alex Petrov
>
> For fixed-size integer types, overflow is all but guaranteed to happen, yielding an incorrect
> result. While for sum this is somewhat acceptable, as the result cannot fit the type, this is
> not the case for average.
> As the result of average is always within the range of the source type, failing to produce
> it only signifies a bad implementation. Yes, one can solve this by type-casting, but do we
> really want to always have to be telling people that the correct spelling of the average function
> is {{cast(avg(cast(value as bigint)) as int)}}, especially if this is so trivial to fix?
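
To make the overflow problem described above concrete, here is a minimal sketch in Python. It is not Cassandra's code: the helper names ({{wrap_int32}}, {{naive_avg_int32}}, {{wide_avg_int32}}) are hypothetical, the 32-bit wrap-around is simulated because Python integers do not overflow, and truncating division is an assumption made only for this illustration.

```python
INT32_MIN = -2**31

def wrap_int32(x):
    """Wrap an arbitrary Python int into the signed 32-bit range (two's complement)."""
    return (x - INT32_MIN) % 2**32 + INT32_MIN

def trunc_div(a, b):
    """Integer division that truncates toward zero (assumed here for illustration)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

def naive_avg_int32(values):
    """Average using an accumulator of the same 32-bit width as the inputs."""
    total = 0
    for v in values:
        total = wrap_int32(total + v)  # the running sum silently wraps on overflow
    return trunc_div(total, len(values))

def wide_avg_int32(values):
    """Average using a wider accumulator; Python's unbounded int stands in for it."""
    total = sum(values)
    return trunc_div(total, len(values))

values = [2_000_000_000, 2_000_000_000]  # both fit in a signed 32-bit int
print(naive_avg_int32(values))  # -147483648: the sum wrapped, so the average is wrong
print(wide_avg_int32(values))   # 2000000000: fits the source type, as the ticket argues
```

The second result is what the double cast in the ticket obtains by hand: the sum is accumulated in a wider type and only the final average is narrowed back.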
> Additionally, the straightforward addition we use for floating point versions is not
> a good choice numerically for larger numbers of values. We should switch to a more stable
> version, e.g. iterative mean using {{avg = avg + (value - avg) / count}}.
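
The iterative mean quoted above can be sketched in Python as follows. This is an illustration of the formula {{avg = avg + (value - avg) / count}} only, not the patch itself: the function names are hypothetical, and returning 0.0 for an empty input is an assumption. The example shows one visible failure mode of plain summation, overflow of the intermediate sum; it does not attempt to measure rounding error.

```python
import math

def naive_mean(values):
    """Sum everything, then divide: the intermediate sum can exceed the float range."""
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
    return total / count if count else 0.0  # 0.0 for empty input is an assumption

def iterative_mean(values):
    """Running mean: avg = avg + (value - avg) / count, as quoted in the ticket."""
    avg = 0.0
    count = 0
    for v in values:
        count += 1
        avg = avg + (v - avg) / count  # avg stays within the range of the inputs
    return avg

small = [1.0, 2.0, 3.0, 4.0]
print(iterative_mean(small))             # 2.5

large = [1e308, 1e308]                   # each value is finite, their sum is not
print(math.isinf(naive_mean(large)))     # True: the sum overflowed to infinity
print(iterative_mean(large))             # 1e+308
```

Because the running value is itself an average, it never grows beyond the magnitude of the inputs, which is why no wider accumulator is needed in this variant.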



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
