Date: Tue, 22 Mar 2016 19:03:25 +0000 (UTC)
From: "Todd Lisonbee (JIRA)"
To: issues@flink.apache.org
Reply-To: dev@flink.apache.org
Subject: [jira] [Comment Edited] (FLINK-3613) Add standard deviation, mean, variance to list of Aggregations

[ https://issues.apache.org/jira/browse/FLINK-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207054#comment-15207054 ]

Todd Lisonbee edited comment on FLINK-3613 at 3/22/16 7:02 PM:
---------------------------------------------------------------

Attached is a design for improvements to DataSet.aggregate() needed to implement additional aggregations like Standard Deviation.
To maintain public APIs, it seems like the best path would be to have AggregateOperator implement CustomUnaryOperation, but that seems weird because no other Operator is done that way. But the other options I see don't seem consistent with other Operators either.

I really could use some feedback on this. Thanks.

Also, should I be posting this to the Dev mailing list?


was (Author: tlisonbee):
Attached is a design for improvements to DataSet.aggregate() needed to implement additional aggregations like Standard Deviation.

To maintain public APIs, it seems like the best path would be to have AggregateOperator implement CustomUnaryOperation, but that seems weird because no other Operator is done that way. But the other options I see don't seem consistent with other Operators either.

I really could use some feedback on this. Thanks.

> Add standard deviation, mean, variance to list of Aggregations
> --------------------------------------------------------------
>
>                 Key: FLINK-3613
>                 URL: https://issues.apache.org/jira/browse/FLINK-3613
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Todd Lisonbee
>            Priority: Minor
>         Attachments: DataSet-Aggregation-Design-March2016-v1.txt
>
>
> Implement standard deviation, mean, variance for org.apache.flink.api.java.aggregation.Aggregations
> Ideally, the implementation should be single pass and numerically stable.
> References:
> "Scalable and Numerically Stable Descriptive Statistics in SystemML", Tian et al, International Conference on Data Engineering 2012
> http://dl.acm.org/citation.cfm?id=2310392
> "The Kahan summation algorithm (also known as compensated summation) reduces the numerical errors that occur when adding a sequence of finite precision floating point numbers. Numerical errors arise due to truncation and rounding. These errors can lead to numerical instability when calculating variance."
> https://en.wikipedia.org/wiki/Kahan_summation_algorithm
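
To make the "single pass and numerically stable" requirement in the issue description concrete, here is a minimal Java sketch. It is only an illustration, not the attached design and not Flink code: the class name RunningStats and all of its methods are hypothetical. It assumes Welford's online update for mean and variance, a pairwise merge step (the kind of combine a parallel aggregation would need), and a plain Kahan summation helper as described in the Wikipedia reference.

```java
/**
 * Illustrative sketch only. RunningStats is a hypothetical helper and is
 * NOT part of org.apache.flink.api.java.aggregation.Aggregations.
 */
final class RunningStats {
    private long count;   // number of values folded in so far
    private double mean;  // running mean
    private double m2;    // running sum of squared deviations from the mean

    /** Welford's update: fold in one value without storing the data. */
    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    /** Combine another partial result into this one (pairwise merge). */
    void merge(RunningStats other) {
        if (other.count == 0) {
            return;
        }
        if (count == 0) {
            count = other.count;
            mean = other.mean;
            m2 = other.m2;
            return;
        }
        long total = count + other.count;
        double delta = other.mean - mean;
        // Uses the counts from before the merge on both sides.
        m2 += other.m2 + delta * delta * ((double) count * other.count / total);
        mean += delta * other.count / total;
        count = total;
    }

    double mean() {
        return count == 0 ? Double.NaN : mean;
    }

    double populationVariance() {
        return count == 0 ? Double.NaN : m2 / count;
    }

    double populationStdDev() {
        return Math.sqrt(populationVariance());
    }

    /** Kahan (compensated) summation over an array of doubles. */
    static double kahanSum(double[] values) {
        double sum = 0.0;
        double c = 0.0; // compensation for low-order bits lost so far
        for (double v : values) {
            double y = v - c;
            double t = sum + y;
            c = (t - sum) - y;
            sum = t;
        }
        return sum;
    }
}
```

The merge method is the part that matters for a distributed aggregation: each partition could build its own partial result in one pass and the partials could then be combined pairwise. Whether that fits into AggregateOperator is exactly the open design question above.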